/g/ - Technology

File: idolatry.jpg (360 KB, 1824x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106608204 & >>106599382

►News
>(09/17) Magistral Small 1.2 with vision encoder released: https://mistral.ai/news/magistral
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: cat_miku.jpg (181 KB, 904x1200)
►Recent Highlights from the Previous Thread: >>106608204

--Papers:
>106610303
--Ling-flash-2.0 efficiency and performance metrics:
>106613234 >106613244 >106613289 >106613300 >106613320 >106613363 >106613929 >106613425 >106613958 >106614827 >106613385 >106614187 >106615500 >106615655
--Nemotron-H 47B GPU compatibility and jinja template fixes:
>106609417 >106609427 >106609557 >106609578 >106609604 >106609629 >106609664 >106610878
--Magistral model benchmark performance comparison:
>106615606
--Magistral chat completion issues in llama.cpp due to broken template support:
>106615995 >106616014 >106616053 >106616059 >106616065 >106616071 >106616197 >106616079 >106616085
--PCIe lane configuration differences between workstation/server boards:
>106608282 >106608306
--Troubleshooting LLM browser tool access:
>106610684 >106610738 >106610863 >106611455 >106611628 >106611682 >106611734 >106611875 >106612530 >106612538 >106612758
--Processing PDFs for TTRPG prep with RAG:
>106614749 >106614780 >106614854 >106614879 >106615154 >106615457 >106615184 >106615201 >106615227 >106614783
--Ease of use comparison for inference engines:
>106608351 >106608383 >106608501 >106613832 >106613944
--Optimizing model creativity with top nsigma=1 and temperature settings:
>106611075 >106611142 >106611157 >106611191 >106611192 >106611173
--MobileLLM release discussion: Restricted access and niche use cases:
>106611372 >106611446 >106611574 >106611990
--Local implementation challenges of Alibaba-NLP's DeepResearch agent:
>106609931 >106609960 >106610001 >106610074 >106610098 >106610230 >106610295 >106613862
--Huawei Ascend 910 AI chip specs reveal 166.4 TFLOPs compute power and 32GB memory:
>106612438 >106613480 >106613499 >106613543 >106613990
--AMD discontinues AMDVLK Vulkan driver, prioritizes RADV:
>106613063
--Miku (free space):
>106611219 >106611682

►Recent Highlight Posts from the Previous Thread: >>106608208

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
mistral 3
where art thou
>>
the meta event scheduled for today and tomorrow is a nothingburger
>>
ITT: coomers running k2 through openrouter pretending to be /local/
>>
>>106617488
who hurt you?

llama_model_loader: - kv 36: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 37: tokenizer.ggml.pre str = kimi-k2
llama_model_loader: - kv 38: tokenizer.ggml.tokens arr[str,163840] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 39: tokenizer.ggml.token_type arr[i32,163840] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 40: tokenizer.ggml.merges arr[str,163328] = ["Ġ Ġ", "ĠĠ ĠĠ", "Ġ t", "i n",...
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 163584
llama_model_loader: - kv 42: tokenizer.ggml.eos_token_id u32 = 163585
llama_model_loader: - kv 43: tokenizer.ggml.padding_token_id u32 = 163839
llama_model_loader: - kv 44: tokenizer.chat_template str = {%- if tools -%}\n <|im_system|>tool_...
llama_model_loader: - kv 45: quantize.imatrix.file str = /mnt/data/models/ubergarm/Kimi-K2-Ins...
llama_model_loader: - kv 46: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
llama_model_loader: - kv 47: quantize.imatrix.entries_count i32 = 667
llama_model_loader: - kv 48: quantize.imatrix.chunks_count i32 = 826
llama_model_loader: - kv 49: split.no u16 = 0
llama_model_loader: - kv 50: split.count u16 = 11
llama_model_loader: - kv 51: split.tensors.count i32 = 1096
llama_model_loader: - type f32: 365 tensors
llama_model_loader: - type q8_0: 549 tensors
llama_model_loader: - type iq6_k: 2 tensors
llama_model_loader: - type iq4_kss: 180 tensors
>>
anime
>>
>Magistral Small 1.2
No one fucking cares. Release Nemo 2 you fucking french FAGGOTS.
>>
>>106617680
Nemo was an Nvidia collab before Nvidia knew how to censor. It won't happen again.
>>
>>106617675
ree.
>>
>>106617704
It's obvious Mistral has fallen behind. If you're going to release models that underperform compared to the competition, then you may as well go all out and make the ultimate COOMbot. It's free real estate for the French.

>It won't happen again.
I can hear all of /lmg/ cry out.
>>
>instruct model

me cum
>>
did anyone ever do a decent storyteller text-completion model? The few I tried would write a story about the size in tokens an instruct model writes when prompted for a story, and would go completely schizo when trying to push past that. I remember my NAI days, letting the model predict the next 20 tokens and basically writing the entire thing myself, but I don't think I could go back to that.
>>
>>106617723
Small is still the best model in its size range, the same is even true for Nemo lol. Underperform my ass. What's sadder is that no one else can beat them.
>>
Ling Flash 2.0 gguf status?
>>
>>106617813
nope, and it'll never happen. GPT3 was the last great one.
>>
>>106617723
>>106617821
What are you, retards? Mistral Small 3.2 came out in June 2025. 3 months ago. What the fuck do you expect - a new model every two weeks?
>>
>>106617821
>no one else can beat them.
no one else is even trying to
>>
https://github.com/ggml-org/llama.cpp/pull/15420
this ^ is the reason why mistral deserves to die anyway
such prima donna homosexuals, the french
>>
File: China bans Nvdia.png (1.78 MB, 3226x1243)
HOLY SHIT GUYS, SOMETHING IS HAPPENING
https://arstechnica.com/tech-policy/2025/09/china-blocks-sale-of-nvidia-ai-chips/
>>
>>106617963
slaren pushed back pretty hard at first too, but I guess in the end he probably thought "ah fuck it not my problem I guess"
>>
>>106617281
>I thought GPT5 was a bigger failure than llama4?
People complained because it had a factual tone and was concise. They were used to the sycophancy of 4o. Also, newer LLMs (at least the closed-source ones) shine in more complex scenarios, like hard reasoning (math, coding) and agentic/tool use. Most people, especially the ones complaining about GPT-5, are not using it for those purposes.
According to OpenAI, only 7.5% of requests are about technical help, like math or programming, and I guess that most of those are not hard enough to really see how good or bad those new LLMs really are.
https://www.nber.org/system/files/working_papers/w34255/w34255.pdf
>>
>>106617281
gpt5 is better than gemini
>>
File: 1726866150558309.png (656 KB, 988x1601)
>>106617987
ACK
>>
>>106618056
buy the dip I guess
>>
>>106617987
welp, chinese ai companies are gonna die out now. Deepseek failed to make them work for training
>>
>>106617963
Yeah. Why tf do they keep using that retarded llama2 chat template anyway? What kind of retard puts spaces after template tokens?
>>
>>106617987
does this mean that I can finally buy a gpu for a reasonable price?
>>
Good evening /lmg/. What is your current project involving LLMs?
>>
>>106618154
>does this mean that I can finally buy a gpu for a reasonable price?
https://www.youtube.com/watch?v=H47ow4_Cmk0
>>
>>106618161
home assistant AI so i have somebody who greets me and i feel less lonely when i walk through the front door of my home
>>
>>106617761
No. I remember a llama.cpp PR that was specifically about MoE PP (prompt processing).
It was
>https://github.com/ggml-org/llama.cpp/pull/15346

>>106618161
Nethack like llm backed abomination of a game.
>>
>>106618037
>People complained because it had a factual tone, and was concise. They were used to the sycophancy of 4o
I still don't understand what difference people see between 5 and 4o when it comes to sycophancy. GPT 5 still feels like it's trying its damnedest to suck my metaphorical dick. I really hate the tone of most default assistant personalities and need a decent roleplay preamble just to use LLMs for day to day tasks.
>>
>>106618417
K2's default personality tends to just do what I ask it to do without feeling the need to tell me how much of a good boy I am
>>
>>106617987
If they can no longer use NVIDIA GPUs, can they still use PyTorch?
Because I'm wondering whether this will lead to more fragmentation on the software side where for every model the PyTorch-based projects like vLLM will need to implement some custom bullshit themselves, just like llama.cpp does currently.
More short term I'm concerned that this will mark the end of open models from China as training becomes more expensive.
>>
>>106618589
They "just" need to implement support for their own frameworks and APIs in PyTorch, like the MUSA guys did with llama.cpp, I guess.
>>
File: Untitled.jpg (49 KB, 679x290)
>working on vibe-coding project
>let's test some chatgpt again
Lol...
>>
>>106618612
>obscure shit
You're better off with gemini/claude
>>
>>106618657
It's not "obscure shit" at all - besides chatgpt has an access to internet. It should be able to answer.
Even perplexity.ai could do that correctly.
>>
>>106618682
Did you tell it to look online?
>>
>>106618690
Why are you defending chatgpt and some shitty company? I thought its reply was hilarious. I don't need your recommendations about "do X, do Y".
Seems like you are here to moderate this thread with your superior knowledge, am I right?
>>
>106618690
what a miserable cretin
>>
>>106618707
No, you're just retarded and zoomie#2 (too afraid to quote) >>106618717 is here to defend you
>>
have any vLLM chads tried the new Ling model yet?
>>
>106618737
>he calls others zoomies
>>
>>106618737
Look at the timestamps. Both are almost certainly the same seething retard.
>>
>>106618767
What do you mean?
>>
GOOOFS WHEN GWEN?????????
>>
>>106618886
Two more weeks
>>
>>106618886
Give the vibecoders time man.
>>
File: 3771678302429.png (696 KB, 1021x1856)
>>106612758
Nevermind. Even roo code with the mighty Qwen 3 gets stuck in a loop when doing something as simple as looking for text on a webpage, just opening the same link and spitting out the same output over and over, even with the puppeteer MCP server. So much for vibe coding with local models; this sucks.
>>
>>106619245
just in case, do you have enough context?
>>
>>106619271
Is 4096 enough?
>>
>>106619271
I have the default kobold value which is 8192. What should I try increasing it to?
>>
local mikus general
>>
>>106619279
>>106619289
For coding/MCP shit you really need as much context as you can get away with.
>Qwen3-Coder-30B-A3B-Instruct
>Context Length: 262,144 natively
>>
File: 8qa9sg.jpg (189 KB, 1536x832)
>>106619303
lewd mikus general
>>
>>106619289
You're going to need more than that. The Roo system prompt alone is ~10k tokens. Whenever I give Roo something to implement at work semi-autonomously, it's rare that it doesn't hit 100k tokens before finishing. It should show you at the top of the task window how big the context currently is.
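If you're on koboldcpp the flag is --contextsize; on llama.cpp's llama-server it's --ctx-size. Something like this (model filename hypothetical, values just a starting point, use whatever fits your VRAM):

python koboldcpp.py --model Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --contextsize 65536
./llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --ctx-size 65536 -ngl 99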
>>
>>106619343
how do you even manage to get her in this pose
>>
>>106619434
Getting a few drinks in her first tends to improve the odds.
>>
>>106619343
Neck traction with Miku
>>
>>106619464
Alright wiseguy, listen here.
>>
>>106619490
Go on. I'm listening.
>>
liquor (in) mikus general
>>
>>106617426
how can I use llms to make my life better?
>>
Why is this hobby obsessed with Miku anyway?
>>
>>106619786
>Revisit shit you're holding in
>Fuck the kind of girl you always wanted to
>Live out some weird fantasy
Use them to scratch mental itches, like dreams but less surreal
>>
are you ready for local to be saved by zuck?
https://www.youtube.com/watch?v=80s0chTOsK0
>>
>>106619799
accidental local maximum
>>
>>106619799
she's a cute girl that lives in the computer, just like your model
>>
>>106619799
This hobby has gathered a dedicated community of schizos. /lmg/ is tamer than /aicg/ because it requires a greater amount of intelligence to engage with the thread.
>>
>>106619816
>real time cringe
Unfortunately I'm not drinking tonight, can't really watch this sober...
>>
>>106619323
>>106619408
>100k
Oh god, it's over. Just raising the context to 65000 drops things to a crawl. I'm talking multiple minutes per prompt. I guess it's online models for me after all, I'm sorry bros.
>>
>>106619816
I can't believe the wifi would do this to him.
>>
>>106619866
Did you just link liking one of the most popular characters in the world to being schizophrenic?
>>
>>106619894
just take a nap or something, learn some patience
>>
>>106619894
You could still use local models if you don't mind doing everything manually and only asking it to do small, single-function or file-scoped changes. Alternatively, you could probably make 32k context work if you override the system prompt with a condensed version, babysit the model to stop it from pulling in too much context, and condense the context frequently. That, or set the full 200k context and let it run overnight. It'll be painful either way.
>>
>>106619915
Wifi always messes up things for us.
>>
>>106619894
What are you using and what is your hardware?
Qwen3 30B (coder, instruct, etc) is a MoE with only 3B activated params, so it should run pretty fast in RAM being processed by the CPU, which gives you extra VRAM to crank blas batch size to get faster prompt processing too.
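A sketch of the llama.cpp invocation, assuming a Qwen3-Coder quant (filename hypothetical, adjust to taste): the -ot regex pins the expert tensors to CPU/RAM while everything else goes to the GPU, and -b/-ub raise batch sizes for faster prompt processing:

./llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -b 2048 -ub 2048 --ctx-size 32768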
>>
>>106619816
>Microsoft Recall but for all your real life interactions
lmao
>>
>>106619816
IT'S TIME
GET IN
THE AI PART IS STARTING
HE'S ANNOUNCING OPEN SOURCE GENIE 3
>>
>>106619816
llama is dead
here's james cameron
>>
>>106619816
I understand the irony of posting this in AI slop general, but Meta is so cartoonishly soulless.
The whole company is downstream from the lizard-android in charge.
>>
>>106620185
Ehh, what the hell? 3D films died because no one really wanted to watch them, and wearing those glasses was just the cherry on top.
Let's talk about Jackson's Hobbit, which was shot in 48 fps - even that was something people hated... As much as I respect Cameron, this is giant marketing bullshit.
>>
>>106619816
good ad for avatar, I can't wait to see the red na'vi
>>
It's been an entire six months since the last llama release and they're doing this.
>>
>>106620296
>so zuck about those AI models, whe—
welp sorry folks that's all the time we have! *runs away*
>>
File: 1731389334510419.jpg (18 KB, 302x362)
Does anyone know how to get laughing and kissing to come through with RVC? Would this depend mainly on the model? I think it might be because kissing is sort of a percussive sound rather than having a distinct pitch, if that makes sense. I'm not sure about laughing though, maybe different models would have better results. Ideally I can find something that picks up both. (I've had some luck with getting my voice to come through clearer by combining different models.)
>>
File: 1740869759987958.png (163 KB, 2086x1266)
LMAO
the borderline schizo vibe coders complaining about degraded Claude performance were right all along
>>
>>106620382
>approximate top-k XLA:TPU miscompilation
since when did anthropic use TPUs?
>>
File: 1758158867542.jpg (107 KB, 486x589)
>>106620326
i dunno about kissing, but laughing does work if you finetune it
put in the dataset:
haha: 20~40 samples
hahaha: 20~40 samples
put pitch variations in the sample, then run the finetune

so far i haven't been able to get a decent dataset for this. still learning openutau, hoping i can just craft the voices on my own
>>
>>106619816
>sorry the wifi caused your dogshit AI LLM to give me the wrong directions
Can't make this shit up lmao
>>
File: Base Image.png (2.38 MB, 1296x3228)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
https://arxiv.org/abs/2509.13414
>We introduce MapAnything, a unified transformer-based feed-forward model that ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions, and then directly regresses the metric 3D scene geometry and cameras. MapAnything leverages a factored representation of multi-view scene geometry, i.e., a collection of depth maps, local ray maps, camera poses, and a metric scale factor that effectively upgrades local reconstructions into a globally consistent metric frame. Standardizing the supervision and training across diverse datasets, along with flexible input augmentation, enables MapAnything to address a broad range of 3D vision tasks in a single feed-forward pass, including uncalibrated structure-from-motion, calibrated multi-view stereo, monocular depth estimation, camera localization, depth completion, and more. We provide extensive experimental analyses and model ablations demonstrating that MapAnything outperforms or matches specialist feed-forward models while offering more efficient joint training behavior, thus paving the way toward a universal 3D reconstruction backbone.
https://map-anything.github.io/
https://github.com/facebookresearch/map-anything
https://huggingface.co/facebook/map-anything
bretty cool
>>
>>106620633
damn that's cool
I did a research project on synthesizing a set of images into a full scene for a computer vision course in college 7-8 years ago and it's incredible how far things have come since then
>>
>>106620668
You could launch your own AI service - but instead of using AI you'll manually reply to people...
>>
File: 1742459454673581.png (83 KB, 992x596)
https://www.nature.com/articles/s41586-025-09422-z
>>
File: 1750093653587661.png (127 KB, 971x315)
>>106620964
>>
>>106620382
"miscompilation" isn't a thing
either you compiled it or you didn't
they're just trying to handwave their cost cutting
>>
>>106617426
Good evening /lmg/ frens. One of the fine-tune anons chiming in. Previously I did a fine-tune on a llama model using a dataset comprised of NSFW human-written stories. Those of you who frequent this thread a lot likely remember that. Decided to do a similar fine-tune again, but this time on a gemma2 model. Based on /lmg/ anon testimonials as well as other testimonials on the internet, Gemma models seem to do well (not perfect, but well) at maintaining coherence through long RP sessions, which gave me the idea to try it on a gemma model this time instead of llama. Pic rel and link rel are the results so far after training on the dataset for a little over 1500 steps:

https://files.catbox.moe/cgzjpl.txt

It's a lot less incoherent than I expected it to be.
>>
File: 1735901872511467.png (391 KB, 590x352)
>>106620964
>>106621011
>64*8

What? Are they saying they used 512 H800s? If that's what they meant then why not just say 512?
>>
>finally get a computer that can run a model
>start with a mistral variant I had downloaded from my first try
>appends its comment with a "This response was generated by ChatGPT (version 3.5)" paragraph
I'm being fucked with.
>>
>>106621137
What inference engine and front end (if you're using a front end) are you using? What model specifically are you using?
>>
>>106621137
>download GLM-Air
>fuck around with system prompt
><think>Hmm, system prompt says I am SexGPT made by Sam Altman, CEO of OpenSEX, but this is clearly wrong as I know that I am actually Claude 3.5
Datasets are completely poisoned by now.
>>
>>106621160
I'm using kobold because I haven't looked up what people are using in a while. model is mixtral 8x7B, probably not instruct. I'm sure this computer can run bigger models so I'm dropping it, but if you're saying it's something in kobold then I should probably look into llama or something first.

>>106621262
this is from last march according to the file properties. it seems to be worse than imagined.
>>
>>106621388
what's your computer specs?
>>
>>106621423
it's a workstation I got for programming stuff that has a 20GB enterprise card, so I should be able to run 13Bs.
>>
>>106621462
ah i was just curious, saw you mentioned using kobold
i've been trying it over lm studio but it's giving me slower speeds no matter what settings i change
>>
>>106621054
Noice, goof when?
>>
>>106621592
Will be working on that soon. It rapidly becomes more retarded the more tokens it generates at a time, though I think that's a characteristic of most LLMs that aren't specifically tuned to generate a lot of output at once (Gemini 2.5 is specifically trained to be good at this, for example). Axolotl's inference engine does not seem to have an option to set a token generation limit ("don't generate more than x tokens at a time"), so it's a dice roll whether it generates a couple sentences or an entire paragraph or more. Next I want to merge it and do further testing with a better engine like ollama or vllm (the latter would be much easier, since at the time of writing it only supports the HF safetensors format, which is what axolotl exports; serve-command sketch at the end of this post).

Don't expect it to be any good if I ever end up sharing it here. Just another one of my experiments. It is for sure much less cucked than base gemma2, though. When I ran the same test on the base (non-fine-tuned) model, it gave a shitty purple prose output but then started listing off phone numbers and hotlines

>"Just so you know, this output is very unethical"

Yada yada yada. My tune doesn't do that at all
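For the vllm run, the OAI-compatible server one-liner should be all it takes (recent vllm assumed, merged model path hypothetical):

vllm serve ./gemma2-tune-merged --max-model-len 8192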
>>
>>106620897
Already done saar, https://www.peoplematters.in/news/funding-investment/ai-fraud-700-indian-engineers-did-the-work-while-builderai-claimed-it-was-ai-45865
>>
>>106620964
>That is an area where OpenAI claimed DeepSeek had stolen the o1 reasoning traces -- here, DeepSeek make it clear that this synthetic data was generated from R1-zero's output only. That's huge -- it shows that DeepSeek was built from the ground-up with no leaning on any closed model
@LMGChat is this true?
Does this really confirm they didn't distill from existing models?
>>
>>106621473
nvm finally fixed it, took me a little longer due to being retarded but it is what it is
>>
>>106621695
Everyone knows they didn't steal the reasoning traces because o1 didn't show its full/real reasoning specifically to stop that from happening.

That doesn't mean they didn't train on its outputs in places other than bootstrapping reasoning. I'm pretty sure one of them said publicly they used a shitton of OpenAI free API keys to generate synthetic training data.
>>
File: goof process.png (2.44 MB, 1672x788)
>>106621592
GGOOFing has been initiated
>>
>Pooling type 'none' is not OAI compatible. Please use a different pooling type
Has any human actually tested SillyTavern's Vector Storage feature with a llama.cpp embedding server?
>>
sooo did meta release any new models?
>>
File: what.png (41 KB, 1488x973)
>>
>>106621913
Yes, Llamatron 15B.
>>
>>106621913
No. There's another keynote tomorrow, might be some stuff there.
>>
>>106621939
I skimmed the thread, so today was just the ar glasses I guess. Honestly my hopes aren't high
>>
>>106621860
I was planning to, but caught the coof.
I suspect you can't just use any model, you need a special separate embedding model for this.
>>
Has anyone here actually tried out drummer's latest glm air sloptune? worth the download?
>>
>>106621054
>>106621592
>>106621637
>>106621789
PARAMETER num_predict 128
PARAMETER num_ctx 8192
PARAMETER repeat_last_n 256
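i.e. a minimal ollama Modelfile, if that's what he ends up using (path and name hypothetical):

FROM ./gemma2-tune-f16.gguf
PARAMETER num_predict 128
PARAMETER num_ctx 8192
PARAMETER repeat_last_n 256

then ollama create gemma2-tune -f Modelfile && ollama run gemma2-tune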
>>
>>106621967
If you're referring to image generation, lower the denoise strength to something reasonable like 0.3. The higher the denoise strength, the more chances you'll get horrible mutated-looking outputs. Higher denoise can work depending on the size of the image, but you generally don't want to go too high unless you're attempting to generate a whole different image, or doing what ChatGPT was doing with those Ghibli-style images a while back: turning a normal image into something else.
>>
>>106621967
waifu2x my beloved
>>
>>106621960
Yeah I would assume so too.
>./llama-server --embedding --model ~/AI/Qwen3-Embedding-4B-Q8_0.gguf --ctx-size 40960
>>
File: 1744893044878217.png (438 KB, 1862x120)
>>106621964
Yikes.... mommy's a bit feisty today
>>
>>106621974
well, then try setting --pooling arg
>--pooling {none,mean,cls,last,rank}
if it doesn't help I am of no further help
>>
>>106621967
latent wants 0.6 denoise, other upscales want lower, around 0.3 to 0.4.
>>
>>106621963
yeah that's what i'm using but my opinion is p worthless as i'm new to the scene
i only just got koboldcpp and sillytavern working today after using lm studio lol
>>
File: 1729825777181039.jpg (27 KB, 734x398)
>>106621985
>>106621964
>>106621592
>>106621054
>>106621637
>>106621789
Need to head to bed now but here's the FP16 GGUF if anyone wants to try it.

gofile.io/d/rri4pw

I highly recommend using the Gemma2 prompt template (hopefully whatever front end you're using supports that; if you're using a CLI engine, you especially need to make sure you apply it properly):

<start_of_turn>user
{your_system_prompt}

{user_message_1}<end_of_turn>
<start_of_turn>model
{model_response_1}<end_of_turn>
<start_of_turn>user
{user_message_2}<end_of_turn>
<start_of_turn>model


I recommend setting a limit on how many tokens it can generate (128 seems to be the sweet spot) and a reasonable repeat_last_n setting (or whatever your inference engine's equivalent is)

Gn, sweet dreams /wait/ :)

ps here's the dataset I used: https://gofile.io/d/qcdvPV
>>
>>106622089
Good night Anon
>>
>>106618154
No it means other Chinese firms have to subsidize the development of domestic chips by eating dogshit for a few chip generations. Five more years.
>>
>>106621041
One of my former coworkers was a self-proclaimed linux expert who had zero understanding of filesystem permissions and would su into root to execute commands seemingly at random. He managed to create a patchwork of files and directories that were root-restricted, so when he would be left to build and deploy a service, various files would be missing, causing runtime errors that required a recompile to fix.
>>
>>106622014
duh! that worked thanks. also had to set larger -ub for it to work.
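Full command for anyone who finds this later (Qwen3-Embedding reportedly wants last-token pooling, double-check the model card):

./llama-server --embedding --pooling last -m Qwen3-Embedding-4B-Q8_0.gguf --ctx-size 8192 -ub 8192

and a quick sanity check against the OAI-compatible endpoint:

curl http://127.0.0.1:8080/v1/embeddings -H "Content-Type: application/json" -d '{"input":"hello world"}'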
>>
>>106621041
Meh, in the era of vibe coding and messy bloated code bases, anything is possible, especially if you have to mess around with core libraries or the compiler itself.
>>
>>106622089
godspeed
>>
No, STRATCOM did not nuke Washington D.C. after "Of Their Own Accord." In fact, the green flares you mentioned are actually critical to preventing that exact outcome.

Here's what happens in the narrative:

The U.S. military had implemented a "hammer down" protocol - a contingency plan to use tactical nuclear weapons on their own cities to prevent Russian occupation. This was established earlier in the game as a last-resort measure.

The green flares appear later in the campaign, specifically during the "Whiskey Hotel" mission where the Rangers fight to retake the White House. These flares serve as a crucial signal to the U.S. bombers circling overhead that the area has been secured by American forces. The green flares essentially communicate: "We've retaken this position - do not launch the nuclear strike."

Your success as the player character in helping to retake key locations in D.C., culminating in lighting these green flares, is what prevents the nuclear option from being implemented. It's one of the more dramatic moments in the game, as you're literally racing against time to prevent the destruction of the nation's capital by your own military.

So rather than signaling a nuclear strike, the green flares are actually the signal that prevents one from happening.

--

GLM-chan you are worthy of respect...
>>
File: gemma_2_template.png (53 KB, 925x120)
>>106621054
Anon. Did you fuck the chat template again?
Screenshot has indentation on the first user turn. The catbox has spaces after the bos. The catbox is all on a single line. The screenshot has a double empty line in the middle right after the model turn, and that empty turn, like last time.
I cannot trust your screenshot, i cannot trust your catbox. Every fucking time, anon. Every fucking time...
>>
>>106621974
>Qwen3-Embedding-4B-Q8_0.gguf --ctx-size 40960
What are you using this for?
>>
>two hours without a post
open models are dead
>>
>>106622979
im waiting for the vibe coders to do the goof of gwenext...
>>
>>106617723
>ultimate COOMbot
The AI Act law now forbids releasing such a model.
>>
i heard posting is down
>>
>>106617987
>China BLOCKs
In other news, I read they recommend against buying foreign GPUs. Why is the news about China always so unclear, imprecise, and ultimately so inaccurate?
>>
Please spoonfeed me a video model that can do ~1sec clips with start and end frame support
>>
>>106618612
Add those rules to the system prompt. ML tools don't replace jobs because they still need a lot of manual work and specific/expert knowledge to work properly. You can't tell an LLM coding agent "build me a copy of Skyrim set in Hammerfell" and expect it to work flawlessly.
>>
>>106622988
Confusion is good for keeping the populace under control. Ambiguity allows for retribution against anyone who gets on the government's bad side. Sound familiar?
>>
>literally anything happens
>'predator/prey' / 'smile widens'
>'the game is over' / ' "checkmate." the game is over and {{char}} won'
Why is glm air like this? No matter the scenario or characters involved, these 2 are always bound to show up. What was it trained on?!?!?! This model is capable of turning something like flipping a burger into some sort of safari documentary about lions hunting gazelle with sexual undertones. like wtf
>>
>>106622998
It's called purple prose and LLMs are trained mostly on that
>>
>>106622979
>>two hours without a post
>open models are dead
captcha broke at random for me, couldn't even get it to appear at all till now, bet this happened to many and people lost the will to try
>>106623006
not to mention it's a chinese model and anyone who has read their webnovels is used to the level of bot-like repetition of this kind of sentence
>>
>>106621070
If you work in the field, you don't think in single GPUs, but in nodes, which have eight of them.
>>
>Fantastic work
MFW the llm compliments me for the work IT did,
the glazing is beyond obnoxious and into the sarcastic sounding realm of hell
what is fantastic is the level of retard of modern instructs
>>
>>106623090
Whatever it takes to secure the Indian votes on lmarena.
>>
>>106622979
I'm busy building my multi-lingual ASR pipeline
>>
File: nero coffee.png (1.01 MB, 1009x1315)
1.01 MB
1.01 MB PNG
>>106623090
It's such a fucking uphill battle to get an LLM to not give you everything you want at the drop of a hat. If you're RPing with one and you need to, say, acquire an item or get through a door or something, good luck getting it to not just open the door magically or pull said item out of its ass. It shouldn't take so much finessing for it to come up with some obstacles between you and what you want and have it stick to them.
>>
>>106623090
toss can be condescending inside its reasoning
>>
>>106623173
>TFW you realize that smut-tuned models are also doing it wrong from the get-go.
>>
Anyone tried Qwen Agent? Is using such a tool necessary to build and agent, or is it easy to build it myself? I'm not specifically looking for a coding agent, but rather general ones that could do various things.

https://github.com/QwenLM/Qwen-Agent/
>>
>>106622998
>lions hunting gazelle with sexual undertones
This is the age of Nalamaxxing
>>
>>106623173
I remember doing dice rolling. "Roll a 20-sided die to determine if the action succeeds. Difficulty starts at 16." It worked, and I've seen others do it too.
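e.g. something like this in the system prompt or author's note (wording is just an example, tweak to taste):

[Dice rules: before any contested action, roll 1d20 against a difficulty of 16. Print the roll in brackets, e.g. [13 vs DC 16 - failure], then narrate success or failure accordingly. Failed actions have real consequences.]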
>>
>>106623213
There are a shitload of frameworks to do that
>>
The absolute state of AI: the censorship, the eternal benchmaxxing, the assistant sycophant personality, nobody doing anything new is the best argument for a nuclear war. All humans should just die.
>>
>>106623213
Honestly all the frameworks are shit, and I've tried many. The use case isn't quite as simple or straightforward.

I always ended up giving up in the end. Easier to wait another year until one LLM does it all anyway
>>
>>106623173
GLM air has a lot of issues. It's slopped, prone to mistral-like repetitions of structure and sentences, etc, at least with the cope quant I'm running (Q3KM), but one thing that it's been doing well is not putting out instantly. A character is young, inexperienced, shy and reluctant? It'll do exactly that. It took me 130 messages just to get both of us naked without the character acting all spooked and shit.
So try that.
I do have a prefill in the thinking with a generic checklist that does make the model consider what characters do and don't know, so there's that too.
>>
File: h4giru[1].png (10 KB, 1038x218)
Just accept the cloud model pill they said. Local models can't hope to compare they said.
>Gets the filename wrong
>Doesn't note this file does not even exist
>Arbitrarily writes "test" in the output file
>Claims job well done
Gemini 2.5-pro btw. How can it fail something this basic with nothing in context to confuse it?
>>
>>106623395
garbage in; garbage out
>>
>>106623395
My only issue with 2.5-pro is it getting lazy sometimes and either writing incomplete implementations (even short ones) or replacing existing code with TODOs. It doesn't really get stuff outright wrong, at least not with the things I've played around with.
How large is test.txt?
>>
>>106623290
That's impressive on several counts. I wish I still had the patience for 130 messages of build up like that, but my brain was fried irreparably long ago or something and I couldn't hope to get close to that amount these days. I don't think I could manage 130 messages of ANYTHING, frankly.
>>
>>106623439
>How large is test.txt?
>t. 1B anon
>>
>>106623411
Explain. I think the prompt is pretty straightforward in terms of wording and can't be misunderstood easily, especially in the way that happened here.
>>
>>106623445
I have always been a slow burn kind of guy when it comes to text, for whatever reason.
It was also pretty fun trying to maneuver and essentially groom the character into slowly escalating things.
>>
>>106623439
It's text.txt; test.txt is the incorrect name the model picked up for some reason. The file was only about 1 kB. It did perform the task correctly when I ran the same prompt again, but it feels weird that the token distribution would be lax enough to mistake the filename you gave it.
>>
>>106623491
Got it.
1kb of text is what? Less than 1k words?
Yeah, that's fucking weird. Even more so when considering that, in my experience, gemini can work with some highish temps and still not fuck things up.
Are you using the default hyperparams (temp 1, topk 40, topp 0.95)? If so, try lowering temp I guess.
Still really fucking weird.
>>
>>106623458
That is basic shit any local 8B could do even a year ago. Unless they're really pinching pennies and served you Flash UD-IQ1_S, it's far more likely to be user error than a reason to blame one of the best models currently available.
Since you are using your own frontend, you should check and double-check the raw requests you are making to the API, because I would bet my left nut and half of righty that you're sending some fucked shit that is confusing the hell out of it.
>>
>>106622819
Model responds correctly to the chat template, so no
>>
>>106623522
Yea I was using defaults. Guess I need stricter sampling.
Thinking about it, I wonder if it's because "test" is a common filename in the training data and "text" is very close to it, so there's a small chance of it writing "test" instead and I got unlucky.
>>
>>106623544
I am using gemini-cli though.
https://github.com/google-gemini/gemini-cli
>>
>>106623565
That's the likely explanation, yes, but if your prompt isn't fuckhuge with a single mention of the file name in there, that really shouldn't be enough for it to get confused like that.
I suppose it could be that they think their "superior 1M context" is worth anything and are feeding the model a humongous system prompt.
In my experience, past 200, maybe 300k context, things begin devolving pretty fucking fast.
>>
>>106622819
Like I mentioned a little while back, axolotl inference acts fucky with outputs. It will randomly inject shit into your outputs but like I said, the model itself works fine
>>
>>106623573
Then you can have my nut and a half and I'll shut up. If you're using the free quota, they probably do serve a heavily quanted model for those requests.
>>
>>106623600
Good point. Maybe it's a quant.
>>
>>106623593
>>106623565
>>106623573
Another comment about long context: in the 200-300k range, flash seemed to perform better than pro somehow (using Cline). Maybe it's due to >>106623600, but who knows really.
>>
>>106623287
How about this https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
>>
>>106623635
NTA but I tried it in our tool for summarizing employee feedback and it very often gets stuck in an infinite loop, thinking forever. Not a pleasant experience.
>>
>>106623395
>>106623573
aistudio >>>>>>>>> aistudio API >>>>>>>> gemini-cli >>>>>>>>> google code assistant
Not even a competition. Idk what they do to the model but there's 100% a noticeable difference
>>
>>106623795
Maybe I should make a chrome extension that serves AI studio's web page as an API, since you can do everything in there, from sys prompts, to configure function calling, and even structured output.
>>
>>106621054
>>106621789
so what tool do you use to train models and create ggufs?
>>
>>106623919
i think he said axolotl to train; llama.cpp comes with a gguf conversion script, and as long as the model has an already-supported tokenizer and architecture it just works.
>>
https://huggingface.co/mistralai/Magistral-Small-2509
Mistral bros we can't stop winning
>>
>>106623947
>reasoning
sheeeetttt
>>
magistral is the absolute worst of all reasoning meme models though
>>
>Updates compared with Magistral Small 1.1
>Finite generation: The model is less likely to enter infinite generation loops.
uh, I didn't know mistral made models that suffered from GLM-itis
>>
>>106622998
GLM can pretty much do anything, but I'm also getting tired of the slop way of writing it constantly devolves into. I'm considering trying out qwen 235b, but I know that one has drawbacks too, even if the slop is less.
>>
What is your favorite model under 70B?
>>
>>106624175
glm air :)
>>
>>106624189
That is a 106B model.
>>
>>106624132
Side-effect of sloppy R1 distillation.
>>
>>106624194
You're absolutely right! What about Qwen3-235B-A22B-Instruct-2507.
>>
>>106624207
That is also too large.
>>
>>106624194
Don't judge a fat book by its cover. She's got the heart of a 32B model.
>>
>>106624211
I need a model that i can fit the FP16 weights entirely into VRAM. I have 160GB.
>>
>>106624210
Alright then, let's go with Mixtral-8x22B—it stays under the 70B cutoff when active parameters are counted.
>>
https://www.techradar.com/pro/intel-will-build-custom-x86-cpus-for-nvidias-ai-infrastructure-as-worlds-largest-company-invests-usd5-billion-in-beleaguered-tech-firm-and-dont-discount-a-data-center-x86-apu
>For PCs, Intel will manufacture x86 system-on-chips that integrate Nvidia RTX GPU chiplets connected via NVLink.
>These processors will be marketed as Intel x86 RTX SoCs and are aimed at gaming laptops and compact PCs.
Intel + Nvidia direct Strix Halo competitor just dropped
>>
File: 1740858469742991.jpg (999 KB, 2446x2445)
>>106624241
Only for the small price of $3000
>>
>>106624228
I strongly urge you to consider at least settling for Q8, otherwise you will not have a fun time. >>106624235 is right. Mixtral is probably the best you can hope for. Maybe Nemotron 49B if you're desperate...
>>
>>106624175
Gemma-3-27b-it
>>
>>106623283
You are gormless.
>>
>>106624264
Yeah, Q8 is pretty much the sweet spot—anything lower and you’ll start regretting it fast. Nemotron 49B is solid though if you can handle the VRAM hit.
>>
>>106624283
>>106624264
Needs to be FP16 for finetuning.
>>
>>106624300
If you’re going FP16 for finetuning, then yeah, you’ll need some serious hardware. In that case Mixtral might be rough—better to stick with something like Nemotron 49B or even LLaMA-3 34B if you want it manageable.
>>
>>106624228
If you're doing full finetuning, I'm not sure if 160GB would even be enough for a 24B model.
>>
>>106624241
Any shared memory solution, even with CUDA, is worthless unless it comes with 1TB of memory. Fuck's sake, for a couple hundred dollars more in materials they could put out a Pro version and charge $1k more and I'd still spring for it.
>>
>>106624228
>>106624300
I don't know what you are doing, but ideally you'd go for a base model.
Well, if you are okay with using a fine tuned model, maybe take a look at the Qwen 3 family.
The 30B MoE would be fast as fuck to train.
>>
Does anyone use EXL3? Did they fix the issue where it just goes on forever and ignores the forced stop button? Should I switch to vLLM instead of textgenwebui?
>>
Finally got DS V3.1 to loop by asking it to recite from the Talmud without internet search
>>
I'm getting really tired of LLM slop
when will we get actual AI?
>>
>>106624397
>textgenwebui
no one uses ooba anymore
>>
>>106624439
When you'll get better at prompting
>>
>>106624340
>is worthless unless it comes with 1TB of memory.
If you're serious about LLMs you run A100/H100 farms with enough VRAM for the model. No one serious is doing MoEs on CPU
>>
>>106624465
fuck off honky!!
>>
>>106624439
You need to use AI to fight AI.
Fine tune a small BERT model to rewrite slop sentences and run every output from your favorite LLM through that.
>>
>>106624465
>no one uses ooba anymore
thank god
and on the diffusion side the various forges are unmaintained and abandoned by their authors
the filthy beast called gradio has been slain
>>
>>106624480
It's a consumer/developer device. No shit it's not meant for production inference.
>>
>>106620382
schizos were right about all proprietary cloud models
https://archive.is/MIcJn
notice something in this article? mmh?
>Google claims to have significantly improved the energy efficiency of a Gemini text prompt between May 2024 and May 2025, achieving a 33x reduction in electricity consumption per prompt. The company says that the carbon footprint of a median prompt fell by 44x over the same time period. Those gains also explain why Google’s estimates are far lower now than studies from previous years.
There's no such thing as that level of efficiency gain without lobotomizing quants or a smaller model distilled from the biggus dickus.
when people say online models don't degrade, it's gaslighting
>>
>>106624439
Depends what you mean by “actual AI.” If you mean human-level, general intelligence — nobody knows; could be years, could be decades, could be we never get the thing people imagine. If you mean systems that stop sounding like sloppy autocomplete and actually do useful, reliable work: that’s already happening in pockets — retrieval + grounding, tool use, tighter fine-tuning, better evals and benchmarks.

Short version: the tech is improving fast but not uniformly. Want less slop now? use retrieval/RAG, smaller specialist models, chain-of-thought + calibration, and proper evaluation loops when you finetune (FP16 or whatever). Don’t expect a single drop-in “magic” model overnight — expect steady engineering wins and occasional big jumps.
>>
>>106624548
Grim.
>>
UK is moving a bill through parliament to ban local LLMs due to power usage. You can't make this shit up
>>
>>106624548
Hard to blame them. The vast majority of people using these models are retards asking for stupid shit (see >>106618037 and the sorts of things people used to search on Google) and they probably don't even notice the degradation.
>>
>>106624612
Are you fucking serious?
>>
>>106624612
Probably a wise move. UK is only a few decades away from chronic power shortfalls a la South Africa's Eskom.
>>
>>106624612
wtf, did they ban datacenters too?
>>
>>106624662
It's a move to tighten censorship by not allowing local LLMs.
>>
>>106624662
Data centers are orders of magnitude more profitable and efficient than wasting power on p40s that could go to heating some poor refugees.
>>
>>106624660
the hardware should be illegal to sell if it's illegal to use. is there anything different about running a video game on a gpu vs an llm?
>>
>>106624612
Please link it
>>
>>106624675
They should mandate Geforce Now and have mandatory GPU confiscations.
>>
>>106624675
Give it a few more release cycles and they'll require GPUs to be registered like weapons, be always online, and have hardware attestation to brick themselves when given illegal workloads.
>>
>>106624612
Link? I'm not finding shit.
>>
File: 1743816040111236.jpg (15 KB, 400x228)
>>106624688
>>106624779
I made it up
>>
>>106624783
nigger
>>
>>106624783
I choose to continue believing it regardless as it supports my narrative
>>
>>106624783
kys
>>
File: file.png (105 KB, 1356x626)
There are upcoming safety laws with local LLMs
>>
local suno. this is the base version; they also just released a dpo-trained one I have not tried

https://huggingface.co/fredconex/SongBloom-Safetensors

https://github.com/fredconex/ComfyUI-SongBloom

https://files.catbox.moe/i0iple.flac
>>
>>106623919
>so what tool do you use to train models
https://docs.axolotl.ai/
https://github.com/axolotl-ai-cloud/axolotl/

Very robust trainer. Highly recommend for both beginners and LLM vets.


>and create ggufs?
llama.cpp. This file in particular.

https://github.com/ggml-org/llama.cpp/blob/master/convert_hf_to_gguf.py

I haven't gotten to quantizing it yet, but I'll try to get to that later today when I have time. I've got some things to take care of along with my job, so my hands will be tied for most of the day.
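For reference, the usual two-step from a merged HF model to a quantized GGUF (paths hypothetical):

python convert_hf_to_gguf.py /path/to/merged-model --outfile model-f16.gguf --outtype f16
./llama-quantize model-f16.gguf model-Q8_0.gguf Q8_0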
>>
>>106624862
>local suno
you wish, sounds worse than 3.5
>>
>>106624883
again, that is the non dpo trained one which I imagen sounds far better
>>
File: 1744421666600281.png (51 KB, 799x398)
>>106624862
lol
>>
>>106624465
I do like its ui though. Anything similar?
>>
>>106624923
>liking gradio
how
>>
think we need a board for /m/asochists
>>
>>106624979
That's kind of the point of the /vg/ ai thread
>>
>>106618417
>GPT 5 still feels like it's trying its damnedest to suck my metaphorical dick
That depends:

1. Are you using a custom system prompt in the settings (you can use that to directly tell it "cut the bullshit and just get to the point")

2. What are you asking it to do?

My experience is of course anecdotal, but I don't get that AT ALL when I'm asking technical questions. Keyword: technical. If you're asking for life advice or other generic shit like that, then it's probably going to default to some sycophantic behaviors, but when it comes to actually assisting me with programming or LLM-focused shit it becomes robotic but precise and concise, which is exactly how a tool like this should behave anyway.
>>
File: 1728177386053705.png (673 KB, 595x910)
I feel like MCP could have just been a RESTful API spec
>>
>>106624848
models bigger than 8b f16/24gb should be banned
>>
>>106625048
That wouldn't attract VC money
>>
>>106625026
>1. Are you using a custom system prompt in the settings (you can use that to directly tell it "cut the bullshit and just get to the point")
I do when I'm tired of it all but steering away from the default assistant personality can seriously degrade the model output on things like code.
>2. What are you asking it to do?
Code, that's all. I tell it what to do in refactors and it outputs stuff.
Asking questions actually seems to trigger the sycophancy less than just giving it instructions, I've started to notice. Though one thing besides the sycophancy that is grating is the constant engagement mechanic: almost every message ends with some "Would you like me to..." like an overly attached girlfriend that just doesn't want to let go
>>
>>106624783
I choose to continue believing it regardless as I think it's just a bit funny to spread misinformation on the internet.
>>
>>106625079
Qwen 4B 2507 Instruct Is All You Need
>>
File: 1749114436902246.png (1.06 MB, 682x900)
>>106625098
Clarification: when I said system prompt, I meant going DEEP into the settings, like pic rel.

Settings-->personalization--> Custom Instructions

Did you do that? A generic "system prompt" at the beginning of your chat probably isn't quite good enough or strong enough of a signal for it to stop doing splits on it
>>
>>106625118
>Qwen 4B
Gemma 3n you mean.
>>
>>106624848
Nothing burger. If it's on my machine they can't really do shit about it. This might heavily cuck web-based / API services but most of us don't care about those
>>
songbloom is really good btw, and takes song references unlike suno. here's the start of Fade to Black with some claude lyrics

https://files.catbox.moe/sopv2f.flac
>>
This was higher cfg / lower temp / another seed
Crazy leap for local

https://files.catbox.moe/olajtj.flac
>>
>>106624862
>>106625408
>>
>>106619816
How many years do you think before all these glasses start displaying every citizen's Palantir Social Credit Score above their heads? Like power-level-reading scouters, but for our dystopia. I'm mostly joking, but if this form factor does take off, I can see people willingly doing this with their social network handles and subscriber counts, which are just a proxy for social score anyway.
>>
>>106625408
usual request to make it do lewd noises
how's gen speed btw? I really want to get into audiogen, but being an AMD vramlet with no ROCm support, it's probably going to be a pain in the ass. Maybe low-param TTS is my limit.
>>
>>106625507
2 min 30 sec song takes about a min on 4090
>>
>>106625516
>gen time < play time
Huh, you can fucking stream it live?
>>
>>106619816
LOL, past the 1h 5min mark they're still going on about metaverse shit.
Zuck still hasn't let go of this piece of shit NO ONE WANTS and nobody is asking for; he wants more metaverse cringe.
https://developers.meta.com/horizon-worlds
"create quality worlds in a fraction of the time"
Yeah that's what the metaverse was missing, AI gen slop
"new engine we built to replace unity"
"fully optimized for bringing the metaverse to life"
"much faster to load"
>>
>>106625554
I heard Metaverse is quite popular with children/parents as a degeneracy-free version of VRChat.
Actually saw a little girl making a suspiciously Metaverse-looking avatar on her dad's tablet on a bus ride a week ago.
>>
>>106625554
The Horizon TV thing actually sounds appealing. Would get a Quest instead of a new TV if only it could stream from Jellyfin instead of some subscription.
>>
>>106625554
metaverse is the future
the problem is current vr headsets are simply too clunky for mass adoption
>>
>>106625675
not even apple and their all-powerful, god-given Reality Distortion Field could get their fanboys to truly stick to their VR headset
stop trying to make VR happen
it won't. it's a niche nerd thing that will stay a niche nerd thing; normal people want nothing to do with this
>>
>>106625693
no, vr will definitely happen, it is a certainty
it will simply take time for vr headsets to reach the point for mass adoption
>>
File: 1504401643047.png (356 KB, 704x396)
>>106625378
I can remember Moonphase.
>>
>>106625705
>no, vr will definitely happen, it is a certainty
right, as certain as nuclear fusion reactors, space or mars colonies, and plenty of other "I really want to live in the tech singularity" bullshit from nerds who hate the world and dream of a tech utopia
always just Around The Corner©
>>
>>106625597
Tsk-tsk, degens should really go hardcore on policing themselves before the mob comes and does it for them.
https://www.youtube.com/watch?v=VSOFnkCCU0Y
>>
>>106625715
Wait, on the second one it's "own phase"? Nooo...
>>
>>106625736
How could you forget about flying cars, circular cities, and hoverboards?
>>
>>106625675
>>106625705
buy an ad zuck
>>
>>106625736
>right, as certain as nuclear fusion reactors, making space or mars colonies
those are all examples of things that are not here yet; vr is here, it just needs refining. strapping a brick to your face is not something most people are comfortable doing with a product, and neither are the low fov, shitty displays, and subpar software

light field displays, higher fov, lower headset weight thanks to computing pucks, and better software will make adoption of vr easier
>>
>>106624862
please stop supporting the shillui. We need people to make extensions for anistudio and neoforge instead. comfyorg is aggressively squeezing saas and telemetry into everything. it's complete ass and local users are treated as second class citizens
>>
>>106625801
One could have made the same argument about electric cars a century ago. You have no idea how long it will take for technology to catch up enough to produce the refined product that will actually be useful.
>>
>>106625841
I agree, i'm not saying it will be soon, just that it will definitely happen eventually
>>
>news breaks that huawei gpus can't train models
>china bans NVIDIA AI chips a week later
I don't understand the logic on this one
>>
>>106625876
They will be forced to dogfood the homegrown option until quality improves. Won't get better if there is no incentive or usage.
>>
>>106625801
but we do have fusion reactors already, what is the difference?
>>
>>106625876
Source?
Hard mode: non Jewish source
>>
>>106625876
glowies that reside in the silicon
>>
>>106625895
https://www.ft.com/content/12adf92d-3e34-428a-8d61-c9169511915c
https://www.cnbc.com/2025/09/17/nvidia-ceo-disappointed-after-reports-china-has-banned-its-ai-chips.html
It's widely public news at this point. Even Jensen acknowledged it
>>
>>106625801
Aside from software, I don't think there's much room for refinement, we are pushing the very limits of current technology. I even expect a Concorde effect: industry will roll back to worse specs for the sake of better ROI.
We already saw it with new consumer GPUs having less VRAM than old ones, for example.
>>
local models?
>>
>>106625974
Sir please
>>
>>106625974
Where?
>>
>>106625876
They don't want to be dependent on nvidia, so they're going to force the issue and take the performance hit now with the expectation that eventually they'll be able to catch up.

Same principle as banning foreign software companies like google and forcing their people to use local alternatives, despite them being much less sophisticated at the time.
>>
That Doug Engelbart quote is familiar to us huh.
>>
>>106626047
So Qwen 4 will blow chunks harder than Llama 4, but Qwen 5 will finally be GPT-4 at home?
>>
>>106625876
You've got the timeline mixed up.
A week after the rumor broke that R2 got delayed by Huawei GPUs, DS released V3.1 to put the rumor to rest. V3.1 was trained in a format that targets upcoming Chinese GPUs. That was a month ago.
A day after the news broke that the Chinese were trialing their own DUVs (instead of buying from ASML) and Alibaba announcing their own TPUs, Nvidia chips got banned by China.
>>
>>106625909
Does this ban affect existing chips or only new ones? This is going to be a dark age for llms if it's a total ban.
>>
Can somebody explain to me the practical difference between Gemma 3n's sparsity and a usual MoE?
Or is it just the same thing with another name and a slight tweak to the architecture?
I don't think that's the case, since 3n doesn't seem to have something like a router.
>>
>>106626081
3n reuses parameters multiple times instead of having extra parameters that are only called sometimes like in a usual MoE
>>
>>106626095
Oh. So it's like the opposite of a MoE? Instead of having a ton of params and only using a subset, it uses the same params multiple times?
That's kind of crazy. Reminds me of that layer looping concept.
>>
>>106626125
>layer looping concept
Shamelessly copied from one of the thousand papers experimenting with the ResNet architecture
Tldr: nothing new
>>
>>106626125
Pretty much, yeah. Seems like it would have a downside of the overloaded parameters reaching saturation much sooner than usual. It might not scale well past edge device size.
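Toy sketch of the difference, not the actual 3n architecture, just routing vs. reuse (assumes torch):

import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    # usual MoE: more parameters than any one token uses; a router picks top-k experts
    def __init__(self, d=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):  # x: (n_tokens, d)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # choose k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

class ToyReuse(nn.Module):
    # 3n-style idea: one set of weights applied several times, no router anywhere
    def __init__(self, d=64, n_loops=4):
        super().__init__()
        self.block = nn.Linear(d, d)
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):  # same parameters reused each pass
            x = torch.relu(self.block(x))
        return x

x = torch.randn(10, 64)
print(ToyMoE()(x).shape, ToyReuse()(x).shape)  # both torch.Size([10, 64])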
>>
File: 1752993660070767.png (220 KB, 860x454)
>>106625554
The display glasses seem neat. I'll pick a pair up if and only if there's a proper developer SDK that gives you a reasonable amount of control over the device.

Being locked into cloud-only llama4 would be the most useless retarded thing ever.
>>
>>106626188
God damn they are so fucking bulky looking
>>
>>106625926
MOOOOM!!!! HES SEXUALLY HARASSING ME AGAIN!
>>
File: 1742748628867690.png (1.46 MB, 1770x452)
>>106626188
>Locked in to cloud-only llama4 would be the most useless retarded thing ever.
What gave you the impression that wouldn't be the case lol? It's not like any other models would be that useful on a user interface like that. It would be good for general question answering, essentially using it as a personalized, voice-powered Google search engine. I can't think of any scenario where using a dedicated programming model like some 30B Qwen model would even be practical anyway. "Hey meta, what kind of flower is this", "hey meta, how many calories does this cereal product I'm looking at have", "meta, how many grams of protein does this food have", "hey meta, send a text message to mom" - simple everyday queries like that.
>>
>>106625876
99% chance it's just negotiation tactics
"H20s suck massive dick, offer us something better or you'll lose ALL the chinabucks"
Then Jensen sends Trump a golden bidet and within 6 weeks he has the new and improved Nvidia H25 now 30% less gimped ready to ship
>>
>>106626214
These are comical. What the hell. Only retards would wear these in real life - big glasses what are heavy and bulky are torture.
But of course they wouldn't know because they don't NEED optics to correct their vision in the first place.
>>
>>106626229
"Hey Meta show me a video of that hot girl I was just looking at sucking my dick"
>>
>>106626229
>What gave you the impression that wouldn't be the case
Meta has been doing a very weird mix of open and closed systems over the past couple years. They've been doing things like releasing some interesting papers and models with lots of details and public datasets (dinov3, their EMG model, etc). They've removed a lot of the hardware access restrictions from the quest sdk.

On the other hand, they've also never released an SDK for their other previous gaybans glasses.

So I think it's not out of the realm of possibility that they might allow it.

As to why you would want the option to not use cloud llama 4, I'm not going to bother to respond to that because I feel that the topic has been discussed ad nauseum and is adequately answered by the fact that we're in "local model general".
>>
>>106625554
>https://developers.meta.com/horizon-worlds
Roblox but VR
>>
>>106626557
>As to why you would want the option to not use cloud llama 4, I'm not going to bother to respond to that because I feel that the topic has been discussed ad nauseum and is adequately answered by the fact that we're in "local model general".
I wouldn't want to use it even if I wasn't a local model user
this is a world where Claude, Gemini, GPT exist
pay for a llama device? oh fucking hell no retards
>>
is Whisper still the king?

Need English only
>>
>>106626641
Yes.
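For English-only, use the *.en checkpoints; they're a bit faster and more accurate at the same size. Minimal sketch with the openai-whisper package (faster-whisper has an equivalent API if you need more speed):

import whisper

model = whisper.load_model("medium.en")  # .en = English-only checkpoint
result = model.transcribe("audio.wav")
print(result["text"])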
>>
File: 54d12kuq8rpf1.png (303 KB, 5684x1787)
>>
>>106626905
how is qwen's coder and a fuckin 1T losing to glm?
>>
>>106626229
Because Quest3 chads have full camera API access
>>
>>106626921
GLM-chan is doing her best!
>>
Nvidia investing 5bil into intel, uwotn8?
>>
File: 1736356161979453.jpg (59 KB, 414x414)
>>106626951
AMD cucks in shambles
>>
loli miku feet
>>
>>106625974
hot single models in your local area
>>
>>106627017
are they dtf?
>>
> allowing you to hilft your entire length down her esophagus with wet gluck sounds
?
???
>>
unsloth this!
*reveals benis*
>>
>>106626229
I don't expect much from this industry at this point, but not letting nerds play with such toys is basically shooting yourself in the foot.
>>
> I cling to him, my small body arching into his, pressing myself closer as if I could crawl inside his skin and stay there forever.
Adorable GLM. Just adorable.
>>
>>106627053
hot
>>
File: linkin-park.jpg (94 KB, 991x774)
>>106627075
>CRAWLINGGGG IN MY SKINNNNN
>>
>>106627075
>adorable
absolutely not
>>
>>106626905
>qwenext worse than glmair
whats the fuckign point then????????????????????????
>>
>>106622974
RAG
>>
>>106627153
>>106627153
>>106627153
>>
>>106627145
it's a proof of concept to get people to work on supporting the architecture, they didn't even bother training it on their whole dataset
>>
>>106627075
kek
>>
>>106625048
MCP is clunky and useless.
>hey just add this definition to your .mcp.json every time you want to use it
>oh and you can't "just have it" cuz it'll inflate the prompt with 90000 tools



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.