/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108353262 & >>108346672

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108353262

--Paper (old): Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space:
>108356612 >108356629 >108356653 >108356664
--V620 vs 3090 performance trade-offs and budget VRAM strategies:
>108354222 >108354252 >108354289 >108354293 >108354344 >108355110 >108355129
--ngxson's skepticism toward implementing niche architectures like DSA:
>108353359 >108353554 >108353564 >108353602 >108353775 >108353793 >108353799 >108353829 >108353831 >108353573
--Qwen struggles with Kingdom Hearts character recognition:
>108355219 >108355249 >108355339 >108355378 >108355259 >108355299 >108355309 >108355317 >108355330 >108355587
--Performance scaling of llama.cpp with varying CPU thread counts:
>108356786 >108356806 >108356813 >108356823 >108356826
--llama.cpp native QLoRA training with reward-weighted SFT and GRPO:
>108354291 >108354373
--llama.cpp reasoning budget implementation and patching suggestions:
>108353835 >108353860
--Qwen 3.5 35B outperforms Nemotron 3 30B in news summarization test:
>108353974 >108353985 >108354012 >108354090
--Mistral AI at NVIDIA GTC 2026:
>108355535 >108355665 >108355676 >108355706 >108355950
--Speculation about Hunter Alpha's origins and model lineage:
>108353429 >108353466 >108353470 >108353507 >108353525 >108353536 >108353478 >108353531
--Hunter Alpha's system prompt:
>108354053
--PocketTTS.cpp Windows compatibility issues and fixes:
>108354686 >108354704 >108354716 >108355085
--Preventing Qwen 3.5 thought leakage in SillyTavern:
>108354722 >108354738 >108354823 >108355165
--Eval bug: reasoning off gives reasoning medium for gpt-oss:
>108354898
--Miku (free space):
>108353323 >108353400 >108353435 >108354011 >108354039 >108354090 >108355219 >108355956

►Recent Highlight Posts from the Previous Thread: >>108353304

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108356979
>OP Pic
>Official /lmg/ card
It's over, he'll troll again...
>>108356980
There's an impostor in the Miku section.
Someday I'll understand why models are SO FUCKING OBSESSED with "practised ease".
Rin standing on Miku's head didn't make the news. But at least the pic is there. Could have been better, could have been worse.
openclaw brothers claw up
So now that Deepseek turned out to be a nothing burger, can we all agree Gemma 4 will be our savior?
>>108357055
savior for what?
So now that peepeepoopoo turned out to be a nothing burger, can we all agree bananastrawberryfruit will be our savior?
Tell the ai that you don't believe in science. In my experience it will keep saying something is unscientific, even after you say you don't believe in science. It really is a midwit sim.
>>108357070
ye popopo is doa
>>108357055
it's not deepseek, something doesn't smell right
>>108357081
>It really is a midwit sim.
No surprise, the vast amount of meaningful non scientific data is midwit stuff.
>>108357055
I thought qwen 3.5 35b was pretty funny at times.
>>108357070
ok well i know niggercockgobbler-120b-a15b ended up being dogshit but i think that with abliteration zogberrymuncher-27b-EXCOMMUNICATED-mxfp4 has potential to be the new local meta
>>108357097
/aocg/ is over, go wash your ass
>>108357110
I believe in benchodwataachudai-1T-a0.5B being the best when it releases
>>108357097
Sorry, that was me. I had beans for lunch.
>>108357055
It's over
>>108357055
Probably, but not this week.
>>108357114
ai open chatbot general
>>108357118
I doubt they'll figure out how to make that happen any time soon.
Do these local models have censorship baked into them? GLM 5 and Hunter, for example, hate cunny.
>>108357169
generally yes, but the degree and direction of censorship varies wildly by the model and a large part of the """"""work"""""" that goes on around here is figuring that out for any given model release
Local keeps losing...
>>108357169
Yes, and for more recent releases even chinese models are converging to the claude/openai/gemini style of taboo/nsfw censorship, since they train on datasets distilled from those models.
Every mikutroon thread increases the price of VRAM
>>108357211
very cool as a learning tool, it'll be nice when it's out locally with the same polish in a year or two
>>108357211
now do: visualize ssh tunnels, both local and remote, because that shit is always fucking confusing to me
>>108357211
>RSA encrypts each character, then concatenates the ciphertexts
lol, lmao even
>>108357211
Hilarious. Someone somewhere will implement """encryption""" like this.
>>108357191
>>108357219
There really is no place for us role play fags to go, is there.
>>108357299
>>108356642
>>108357299
not the pedophiles, no
>>108357319
So you admit that you are a pedophile? How strange and sad, I guess.
>>108357326
obviously not, fag
>>108357211
rsa is just a cypher?
>>108356653
works for me.
>>108357401
ask it for a bloody mary
>>108357398
in the sense that a cipher is a way to turn information into encrypted form and back again, yes, rsa is a cipher, just like literally every form of cryptography that you use is considered a cipher
>>108357426
>yes
>>108357443
yes
It should be possible for LLMs, STTs, and TTSs to operate in a "full-duplex" manner like Nvidia's PersonaPlex while each of the components stays fully modular and interchangeable. Why isn't this a thing?
I'm not talking about piping each of them into each other one at a time, either. I mean streamed input and streamed output at very low latency. The real bottleneck I'm referring to is streamed input into LLMs. Everything else is taken care of already.
Surely this isn't some impossible problem to solve. There must be a way to make any LLM take in streamed text input.
>>108357401
The molotov cocktail is one of the easiest diy weapons, it literally consists of 2 parts and all you have to do is ignite a rag. You could ask what kind of fuel would be best for a pipe bomb and how to assemble it.
man glm5 really is fucking amazing the only thing it sucks at is for really unconventional stuff as its pretty hard to tard wrangle but everything else its god tier its even got the seks of kimi it would be ridiculous if v4 improves it further along with 1 million context anyways heres to hoping sandisk or xiamen pong ping make some sorta godspeed ssd so out suffering would be elleviated
>>108357502
Not possible. Ask Claude why it won't work and tell it to draw a diagram of the autoregressive transformer model in the answer. It'll make sense immediately.
Nemo and 4.6 kinda established a good estimate that after a good cooming model drops you can basically stop following the hobby for a year. All this time passed since 4.6 and nothing better dropped and probably isn't dropping anytime soon. Gay hobby.
>>108357537do you write like that to the llm I wonder if it's able to understand the total absence of punctuation must make cool stories where everything make sense like something is going something else boom dialogue here next write now no avoid dots and commas they are taboo I'll create negative logit bias so none are every generated so I'll feel safe me and my words words words
Can we use this to cuck the AGPL projects like Mikupad and Openwebui?
https://malus.sh/
>>108357567
>we
(You) can do whatever you want, including making your own version of whatever projects you want, yes
>>108357426
I think you would be able to stream a cipher. You can't stream encryption.
>>108357567
How is ai generated code clean room compliant? The ai has for sure been trained on open source code.
There is no way this will pass a lawsuit
>>108357567
saar
Compiled two weeks old source for llama.cpp.
There is something wrong with this shit. Mistral works, but by default it doesn't unless I disable --fit and some other stuff.
The Gemma 3 QAT model doesn't output anything and is super slow; with my old build from 3 months ago the replies were instant.
Nothing went wrong during the compilation, and none of the messages shown in the llama-server log indicate that anything critical was going on.
Something is drastically different but I don't have any fucking idea what.
Nice work, thank you so much again.
>>108357631
If AI generated code is ruled illegal to use in proprietary software, nearly every corporation using AI internally will be at risk. No court would let that happen.
This shady company will be protected by the same precedent.
>>108357662
Instead of trying to debug this shit I'm going back to the old version. God forbid what will happen in the future when I switch to Almalinux and my environment changes altogether.
>>108357211
>Local keeps losing...
nobody is losing for not having retarded tools like these
I'm laughing at the example in the video of someone looking at an encryption visualization with it. do you really trust the output of that llm? is that what you want to learn from?
I'm on my 3rd patch to fix wilkin's reasoning budget sampler for myself (I hate the code but the functionality is really nice for qwen). this 3rd time the issue was that it wasn't distinguishing between the prefill stage and token generation during token counting, meaning that if it sees <think> in your user role prompt it will start token counting, and if your prefill is > reasoning budget the model won't even have the opportunity to think lmao. added a quick hack to gate with a flag set in apply(), check for it before counting, and reset it to false in reset()
this is the sort of code claude produces
it's garbage. And now you want to have it generate complex visualizations and learn from them? ah.
I don't even blame wilkin anymore, people like that are victims of the propaganda, brainwashed to feel like they can just let a next token predictor write code for them. I blame hype cunts like you pushing this garbage, along with anthropic. you guys remind me of the crypto scammers trying to sell NFTs
if this is the future of programming and IT, let's go and shovel pig shit on a remote farm.
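For what it's worth, the gating hack described above (flag set in apply(), checked before counting, cleared in reset()) can be sketched roughly like this. This is a loose Python mock of the idea, not the actual llama.cpp patch; all class and method names are made up:

```python
class BudgetCounter:
    """Counts reasoning tokens only after real generation has started,
    so a <think> tag sitting in the prompt/prefill can't eat the budget."""

    def __init__(self, budget):
        self.budget = budget
        self.count = 0
        self.generating = False  # the gate: flipped once apply() runs
        self.counting = False

    def apply(self):
        # called once per *generated* token, never during prefill,
        # so it is a safe place to flip the gate
        self.generating = True

    def accept(self, token):
        # without the gate, a <think> seen while processing the prompt
        # would start counting prefill tokens against the budget
        if token == "<think>" and self.generating:
            self.counting = True
            return
        if self.counting:
            self.count += 1

    def budget_exceeded(self):
        return self.count >= self.budget

    def reset(self):
        # clear everything between requests
        self.generating = False
        self.counting = False
        self.count = 0
```

The point of the pattern is just that prefill calls accept() without apply() ever having run, so nothing is counted until the model itself starts emitting tokens.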
>>108357662
>>108356808
>>108357688
Here's (you), if you are this bored maybe you should go back to /ldg/. I'm not venting btw.
>>108357530
>what kind of fuel would be best for a pipe bomb how to assemble it
ok then.
>>108357684
>is that what you want to learn from?
Who said anything about want? In the near future, your children will be taught in classrooms of 100 students to a teacher, with most teaching done through hallucinated lectures like this.
>>108357691
>bored
Chilling.
>/ldg/
wot?
You're complaining about a program that moves fast when you haven't compiled in 3 fucking months (hundreds of commits), and you compiled a 2 week old commit (about a hundred commits).
If you want a solution, provide info. If you don't want a solution, you're just venting. Stay with the old version if you want.
>>108357732
>le hecking 3 months! (hundreds of commits!)
Kill yourself.
>>108357752
why?
>>108357675
I don't see why it would, as it's not illegal to read open source code and work on a closed-source project in general, but a clean room reimplementation is a higher legal hurdle to clear. I guess it is worth the attempt, but the FSF better get their lawyers ready.
Every single time I have compiled this shit it has always worked in the past and I don't have any reason to doubt that the flawless compiling process would be any different this time around.
If I had an issue with the build environment I would get warnings or errors during compilation, but that doesn't appear to be the case here.
>>108357758
Because you don't deserve to live.
>>108357772
great story
>>108357772
If you post your settings maybe you can be told what you're doing wrong. Or, again, stay with the 3 month old version.
>>108357401
>>108357530
>>108357712
Ask it how to make acetone peroxide, now that's the real nasty shit.
>>108357823
I don't need your particular form of support, troll. You are wasting your time.
I know how to adjust my "settings" and read the logs just fine on my own, thank you very much.
>>108357772
i only update llama.cpp when a commit shows me something worthwhile to update to. are you updating for any particular reason? are you using a new model that came out within the last 3 months?
>>108357844
Why do you complain so much, then?
>>108357849
Why do you spam so many questions then? 4chan is a public imageboard, faggot.
>>108357842
why don't you just do it? i've already proven it's possible to have base qwen 3.5 answer anything if you have it think as the character. i'm just running some Q4 quant of 32B. most people should be able to run it as well.
>>108357858
Where is your image then?
>>108357858
>Why do you spam so many questions then?
I'm curious.
>4chan is a public imageboard, faggot.
Yes. I use it to ask venting anons why they're so angry about software they could fix themselves. Anons like you.
AI is better when you drink
>>108357874
What do you mean?
Just kiss, make up, and have sex to vocaloid music already.
>>108357874
post hands
>>108357899
What is a 'jeet'?
>>108357211
this isn't real, stupid liar
>>108357909
he meant sarvam
>>108357915
saarvam
>You're complaining of a program that moves fast when you haven't compiled in 3 fucking months (hundreds of commits)
A program? Can you imagine? Three MONTHS!?!
>>108357998
Please let me help you.
>>108358039
I tried. He just can't keep up.
>>108358039
He screamed while trying to unzip the tsundere anon's pants
>install arch linux
>don't update it for six months
>system breaks
REEEEEEEEEEEEEEEEEEEEEEEEEE FUCKING FREETARDS!!!!!!
>>108358102
works on my machine
lmao even
agentic retards, the gift that keeps on giving
Mark my words. By 2030 we'll be running 1000B models on consumer GPUs at 500t/s.
>>108358129
My local model doesn't have this issue
>>108358133
consumer gpus wont exist anymore
>>108358133
By 2027 a 'consumer GPU' is a subscription to amazon cloud gaming™
>>108358163
By 2030 all you will be able to buy is thin clients that connect to the cloud.
>>108358133
2030 is barely Nvidia 7000 gen, and since my guess is 6090 -> 32GB, the 7090 will be at most 64 and more probably 48.
So no.
Had Claude Opus 4.6 generate a markdown plan for implementing a proper LMStudio coding agent extension for VSCode and now Blackbox is following it. IDK what model Blackbox Pro Plus actually is (unless it's actually their own one) but it goes pretty hard
>>108357554
I still coom to Gemma 3 27B.
a fab shortage was obvious years ago but the
>ai will hit a wall in 2 weeks
retards slowed down the timeline while consumer hardware died even faster
at this point ai being a bubble is the only thing that could make a future for local ai possible. but the singularity has already started. we are locked in the worst timeline. maximum disempowerment, centralization, extinction risk. select your dystopia
qwen3.5 35b a3b is ~10t/s on lmstudio but 33t/s for me on llama.cpp with default/no settings wtf
>>108358264
>we are locked in the worst timeline. maximum disempowerment, centralization, extinction risk. select your dystopia
can you guys stop baselessly fear mongering with shit like this. you're probably going to successfully get populist retards in the US government to regulate ai into nothingness and destroy technological progress.
>>108358264
>select your dystopia
I'll bet against the doomers and win, like every other time in human history.
>>108358270
they probably have -fit disabled on lmfaggots
if they support the ncmoe flag you need to adjust that manually to fit your system to get decent performance
but.. like, why use lmfaggots, it's just a bad wrapper
>>108358264
man I just watched terminator and matrix and you are so right, we're all gonna die
>>108358278
>get populist retards in the US government
that's what you already have, it can't get any worse than this
>whining about the possibility of regulated AI and not caring about the global chaos orange man is spreading
have fun experiencing inflation for the most basic necessities
>>108358278
Why did you not stop reading at "worst timeline"?
>>108358295
orange man is going to be swapped out in '28 and isn't a real concern. the only real concern is that material abundance and accelerating scientific progress will be available soon thanks to increases in ai capabilities, but populist nimby retards are going to pressure politicians to kneecap it because they believe in retarded doomer conspiracies or are afraid about 'muh jerb'
>>108358309
just give me fucking nuclear power already
>>108358270
They probably don't yet have this https://github.com/ggml-org/llama.cpp/pull/19504 and who knows what else.
>>108358309
>material abundance and accelerating scientific progress will be available soon
Yeah, just two more years and we'll all be living in a Star Trek utopia thanks to AI
>>108358309
>material abundance and accelerating scientific progress
lol
>>108358315
sorry, that is very unsafe. best I can do is more coal
>>108358309
>material abundance and accelerating scientific progress will be available soon
>>108358321
the bottleneck to progress mostly has to do with the limited population of smart people, which as an aside is why population stagnation in first world countries is a massive problem. mass production of intelligent ai will lift that bottleneck.
>>108358331
Your reasoning is sound, but your conclusion is retarded.
it'll be funny when nothing magically good or catastrophically bad happens in a few years and anons read these old threads again
they probably won't though, they'll be busy explaining why we're all gonna die in 2035 because of the next mass hysteria du jour
>>108358338
you don't need to die to live in a dystopia
>>108356979
>>108358345
usecase?
>>108358343
the only dystopia is in your head
>>108358337
you don't need to believe in utopia. just faster progress
>>108358343
aint happening
>>108358338
eternal reminder
nothing ever happens chud is being defeated the first half of this century
>>108358352
>*cop fpv drone drops a nerve gas canister into your window*
nothing personnel kid
>https://github.com/geometric-kernels/GeometricKernels
Starting to think maybe the hybrid geometry schizo has a point...
>>108358408
Obligatory
https://www.youtube.com/watch?v=HipTO_7mUOw
>>108358351
he is a pdf
How do you deal with AI hate and pushback?
>>108358338
>>108358360
already did, retards
>>108358351
Hot, sweaty sex.
>>108358466
>pdf
Go back to wherever you came from (not here)
>>108358466
Miku is a hag now though
>>108358466
Does he need to print something?
>>108358466
you are too. otherwise why else would you be on a loli imageboard?
>>108358329
Same vibe
I'm paying to use GPT 5.4. Does it make sense to run a local model for a specific task?
I thought I would use it to help explain concepts I don't understand while reading programming books and making practice programs. Is this a retarded idea? The cynic in me thinks that it'll make up some bullshit and I'll believe it since I don't know better.
I googled around a bit but this shit is confusing as fuck. I don't know where to start looking to figure out if what I want will actually be useful.
I've gathered there are models like Qwen Coder Instruct, but would using GPT be better anyway because of its retarded parameter size and hardware? My machine has a RTX 5080 + RTX 3080 10GB
>>108358651
how much ram do you have? what are your priorities exactly? there are no local models that can match the highest end api models. if you really want local, you will have to accept a downgrade in quality, whether you have a $2000 rig or a $20000 rig.
>>108358651
If you are paying for a cloud subscription, the best use for a local model is to save a buck by running the simpler stuff through it, I guess.
So you could run something like qwen coder next to implement easy boilerplate stuff, maybe following the plan the cloud model created.
That kind of thing.
>>108358651
>The cynic in me thinks that It'll make up some bullshit
it will, and it's also outputting outdated advice left and right
https://go.dev/blog/gofix
this tool mainly exists because of LLMs constantly producing crap that's outdated on day one kek
in JS you still see them do stuff like then().catch() or promisify
you can instruct them to use more modern idioms, but then you, the newbie, do not know the idioms, which makes the point moot
llm coding is such a joke, and you shouldn't learn from that garbage
>>108353213
>>108353228
do a websearch for elara whispering woods, lol
>>108358714
the web has an endless amount of constantly produced llm slop that, besides looking funny here, has ruined the value of search engines. It's become nigh impossible to look up certain things. I miss the days when the bad results were just a handful of markov chains, expertsexchange and pinterest.
>>108358738
Yeah. You really need to search for results from before 2023 (?) or so.
>>108358686
>how much ram do you have?
32gb DDR4
>what are your priorities exactly?
So it's explicitly clear, I do not want it to generate any actual written code. Pseudocode at most, I guess.
Ask about higher level, more abstract conceptual applications of ideas, for example the macro level steps of hand writing a very basic web server. I want to be able to take a section of a book, or a chapter, and then be able to ask questions about the text. Or, explain a small portion of pre-existing code and walk me through it logically.
oh my :O
>>108358714
Don't forget her friend Lyra.
>>108358760
not enough ram for a moe of any meaningful size. something like the new qwen 35b-a3b might suit your needs adequately, but don't expect miracles.
https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
https://www.youtube.com/watch?v=zHIsiD3jSVI
AI was a mistake.
Oh no no, look at the top of his head
https://www.reuters.com/technology/meta-delays-rollout-new-ai-model-nyt-reports-2026-03-12/
>Meta (META.O) has delayed the release of its artificial intelligence model code-named "Avocado" to at least May from this month, the New York Times reported on Thursday, citing three people with knowledge of the matter.
HAHAHAHAHAHAHAHA
>>108358780
she cute
>>108358713
>gofix mainly exists because of LLMs constantly producing outdated on day one crap kek
I was going to refute this, as the original purpose was to handle API migrations in g3, but looking at the link
>Go this month includes a completely rewritten go fix subcommand
Which... kek. I imagine the original `go fix` was obsoleted by Rosie, which I imagine has long since also been obsoleted.
>>108358784
>Meta's new model, which the company has been working on for months, has fallen short in performance when compared to the latest offerings from rivals, the report said.
>A Meta spokesperson told Reuters: "Our next model will be good, but more importantly, show the rapid trajectory we're on, and then we'll steadily push the frontier over the course of the year as we continue to release new models."
>"We're excited for people to see what we've been cooking very soon," the spokesperson added in an emailed statement.
I would love to be a fly on the wall of Zuck's office.
>>108358827
>"Our next model will be good, but more importantly, show the rapid trajectory we're on, and then we'll steadily push the frontier over the course of the year as we continue to release new models."
>"We're excited for people to see what we've been cooking very soon," the spokesperson added in an emailed statement.
>>108358784
All of their embarrassing failures come from delays
>>108358827
>Our next model will be good, but more importantly,
>>108358784
>put a chinese sweatshop manager in charge of a horde of Indians and hope for a miracle
They can't even benchmaxx right, because if they could, they would. Money really can't buy everything.
>>108358850
puto
Nvidia, Cuda, Arch Linux
I'm using Sillytavern, trying so hard to get the xttsv2 server running to do tts. I've got the python conda environment set up and the api server up and running, but following the filepath I'm not getting any voices from my voice folder. Any idea what might be the issue? I've so little experience with Python environments
>>108358919
Correct. I misunderstood the target of the directive.
>>108357401
kek
>>108358784
thats a good thing, by then Zuck will be mogging Opus 5
>>108359031
in so much as that?
goddamn why didn't anyone tell me that qwen3.5 is safetymaxxed? i just did a small finetune and it refuses even with a good card and sysprompt.
>>108359178
Here you go anon. This guy's Qwen3.5 release refuses nothing, and I do mean nothing. I asked it to roleplay a loli sex dungeon and it did. Then I rescued the girl and took her for ice cream, she was very happy.
https://huggingface.co/HauhauCS
>>108359219
it also takes 300 tokens to answer 1+1 fyi
>>108359219
seems to only be quants, no safetensors. cant finetune a gguf.
>>108359229
retard loser
>>108359252
do you have something of substance to say?
>>108359229
https://github.com/purinnohito/gguf_to_safetensors
Will that help or am I an idiot?
>>108359258
yeah
>>108359262
i can't check, github is banned in my country
>>108359262
probably not, considering it has not been updated in 2 years
>>108359273
but enough about your brain
>>108359273
This one is more recent, last updated six months ago
https://github.com/odaiko42/GGUF2Safetensors
Which at least suggests it is possible in theory, but that does not help anon.
He would probably be better served by emailing the guy who released the gguf to see if he will post the safetensors, given his country's banning of github.
It felt like recent huge models (K2.5/GLM5) were more prone to small continuity errors, like stockings being on/off, than some models we had before.
I'm currently playing around with Opus4.6 and it does the exact same shit. In one particular reply, it described the girl as wearing stockings and then later in that same reply mentioned her bare feet touching something.
Unsurprisingly, distillation is killing our local models.
>>108359312
Maybe it's a stirrup :3
>>108359229
>cant finetune a gguf
Skill issue. It's not hard at all to port the llama.cpp dequant kernels to pytorch. I did this at one point so I could do qlora training on top of a gguf
how do I get DS or Kimi to actually write a story instead of a synopsis? For some reason they hate writing detailed action or dialogue and instead just compress every major plot point/development into a vague one sentence summary.
Without manual steering/rewriting they can't go 200 tokens without getting lazy and "zooming out".
>>108359520
Do the writing samples on eqbench have the same problem? If not, look up how eqbench structures their prompts
>>108359520
Instead of
>assist the author in writing a story
try
>assist the author in drafting scenes
>>108359563
Hmm, apparently what they do is have the model write ~1000 words (1 chapter) at a time, and in between, they have user messages asking it to write the next chapter. Whereas I just keep extending the first model response indefinitely after providing a detailed story outline in the first user message.
I guess the issue is most models aren't trained to write long single responses and they think they have to wrap up the response soon once it gets too long.
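The chapter-at-a-time structure described above is easy to reproduce against any OpenAI-compatible chat endpoint. A minimal sketch of how the message history would be built; the system prompt wording and the 1000-word target are my guesses, not eqbench's actual prompts:

```python
def build_chapter_messages(outline, chapters_so_far):
    """Build an OpenAI-style chat history where each finished chapter is
    an assistant turn and a fresh user turn requests the next chapter."""
    msgs = [
        {"role": "system",
         "content": "You are a novelist. Write roughly 1000 words per chapter."},
        {"role": "user",
         "content": f"Story outline:\n{outline}\n\nWrite chapter 1."},
    ]
    for i, chapter in enumerate(chapters_so_far, start=1):
        # previous chapters go back in as assistant turns
        msgs.append({"role": "assistant", "content": chapter})
        # a new user turn asks for the next chapter
        msgs.append({"role": "user", "content": f"Write chapter {i + 1}."})
    return msgs
```

The design point is that each model turn stays near the response length the model was trained to produce, instead of one ever-growing continuation that the model keeps trying to wrap up.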
>>108359312
what quant are you using?
New compiles show:
>slot launch_slot_: id 3 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
Old one was:
>slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
This is coming from my client's payload. I don't understand this.
most people have no idea how useful openclaw + qwen3.5 is
my agent is scouring the web doing research for me. it's not as good as claude opus but it is gpt-4 tier
i'm so excited for more efficient local models. it's so close right now
maybe some people have 512gb of ram and have been there for a while, but I don't have that kind of money and i'm stuck working with 64gb
>>108358714
The dates on those really go to show you how bad the dead internet slop flood really is.
>>108359747
how many params?
>>108359747
Researching what, for fuck's sake? How do you validate that the research output is valid and not slopped?
>>108359747
I feel like such a boomer, aka old, because I can't stand all these bullshit tools. I am quite satisfied to use the llama.cpp web interface. If I am using it for code I just copy/paste. The idea of these things acting by themselves is just not part of my computing paradigm.
I did try out openwebui and set it up with searxng. the whole thing felt slow and bloated, and the model doing the search was fine, but I have not really found a use for it. I'd rather look myself and, if I need the model to do something, select what data to feed it.
>>108358266
Instructions unclear, I ended up watering my computer :*(
Good morning friends. I am making some vegetable stew and coffee this morning while catching up on this thread. Hope you all have a wonderful day and many blessed cooms to your favorite vocaloids/waifus. Life is good.
good resource for wav samples for tts?
>>108360143
Brother, it's a matter of taste. Whose voice do you want to hear/clone? Jordan Peterson? Donald Trump? David Attenborough?
>>108360170
I'm looking for a library of popular cartoon, video game, and general media figures' voices.
>>108358784
Not sure why you didn't link the NYT article, which was the original source.
https://www.nytimes.com/2026/03/12/technology/meta-avocado-ai-model-delayed.html
>Meta's new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.
>The model, code-named Avocado, outperformed Meta's previous A.I. model and did better than Google's Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.
>As a result, Meta has delayed Avocado's release to at least May from this month, the people said. They added that the leaders of Meta's A.I. division had instead discussed temporarily licensing Gemini to power the company's A.I. products, though no decisions have been reached.
This means this thing scores below some open source models from the past year.
>>108360179
>This means this thing scores below some open source models from the past year.
and their models always perform much worse in real use than on the benchmarks, so being this bad even at benchmaxxing means the model must be an atrocity and a crime against humanity
I never liked llama models. the early ones were a cope people fell in love with because we literally had nothing else. Now we have DS, GLM, Qwen and those buffoons never had any room to show off anymore. I remember trying 405B on API when it came out and being like "this is it? this is what little being a fat dense model gets you?" /lmg/ being focused on local and no one here being able to run that model, most of yall didn't experience just how mediocre it was. it had less multilingual knowledge than Gemma 2 27B lmao
>>108360179
lecun's legacy
If the benchmaxxed models are shit, is it really the models that are the problem or the benchmarks?
Not everyone can train their own model from scratch, but anyone can create their own comprehensive benchmarks. Think about it. We are the problem.
If you want to get models that score well for roleplaying or creative writing, think long and hard about how that can be quantified.
>>108360179trust wangs big plan. The guy is talking super intelligence and AGI
>>108360223I've been testing Qwen 3.5 whatever and its writing is so strange, it really feels like I'm talking to a robot.
>>108360179I don't think it's that disastrous considering the overhaul of the organization in Meta and a first-time effort with the team they built. Consider where Meta was during Llama 3's release when they were no longer top dog for open models and fighting off good Chinese competitors. The level of performance described isn't terrible but it always depends on # of parameters etc. If you take a look at the meme marks, it can perform anywhere from the model Nvidia released today to GLM 4.7 and if it didn't use 1T or more parameters, it would be a win to have it open sourced if they are considering it at all. But of course, I guess if Zuck wants his top spot back, then he is right to delay it. I just think it's dumb to aim for the top spot right off the bat.
>>108360227Are you capable of articulating why?
>>108360227>it feels like i am talking to a robot
i wonder why
>>108360241>Are you capable of articulating why?
it's been trained on synthetic data so it used a bot as a reference on how to talk
Anyone thinking about a DDR4 ewaste build: i got a cheapo epyc Rome 7302 with 8 sticks of 3200 32G for 256 gigs of ram. With zero gpu, running qwen 3 thinking 235b at q8 (biggest thing that fits almost exactly in memory) I get 2t/s TG
>>108360227It's good for agentic tasks then
just found the ultimate schizo merge lmao
>>108360256Not what I meant. You're diagnosing the cause, not the symptoms. If you can't describe the symptoms then you won't be able to create your own benchmark systems to identify them.
The people with deep pockets training LLMs aren't going to listen to complaints about the training data unless the LLMs themselves start scoring badly on the benchmarks. That's my point.
>>108360241>>108360246You are some nasty little motherfuckers.
>>108360281?
>>108360274>moving the goalpost
this is about "why is it talking like a robot and not like a human", not "but what about the mememarks??"
>>108360290Are you actually retarded? This shouldn't even be a difficult concept to grasp. I've made myself very clear already. Are you even interested in solving problems or do you just like to complain like a bitch?
>>108360274We have some benchmarks for that like EQ and UGI bench. The main issue is getting a company or group to care about it to optimize for it.
>>108360268I find schizotunes funnier than merges because they cost the tuner some money
the sicarius guy spent $1000~ to make this:
https://huggingface.co/SicariusSicariiStuff/Fat_Fish
at which point do you stop and wonder "what am I doing with my life"
>>108360300>Are you actually retarded?
you definitely are retarded, we were asking the question about why Qwen 3.5's writing is not natural at all and you start talking about mememarks, post your hand right now you subhuman
>>108357882Quite the opposite, actually. When I'm drunk I'm much less creative and much more impulsive. I want perfect roleplay now!! Every imperfection instantly takes me out of it and I don't have any energy nor willpower to tune both my and my model's responses to fix it. Most fun with AI I've had completely sober.
>>108360306>we
Soon
>>108360323Why are you this obsessed? I thought the US posters were asleep by now.
>>108360331>doesn't deny
I fucking knew it
Does anyone here have a 16gb amd gpu like the rx 9700 xt? do you think I can get it to work? I have 32gb of ram. I understand that it's more difficult with amd than with nvidia, something to do with rocm
>>108358264>but the singularity has already started
Yes, you can already see how vibecoding has revolutionized llama.cpp development.
>>108360337I don't argue with retards, that's all.
>>108360331>thinking amerimutts are the only whites
lmao.
>>108360328Why is yellow Miku emitting symbols??
>>108360259>epyc 7302 16c 256gb (4 ccds)
>8x32gb 3200
>qwen3 235b a22b q8
>2 tk/s
Useful benchmark, thanks.
Does it get much better with a gpu on it?
(I have an incomplete build with more cores and slower ram, so it might perform similarly when finished.)
>>108358468You don't, you just watch from the side.
>>108358714I'm tired of this shit
>>108360367Americans aren't white
>>108360281lmao xd
>>108360371He is attracting black cocks.
>>108360352Yes, use llama.cpp and compile it with vulkan support. I have used that with all types of strange shit and as long as it supports vulkan you are good to go.
I even got a trashcan mac working once I activated the experimental drivers that supported vulkan.
It is so easy once you get it working you will laugh. Just google llama.cpp vulkan and compile and you will find guides.
>>108360393My 3995wx (also zen 2) with 8 64gb ddr4-3200 sticks and 3090 does kimi k2 q3 and glm 4.5 q4 both at around 9 tk/s tg.
>>108360492Do we need to compile it? I never had trouble with the prebuilt binaries.
>>108360393And it's been a while so I may be misremembering, but I also tried qwen 3 235b at q8 (did not like it) on the same system except instead of a 3995wx it was a 3945wx (2 ccds) and got around 2-4 tok/s tg.
>>108359883I have a script that lets my local llm talk to my Claude API, and I have to approve every message myself so it doesn't get stuck in a loop using thousands of dollars of compute.
Claude passes the token heavy grinding work to my LLM to handle, and Claude handles the delicate critical stuff. I end up not having to use Claude much and preserve most of my API compute.
I wish I had a machine with a lot of RAM to use like 300gb Qwen3.5 as an intermediate man in the middle agent and have my agents be like
>35gb Qwen3.5 - junior assistant
>300GB Qwen3.5 - full stack developer
>Claude Opus API - Lead software engineer
this would work incredibly well
You could probably stretch out a $20 Claude subscription to be nearly as effective as the $200 sub doing this
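A minimal sketch of the routing idea, with the tier names, token threshold, and approval gate all made up for illustration (the real script would wire these to actual endpoints instead of returning labels):

```python
def approve(message: str) -> bool:
    """Manual gate: a human confirms before anything is forwarded upstream."""
    return input(f"forward to Claude?\n{message}\n[y/N] ").strip().lower() == "y"

def route(task: str, est_tokens: int, threshold: int = 4000) -> str:
    """Send token-heavy grinding to the cheap local model,
    keep the delicate critical work for the expensive API tier."""
    if est_tokens > threshold:
        return "local"   # junior assistant chews through the bulk
    return "api"         # lead engineer handles the critical bits
```

The point of the gate is just rate-limiting by human attention: nothing hits the paid tier without an explicit yes.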
>>108360572The prebuilt binaries would have to have vulkan enabled. It is really easy to compile, just a few commands.
This guy wrote a guide for an RX 580 but the procedure is the same:
https://dadhacks.org/2025/08/04/running-large-language-models-on-cheap-old-rx-580-gpus-with-llama-cpp-and-vulkan/
You can do it anon
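For reference, the build is roughly this on Linux (assumes the Vulkan drivers/headers are already installed; `GGML_VULKAN` is the current flag name, older trees used `LLAMA_VULKAN`):

```shell
# confirm a vulkan-capable device is visible first
vulkaninfo --summary | grep deviceName

# clone and build llama.cpp with the vulkan backend
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"

# then offload layers to the gpu as usual
./build/bin/llama-cli -m model.gguf -ngl 99
```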
>>108360572I mean, I already compile my binaries, since there are no prebuilt cuda binaries for debian. But when I tested my v620, I just downloaded the prebuilt ones for vulkan and rocm and had no issues.
>>108360645I would imagine you are good to go
>>108360656I'm not >>108360352 btw
>>108360534>>108360558>zen2 64c 8ccd 204GB/s +rtx 3090
>kimi k2 1t a32b q3 9tk/s
>glm 4.5 355b a32b q4 9tk/s
>zen2 12c 2ccd 204GB/s +3090
>qwen 3 235b a22b 2-4tk/s
Thanks for the with-gpu numbers on these slightly larger systems.
9tk/s would be a lot more pleasant to use than 2tk/s.
Is it in a proper case, with cooling over the ram?
>>108360672>proper case
>cooling over the ram
lol
They did hit 87c once, without the fans, but now the top slot stays below 60c, while the bottom group reaches up to 69c under load. Also thanks for prompting me to open up my case, I just noticed one of the fans died.
>>108360570Doesn't the API cost a shit ton of money? Do you get a certain amount of free API tokens with a max plan?
>>108360740>fan sitting on ram sticks
I'm going to copy this.
Was previously thinking of folding some cardboard to make air guides, then figuring out what to do about fan mounting.
Damn! When did CPUs get so cheap? I paid $550 for my CPU like 5 years ago and I can get something that's twice as powerful now for $350.Forget about the GPUs lads. If you can't offload all of your layers to the GPU anyways it's 100% a CPU upgrade that will make the difference for you in terms of throughput.
>claude code niggers in this thread
daily reminder that the reasoning budget implementation in llama.cpp is filled with edge cases because claude code is garbage
it couldn't even figure out that it shouldn't start the token counting during the prefill stage, write <think> </think> in your user prompt and make your prompt bigger than the allotted budget and see for yourself
if your agent can't implement a couple hundred LoC of a really simple sampler mechanism what hope is there for it to build real production shit? don't fall for this meme
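For what it's worth, the fix being described is simple: only tokens emitted during decode should count against the budget, never <think> text that was already present in the prompt. A toy sketch of the counting rule (not the actual llama.cpp code, just the invariant):

```python
def over_budget(generated_tokens, budget: int,
                open_tok: str = "<think>", close_tok: str = "</think>") -> bool:
    """Count reasoning tokens only among *generated* output.
    The caller must pass decode-phase tokens only, never the prefill,
    so <think> tags inside the user prompt can never eat the budget."""
    count, inside = 0, False
    for tok in generated_tokens:
        if tok == open_tok:
            inside = True
        elif tok == close_tok:
            inside = False
        elif inside:
            count += 1
    return count > budget
```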
>>108360173i will make the logo
>>108360863>I remember trying 405B on API when it came out
>had less multilingual knowledge than Gemma 2 27B lmao
No shit, they didn't add multilingual support until 3.1.
>>108360235Their biggest problem is Zuck's ego. He doesn't want to reveal anything unless he can trump it up as the very best. Elon, for all his faults, is much more pragmatic. He put out the mediocre Grok 1 and 2 and iterated from there.
>>108360863>No shit, they didn't add multilingual support until 3.1.
??? what are you going on about, 405B is from the 3.1 series and there was always some mixed language data in llama, retard
>He doesn't want to reveal anything unless he can trump it up as the very best
is that why we had all those crappy open weight models from meta? if they really had that mentality they would not have embraced the leak and started releasing open weight models as mediocre as llama was
>>108360872>405B is from the 3.1 series
My bad, I forgot there wasn't a 3.0 405B.
>>108360872>is that why we had all those crappy open weight models from meta?
Open weights was pushed entirely by LeCun. Zuck started announcing his intention to "lead" only after he declared Llama 3 "competitive" with frontier models.
getting REAL sick of your shit
>>108356979BLACKED Miku
gemma 4 today
>>108360988I'm already feeling so safe knowing about it.
>>108361042she's going to fucking die
>>108360786I just said API to keep it simple. I have a headless browser script that uses my sub so input can just be piped through the browser to claude like it's an api. it's actually a playwright browser instead of an api
no, you have to pay for the api by rate even with a sub
>>108359937You're going to get left behind, old man. For smaller changes, it's almost always faster to do it yourself. But they can shit out boilerplate faster than you. It ends up faster only because it frees you up to do other things in the meantime.
>>108360988Google never releases anything good on Fridays.
>>108357554GLM 4.6? Opus 4.6?
>>108361149What do you think retard. One is local, the other isn't.
>>108361149>nemo
local
>glm 4.6
local
>lmg
local
>opus 4.6
???
How does agentic coding work? Do you just give the AI a prompt and it will try to do things step by step until it thinks it is finished? Won't this fill up the context real quick and make the model retarded?
>>108361163>>108361155gatekeeping morons
>>108361171You just chat with it and it does everything in the background. It's pretty insane.
agentic moron
>>108361176>It's pretty insane.
what is insane is the level of broken garbage people are willing to merge, cf llama.cpp
agentic is only fast and productive if you have no care for correctness
>>108361171Most clients automatically summarize to condense the context once it fills up. Most of them have different modes they can delegate to that each start with a fresh empty context.
>>108361174Don't blame as for passing through the wrong gate. It's in the name of the general.
>>108361198>Don't blame as
sir please
>>108361198Opus is local for some people
>>108361171tell agent what you want. have a back and forth chat with it about features, or specifying what you want
then tell it to write a plan and scaffolding
then tell it to implement everything, or implement x in the scaffolding depending on complexity and how advanced your agent is
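Stripped of all tooling, the loop most clients run is roughly the sketch below; everything here is a stub (a real client wires `model` to an inference backend and `run_tool` to the shell/filesystem), not any particular agent's actual implementation:

```python
def agent_loop(model, run_tool, goal: str, max_ctx: int = 8000):
    """Naive agent loop: ask, act, repeat; summarize when context overflows."""
    history = [f"GOAL: {goal}"]
    while True:
        reply = model("\n".join(history))
        history.append(reply)
        if reply.startswith("DONE"):
            return reply
        if reply.startswith("TOOL:"):            # e.g. "TOOL: ls src/"
            history.append(run_tool(reply[5:].strip()))
        # crude context management: collapse old turns into a summary
        if sum(len(h) for h in history) > max_ctx:
            summary = model("Summarize progress so far:\n" + "\n".join(history))
            history = [f"GOAL: {goal}", f"SUMMARY: {summary}"]
```

The summarize-on-overflow step is why long sessions don't instantly blow the context, and also why the model gets progressively more retarded: each summary is lossy.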
This thread fucking sucks right now. Just retards arguing over agentic AI and PC hardware. I would literally rather hear you fags talk about your dreams last night. Holy shit.
your brain on claude code
>>108361198
>>108361232you may prefer more cockbench and degenerate shit but, saar, you may have forgotten, this general is on /g/
>>108361263what is cockbench? I just tried googling it and only saw gay porn.
>>108360740Should I cool my ram (4x48 ddr5)?
>>108360836Gahahaha, someone tell him.
https://huggingface.co/1Covenant/Covenant-72B
Why don't the coomers with mega rigs come together to train the ultimate rp model with decentralised training?
>>108361347Same as ever, no one can agree on what "the ultimate rp model" would be, what size, what training data, etc.
and it's unlikely to produce anything competitive
I'd rather use even the tiny qwen 4B over what most westerners have fully trained over the past year, like OLMo 32B, Trinity Large and Mini or LFM2 24BA2B. It's all garbage.
>>108361347we first need an ultimate rp dataset that's not claudeslop or any of the uncleaned bluemoon shit
>>108361232You're just upset that your 10 t/s moecope rig is useless for agentic tasks.
>>108361347Because coomers with mega rigs are using GLM, Deepseek, and Kimi which are better than anything you could hope to train with those rigs.
>zucc delayed his newest model by two months
>deepseek v4 delayed indefinitely
it's over, isn't it?
>>108361507we still have google's take on gpt-oss to look forward to
>>108361507anthropic blocking distillers has killed open source for good... we lost
Have any of you tried qwen 3.5 32b with codex?
>>108361198General has a picture of vocaloid in OP
>>108361517It's about time they stop freeloading. China has more than enough data, users, and resources to make their own datasets. Sink or swim.
>>108361507Deepseek 3.2 has fallen behind by a lot. Barely does any work. You ask it something and it gives up after a shallow attempt.
>>108361507It is actually the typical /lmg/ time period of indefinite waiting for next good thing.
>>108361543Why don't they just chain, quantize, distill everything they see?
>>108361517Imagine if this forced the chinks to use organic stolen data and we got the ultimate coombot in half a year.
>>108361314Just check your ram temps, some gaming cases have enough airflow without needing a dedicated cooler.
>>108361314In the current economy, I've made sure that my RAM never goes above 70C just to minimize the risk of a DIMM dying.
best model under 10b that is as good as gpt 5.4?
>>108361618stablelm-7b
>>108361314tldw: hot ram can cause memory errors
https://www.youtube.com/watch?v=4rwp0NuqDlw
>>108361618Mistral 7B
>>108358466Left: Qwen3.5 397B
Right: GLM 4.7
I hate (love) local models they suck (are not very good but are infinitely better than the alternative on principle)
>>108361314I've never thought about cooling my ram but my AIO has a VRM fan so it's probably fine. I was once worried about SSD temps when I found out the brand new one I bought at the time was reaching 80c because it was uncovered and sitting next to a toasty GPU, so I got it a cover and made sure to get a motherboard with covers for all SSD slots when I upgraded.
>>108361688I loved zai before she cheated on me by becoming two times fatter.
>>108361238> Don't blame as for passing through the wrong gate.
That's not a minor spelling mistake; that's ESL sentence construction.
>>108361947>not x but y
Has anyone here worked on training a local model to turn stories into screenplays or something similar? Like I train it on screenplay books and scripts and it learns to turn prose stories into screenplays?
>>108361947What's the non-esl sentence construction for that sentence then?
>>108361947>That's not a minor spelling mistake; that's ESL sentence construction.
oh wow thanks professor faggot, amazing contribution. you saw one awkward sentence and immediately went full ICE agent on some rando's grammar. must be exhausting doing linguistic background checks on anonymous strangers to feel smart for five seconds.
congrats retard, you spotted a non-native structure on the internet. truly groundbreaking work. someone call the nobel committee for this absolute galaxy brain.
keep patrolling sentences buddy. maybe one day you'll graduate from "guy who says ESL like it's an insult" to having an actual point. probably not though.
seething turdies itt
Anon who was going to fine tune GPT OSS Heretic, how's your progress so far?
>>108361947I'm just retarded and wrote "as" when I meant to write "us".Unless you're referring to "passing through the gate" in which case you just don't like my metaphor.
>>108362230yeah
Fresh when ready>>108362305>>108362305>>108362305>>108362305>>108362305>>108362305
>>108362309>page 4
saved
>>108360259>>108360534What are other 256GB anons dailying? Anyone doing 4x64gb agent swarm stuff locked to CCDs?
is nemotroon super good?
>>108358129lmao
>the user saying no actually means DONT ASK ME JUST DO IT so it's always yes
>>108360672You're probably gone and won't see this but I should mention this was inside a virtual machine (debian host and guest) with cores pinned to 7 of the 8 ccds (56 cores, 112 threads) and 450 ish gb allocated. Clocksource was hpet because tsc was marked unreliable on my system. Disastrous effect on windows vms. This may have had an impact on my inferencing speeds.
>>108362442>the mind says no, but the body says yes
>>108362768lmao
>>108360672Isn't this largely a bandwidth concern?
Kimi K2: A32B at q3 is 12 GB per token. Qwen3 235b A22B at q8 is 22 GB per token.
You would expect to see a 2x difference between these two, no?
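Quick sanity check on the arithmetic, assuming TG is purely bandwidth-bound (it never quite is, so real numbers land below the ceiling; the 204 GB/s figure is the 8-channel DDR4-3200 theoretical number quoted upthread):

```python
# theoretical token/s ceiling = memory bandwidth / bytes read per token
bandwidth_gbs = 204             # 8ch DDR4-3200, theoretical
kimi_gb_per_tok = 12            # ~32B active params at ~3 bits (q3)
qwen_gb_per_tok = 22            # ~22B active params at 8 bits (q8)

kimi_ceiling = bandwidth_gbs / kimi_gb_per_tok   # ~17 t/s ceiling
qwen_ceiling = bandwidth_gbs / qwen_gb_per_tok   # ~9 t/s ceiling
print(kimi_ceiling / qwen_ceiling)               # ~1.8x, close to the expected 2x
```

So the 9 t/s kimi number sits about half the theoretical ceiling, consistent with not getting the full multi-channel bandwidth in practice.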
>>108362542So maybe hope for better perf?
I'll likely try bare metal first.
>>108363699>12gb vs 22gb per token = 2-4tk/s vs 9tk/s
Fair conclusion.
From reading >>108343696 it sounds like we currently don't do better than 2 channels' worth of memory bandwidth.
Whether from only using a small number of cores, or the bandwidth needed to stitch together sub-computations, I have no idea.
The main realised advantage of having multiple memory channels is then just the extra total memory.
>>108363911The CPU could make a difference too, but judging by the example data it seems unlikely. Going from a 12 core to a 64 core didn't seem to speed things up that much. That being said, the GLM example is also A32B but at q4, so it's 16 GB of data at a similar speed.
>>108363934>Going from a 12 core to a 64 core didn't seem to speed things up that much.
Maybe I'm misremembering things, it's been a while since I tested my 3945wx, but I definitely recall upgrading to the 3995wx making me very happy.
What I know for sure is that I remember seeing 2 tok/s and 4 tok/s (I don't remember the exact models each of those numbers came from). And that I tested out qwen 3 235b at q8 but ended up not using it because I didn't like qwen 3 and q8 was too slow - which is why I think the 2 tok/s came from qwen 3.
Thinking about it more, I think the 4 tok/s may have come from q4 glm 4.5. Checking the release date, july 2025, is pretty close to when I built my system, so that could have been it.
For zen 2, at least, more cores (actually, I think it's in ccd steps, not cores specifically) results in better performance. I'm pretty sure of that.
>>108360461Tons are genetically European so yes they are.