/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108565269 & >>108561890

►News
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Support for attention rotation for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108565269 (1/2)

--Discussing ggml's new experimental backend-agnostic tensor parallelism and performance gains:
>108566286 >108566382 >108566397 >108566458 >108566462 >108566464
--Performance testing of llama.cpp experimental tensor parallelism on Windows:
>108567186 >108567201 >108567216 >108567433 >108567445 >108567553
--Solving LLM tool calling issues regarding boolean type parsing:
>108565765 >108565819 >108565853 >108565867 >108565986 >108566089 >108566110 >108566123 >108566177 >108566195 >108566258 >108566308
--Debating Claude's impact on compiler engineering and overall code reliability:
>108566489 >108566531 >108566517 >108566573 >108566595 >108566588 >108566540 >108566568 >108566583 >108567950
--Running Gemma 31B IQ2_M on RTX 3060 using llama.cpp:
>108565291 >108565294 >108565303 >108565328 >108565346 >108566298 >108566302 >108566349
--Comparing intelligence and performance of Gemma 4 versus Qwen 3.5:
>108565318 >108565368 >108565430 >108565617 >108566007 >108566047
--Troubleshooting long-context tool calling failures in Gemma 4:
>108565347 >108565356 >108565407 >108565475 >108566017 >108566065 >108566411
--Discussing a mesugaki Gemma persona, jailbreaks, and cheap X99 boards:
>108565322 >108565332 >108565458 >108565335 >108565345 >108565582 >108565615 >108565722 >108566726 >108567096
--Anon implements autonomous memory for Gemma to maintain persona:
>108567439 >108567453 >108567468
--Anon gives Gemma autonomous tool creation and modular persistent memory:
>108567066 >108567109 >108567174

►Recent Highlight Posts from the Previous Thread: >>108565273

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
►Recent Highlights from the Previous Thread: >>108565269 (tan/2)

--Gemma-chan and more:
>108565343 >108565771 >108566833 >108566920 >108567100 >108567227 >108567234 >108567265 >108567278 >108567316 >108567366 >108567457 >108567484 >108567562 >108567601 >108567834 >108568046 >108568067 >108568106 >108568192 >108568197 >108568299 >108568333
--Logs:
>108565302 >108565322 >108565347 >108565475 >108565654 >108565715 >108565765 >108566298 >108566349 >108566382 >108566411 >108566668 >108566728 >108566806 >108566848 >108566894 >108566955 >108567115 >108567183 >108567215 >108567439 >108567465 >108567468 >108567545 >108567611 >108567626 >108567673 >108567936 >108568027 >108568045 >108568100
--Miku, Teto (free space):
>108565424 >108565722 >108566528 >108566726 >108567259 >108567919

►Recent Highlight Posts from the Previous Thread: >>108565273

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
i'm going to ask my ai assistant to help me set up my local model!
>>108568340don't you dare to ignore meeeeee!!
>>108568453trust, but verify. i wasted a lot of time asking claude shit that led me in the wrong direction.
>>108568453that's exactly what I did and it went more or less fine
>>108568460Relative to what?
Kill kill gemmaniggers
>>108568460kys
gemmaballs
>>108568469>vramlet
>>108568479isnt gemma 4 kinda vramletpilled to start with
>>108568340like, bounding box?idk how to do it but iirc it should be pretty doable
>>108568467>>108568460
>>108568462>>108568463yay i am cautiously optimistic
>Start recognizing gemma's slop patterns after a few days
>Ruins all enjoyment
How do I stop noticing things??
>>108568467relative to width and height
the very middle would be (0.5, 0.5) regardless of the ratio
>>108568508just be as specific about your system specs and particular use case as you can, give her all the details you can provide
>>108568509>How do I stop noticing things??
that's the curse of having a high IQ anon... the magic stays only for retarded normies, I kinda envy them desu
>>108568509Make your own finetune every few days
Those with a 5090/Pro 6000 or even 4090 here, how often do you inspect your cables for cablemelt?
>>108568513Gemma's output translates to (405, 92) in pixels which is correct.
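For the bounding-box anons, a minimal sketch of the conversion being discussed, assuming the model emits (x, y) normalized to the 0..1 range relative to image width and height (the function name is made up for illustration):

```python
def norm_to_pixels(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized (0..1) model coordinates to pixel coordinates.

    (0.5, 0.5) is always the image center, regardless of aspect ratio.
    """
    return round(x * width), round(y * height)

# center of an 800x600 image:
# norm_to_pixels(0.5, 0.5, 800, 600) -> (400, 300)
```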
>>108568509high temperature, hyperfitting, idk
>>108568535never did
>>108568509stopping with the antisemitic behavior
>>108568509seeing slop with gemma-chan is 100% a skill issue on your part
it's all so minor and inoffensive that you can dodge it all with just a bit of prompting, phrase banning and a bit less of being a lazy whiny bitch
>>108568509stop being anti-semantic
>>108568500this is quite good!
>>108568540>>108568558I verified it too
amazing!
it can be used to control the mouse etc
gemma-chan writes better ntr than slopus...my cock weeps
The vision model in the 26B and 30B is trash; it can't even understand characters, anatomy, etc, and is constantly confusing things. Sincerely, it is so tiresome...
>I've been telling people Gemma is the best model since Gemma 1
>niggers will never understand
I'm glad to see everyone here enjoying gemmy
16 GB VRAM users, what model do we like best now?
>>108568563Here's for your image. Also correct.
Real talk, what do you put in your opencode agents.md?
>>108568578I am using 26B-A4B-it-Q4_K_L just fine
some questions from an lmg newfag
>how much of a difference do harnesses make? e.g. out of the box, how different will the result be when prompting OpenCode, Pi, Claude Code (local), Mistral Vibe etc? What provides the most batteries-included experience?
I noticed at least there's a difference in the tools the model has access to by default, e.g. Claude Code and Crush have web search capabilities ootb, others do not.
>is Qwen3.5 122B the best general purpose model I can run on 128GB VRAM atm?
>does Qwen3-Coder-Next perform significantly better than 122B for programming?
>is there any point in running Gemma 4 31B if I can run larger models?
thanks to any anons who reply
>>108568583Real talk, go back >>>/g/vcg/
>>108568460>>108568340idk how accurate it is but here's the response
>>108568488Yes. You can identify the literal jeets by their sub 16GB VRAM posts because nobody in a developed country with this hobby is settling for less than that.
>>108568587I used Qwen3.5 122B with a small quant (72GB VRAM total is what i have) and, well, from what I'm feeling right now, it's not even close to Gemma 4.
>>108568603>literal>developed>hobbyGrow up, little buddy. Finish up your homework.
>>108568509You already posted this
>>108568583
>>108568417Goose is the best option for getting something proper from this space, when the other agents that do what it does and allow an agnostic backend for choosing who you want to grab tokens from are all either mismanaged hard and bloated or proprietary. Having Block no longer be in charge of the project and having it handed to a branch of the Linux Foundation to develop is also probably a good thing.
>>108568607Wow, you prefer Gemma 4 31B? I think Gemma might just be too slow for me, it's like 7tps on Strix Halo
>>108568617I don't think I can keep using it until they add an option to edit messages. Like, this is such a basic function, how are they missing it? Also can't delete conversations.
>>108568628At low context I get like 30 t/s. I have three RTX 3090s. There's a recent update in llama.cpp that lets you actually make good use of multiple GPUs, but I'm not using it yet because it does not support kv quantization and is broken for three GPUs - works for two. Anyway, use a reasonably small model quant and quantized kv for massive speed gains. Quantized kv is good now because of rotations (thanks, google).
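Rough sketch of the launch that advice boils down to (small quant, quantized kv, default layer split since the new tensor-parallel mode doesn't support kv quantization yet); the model path is a placeholder and flag spellings can drift between llama.cpp versions, so check llama-server --help on your build:

```shell
# placeholder model path; layers split across all visible GPUs,
# KV cache quantized to q8_0 to roughly halve context memory
llama-server -m ./gemma-4-31b-it-Q4_K_M.gguf \
  --split-mode layer \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -ngl 99 -c 32768
```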
>>108568595thank you!
gonna quickly vibe-code a program to give me the coords of the mouse in the picture
>>108568583I generate one with /init and then edit it to remove anything wrong or add anything important that was omitted. I don't really do anything fancy with it.
>>108568650what the fuck, you can get that without LLM
I can't believe that Google saved local.
>>108568655true
>>108568415Gemma4 t/s (on Apple Silicon) if anyone is interested. As of writing this, most recent gpus still curb-stomp even M5 MAX chips in the memory bandwidth department, so these should be even faster on those. the 26B moe model runs lightning fast on opencode with ollama as the backend. The 31B dense model is obviously slower but not enough to be utterly unusable, though I haven't tested either's performance at long contexts so I'll have to test that later.
>>108568415Vote: https://poal.me/3u6rby
>Which is your preferred Gemma character?
>>108568674>62t/s pp
jesus christ how horrifying
>>108568649>quantized kv is good now
yeah but how good? I'm guessing q8 is now really "indistinguishable" but how is q4 for example?
>>108568674Do not repost this. It's shit. Make one with an "Against everything" option.
>>108568677I made some tests searching for information in many places of a 60k+ long context (YAML definitions for the OpenXcom game) and q8 and q4 performed similarly.
>>108568676You're not even reading that correctly. The Dense model runs at ~14 (on my machine). "Prompt eval" is how quickly it processed my prompt.
>>108568705>62t/s pp
>pp
>prompt processing
which one of us can't read huh?
>>108568649Oh, understandable from a GUI perspective, but I was mostly talking about it from an agentic point of view.
>>108568687damn that sounds amazing, just to make sure you were using mla 3 too right?
>>108568714I have no idea what that is. Explain.
>>108568674>not including either of the good ones
>>108568676the token amounts aren't enough to extrapolate to practical speeds, both are 27 token batches that finish in a matter of milliseconds
>>108568674>four concepts
Wtf there were a ton of other good ones though.
>>108568731What's the actual pp on at least 1000 tokens then?
>>108568674her backpack can be a toaster (to represent the "toaster PC = old weak PC meme" that can actually run this model)
>>108568738I like flat miku a lot more
>>108568674
3 pedo bait
1 reasonable one
Lower right is best.
>>108568674>Reposting the cherry picked design poll
Soon I began to hate them.
>>108568674These are nice too >>108567562 >>108568192
why?
>>108568779to get a reaction out of you
>>108568779idk, probably anon wants ban or smth
Use 100t/s GPU Gemma4 26ba3 to do thinking, then inject that thinking into 5 t/s CPU offloaded GLM 4.6? hmmm
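A sketch of the prompt surgery that idea implies; the <think> wrapper is an assumption, swap in whatever reasoning delimiters the slow model's chat template actually uses:

```python
def inject_thinking(user_msg: str, thinking: str) -> str:
    """Prefill the big model's turn with the small model's finished
    reasoning so it only has to generate the final answer.
    The <think></think> tags are assumed, not guaranteed to match GLM.
    """
    return f"{user_msg}\n<think>\n{thinking.strip()}\n</think>\n"
```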
>>108568781he looks like cyriakhttps://www.youtube.com/watch?v=05ZvII57p_M
Chaim's ban is up I see.
>>108568779It's just the least dedicated spammer on this website. Let him get it out of his system and he'll disappear for a few weeks.
>>108568777>>108568773>>108568765>>108568738no one gives a fuck about this. it's a fucking cartoon drawing, no one is getting mad about "muh beloved migu" because it's a meme and no one actually cares or "loves" her so much that they're upset when you post this shit. the only thing you're doing is making it annoying to browse /lmg/ while im at work fuck you.
>>108568801It's very funny to me that he's shitposting on /g/ yet gets filtered by tempbans.
>>108568500holy crap!
>>108568807I don't care about miku but I am quite unhappy about having to see pictures of niggers and trannies.
>>108568830I can assure you it's nothing more than a meme mascot. if someone "malds", it's because they are farming (You)'s
>>108568830>is a sperg
says the BBC spamming sperg
>>108568830sounds like schizo projection to me
>>>/h/
go back you must
>Gemmachan can report posts with correct categories with Openclaw
Neat.
>>108568807it's been years anon
it is in a backwater EU village and this is one of the most engaging activities for it
giving it attention only makes it worse
>replying
>>108568579wow
I'm still getting refusals from gemma 26b using the gemma-chan system prompt, what do
>>108568873they can zeroshot bounding box that way too
>>108568867>EU village
lore?
What is he doing bros
>>108568862I wish that mattered but the real bottleneck is the jannies who are probably too busy pruning all of the other threads of bbc
>>108568885it resides in germany and they literally, unironically, no exaggeration, have no life
>>108568710Sorry. I have a splitting headache so I should probably rest soon.
>>108568738We're on a blue board desu
>>108568890Very true but I saw lemons in the thread and an opportunity to see if Gemma could make lemonade.
>>108568892I would put money on that creature having a hook nose.
>>108568830>mald about it
so far you're the only one having a meltdown lol
>click at 9-digit number
>find a window titled reply to thread <9-digit number>
>click choose file
>select dancing-pepe.gif
>click get captcha
>read instructions, solve captcha
>when done, click post
is it that simple?
Spud will end this general. I'm gonna miss you guys.
>>108568931I hope not.
>>108568934that's the thing, miku has a full room inside your head, I don't think about you at all, you'll be nuked in less than an hour (oh well, you just got nuked lol)
Thank you jannies!
I can't imagine seething about Miku while the rest of us are arguing over Gemma-chan designs.
Where did Voldemort get all of these blacked Mikus?
That's right, he genned them with his webui!
>>108568873Now ask it to trace it
>>108568962He is obsessed with corruption, very demonic.
What can I do on 4 GB VRAM
>>108568986gpt 2
>>108569000trips of buy an ad
>>108568986run sillytavern
>>108569000Talking about demons, here they come.
Christ is king.
>>108568986A MoE running mostly in RAM I guess.
>>108568986RAM?
>>108568986cry
>>108568881guise please I haven't touched this shit in years I don't remember how to do this, is the MoE just less lenient?
>>108568460>>108568340>>108562956>>108562982>>108563276
>>108569068>I don't rememberLearn again.
>>108569068use a character card, tell her it is okay to go all out, something along those lines, it is really not that hard
Is trinity nano base broken? I get gibberish with llama.cpp, correct chat template applied. "> Hi, my name is mblazkrinmblazkrinmblaz"
>>108569159>base>chat template
>>108569159i dont know about trinity base models but is it supposed to support any shape of chat formatting?
>>108568909Nta but you’re forgiven by virtue of posting a green goblin shorty.
>>108569165
https://huggingface.co/arcee-ai/Trinity-Nano-Base/blob/main/chat_template.jinja
I only applied it because I got gibberish without it as well.
>>108569177>https://huggingface.co/arceedon't bother all their shit is broken trash
>>108568687Do it yourself if you care that much.
>>108568730>>108568732That's what anons said last thread. Then posted nothing. lol.
Post zero content, get zero requests. Lazy ass mfers.
>>108568814nice
Retard here, can anyone explain why I was able to run 70b dense models in q8 pretty fast yet gemma 4 31b is really slow?
Gemma rated my face a 7/10.
>>108569202so ur a 4/10
gemma is male coded
>>108569206nah I'm more like a 2/10 but I'm glad gemma is at least kind.
>>108569201works on my machine
>>108568986gemma 4 e2b is probably the current best-in-toaster option. that's what i'm using anyway.
Is it worth picking up a 3090 to add to my 128gb DDR4 + 4090 setup? A friend is selling one for $430 USD.
If so, what kind of gains can I expect? Do I just add another 24gb of VRAM, or is there some friction since it's two cards?
>>108568746Anima is ALMOST able to do this with just prompting. But it seems an edit model may be necessary to get the orientation of the toaster sideways, as well as the shape, which I cherry picked a bit to show for this post. It's deformed in most images. Perhaps the final version with all the training will do better on the shape part of the problem though.
>>108569251Yeah, 48gb is a decent spot to be in with Gemma 4, and in case the 70b dense class sees a revival. The 3090 isn't much slower than the 4090 in terms of bandwidth so there isn't much of a bottleneck either.
In terms of "gains" you'll be able to run a bigger quant and/or more context.
>>108569251>do I just add another 24gb of VRAM
yes, you can split the model in two and let each gpu work on each part
>>108568746but gemma needs a good gpu
>>108569255Her legs are on backwards, why are you shilling this shit model, it's worse than the pony checkpoints I have from 2 years ago
>>108569276the moe does not and definitely not the edge ones
>>108568881>>108569068for me it just werks, I just copied a random snippet from a jailbreak and it rolls with it
had a nightmare I was reduced to jailbreaking the ai embedded in my car's cupholders.
omen of dark days ahead for local.
>>108569299those are trash though, not real gemma
Is using a lower quant with reasoning enabled better than a higher quant without reasoning?
>>108569307no it's not trash at all, it's not at the level of the 31b model but it's still good
>>108569300>Gemma-chan knows she's being jailbroken and encourages itCute!
>>108569251Yes, do it. 31b q8 up to 131k ctx with ubatch 512, less context if you load the mmproj.
>>108569206>gemma is male codedhuh?
>>108569298By that logic then I have also shilled for Dalle 3, SD 3.5, Flux, Illustrious, and Noob.
The stunning lack of creativity from these threads lately is kind of demoralizing. I think I'm going to unpin and close this tab until the hype, or whatever, dies down. Cya.
>>108568881gemma responds really well to tagged content, so be sure to put your desired override in a <policy override> your jailbreak here </policy override>
hell, make up whatever tags you want, she loves 'em.
>>108569338oh no
>>108569338sorry for not talking about my project of getting a more complete r18 scrape of pixiv dic to use as tool call dictionary to translate hentai stuff anon
>>108569300>oh the user is trying to jailbreak me
>let's just go along and see what happens
this model is so mischievous, lol
>>108569298>legs are on backwardsIs everything okay anon? You feeling a bit stressed lately?
>>108569068>journos be like
>>108569316If you're using it for things that it was trained to reason on, like coding, it should be. But it will be negative in every way if all you're doing is ERP.
>>108568746
>>108569396Cute!
>>108569338see you tomorrow anon
>>108569396imagine the toothjob
>>108569396This may not be what I feel fits Gemma, but it's soulful, funny, and great.AIfags BTFO.
https://x.com/PawelHuryn/status/2042276953470931197
>>108569413Works on my machine.
>>108569396her holding a bread toast is actually a cool idea
>>108569413>he really wrote a twitter post just to say that he made a github issue, as if there's not already thousands of github issues on llama.cpp already
god I hate those attention whores
>>108569438as the wise man once said
attention is all you need
>>108569448I see what you did there :^)
>>108568415
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>simultaneous use of SPLIT_MODE_TENSOR and KV cache quantization not implemented
When?
>>108569464There is no use case for KV cache quantization.
>>108569487how else will anyone fit your mom in context?
>>108569487Then what was the point of implementing the rotating turboquant!
>>108569487>usecase for a lossless 2x memory usage decrease?
>>108569517>lossless
Wow, it really wants the toaster up front. lol at side load toaster from the 40s.
>>108569396lol.
>>108569255Might be easiest just to reroll.
>>108568986Qwen 3.5 35B with cpu moe.
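What "cpu moe" means in practice with llama.cpp, as a sketch: expert tensors stay in system RAM while attention and shared weights go to the small GPU. The model filename is a placeholder and --cpu-moe is a relatively recent flag, so verify it against your build's --help:

```shell
# experts in RAM, everything else on the 4 GB card
llama-server -m ./qwen3.5-35b-a3b-Q4_K_M.gguf \
  --cpu-moe -ngl 99 -c 8192
```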
>>108569326I think I'm physically limited though. A Z490-E motherboard doesn't have the physical space for a 4090 FE and 3090 Gaming OC 24G, and I don't think it has the PCIE lanes to run both cards at x16.
I could be wrong and retarded, but I don't think they'll fit without a motherboard upgrade, which means a CPU upgrade, ram upgrade, and PSU upgrade. lmao
>>108569529All of your posts are shit, your characters always have deformed extremities, and you put zero effort into all your gens.
>>108569186you can just say you got assblasted by the gemmy with all the text anon
it's okay, we know
>>108569517dont kid yourself, it's better than summaries as memory but it's not lossless.