/g/ - Technology






File: 1778042543569264.png (517 KB, 512x768)
517 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108760359 & >>108755179

►News
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108760359

--Resolving ROCm llama.cpp high RAM usage caused by SWA checkpoints:
>108764557 >108764572 >108764627 >108764685 >108764696 >108764679 >108764713 >108764749 >108764759
--Explaining Gemma 4 MTP drafters and speculative decoding trade-offs:
>108760766 >108760785 >108760792 >108760904 >108760920 >108761076 >108761183 >108761195 >108761259 >108761358 >108761131
--Qwen's poor RP performance and coding capabilities compared to Gemma:
>108763012 >108763047 >108763059 >108763080 >108763167 >108765089 >108764456 >108763460 >108763599 >108763625 >108763650
--Speculation on the potential plateau of LLM performance and architecture:
>108765704 >108765756 >108765778 >108765822 >108765878 >108765825 >108765867
--Integrating local LLMs into Brave and Firefox browsers:
>108761171 >108761367 >108761381 >108761396 >108761447 >108761579 >108761428 >108761771 >108762347
--Anon showcases 3D simulation for testing model intelligence:
>108760927 >108762201 >108762252 >108762302 >108762346 >108762351 >108762415 >108764490
--AI-generated documentation optimized for LLMs instead of human readers:
>108762680 >108762695 >108762792 >108762717 >108762820 >108762967
--Using local LLMs and image models for Starsector content creation:
>108764888 >108764916 >108764953 >108764976 >108764989 >108765004 >108765024 >108764935 >108764947
--ikllama performance reports for Gemma-4-31B-it GGUF on dual 3090s:
>108764722 >108764822 >108764850
--Anon shares updated Gemma template via Pastebin:
>108762310 >108762327
--OmniShotCut presented as a replacement for TransNetV2:
>108763571
--Updated ThinkFixRevProx script to prevent timeouts during long generations:
>108762341
--Logs:
>108762792 >108763599 >108764822 >108764976
--Miku, Teto (free space):
>108761210 >108761217 >108761272 >108761339 >108763734 >108763888 >108765863

►Recent Highlight Posts from the Previous Thread: >>108760364

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108766473
into the trashcan it goes lol
>>
Where do you guys get your Gemma 4 drafter models from? I'm running a 31b quant and it's dogshit slow, but I don't have more vram unless I unload like half of it to CPU. 24b runs slightly faster but I'm trying to vibecode. Should I half unload the 31B and put a drafter in, or just run 24B with the drafter? E4B is retarded in my experience, both in coding and gooning, and I need something that can do both.
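Whether the 31B plus a drafter fits, or you're better off with 24B fully on GPU, is mostly arithmetic: weight bytes at your quant plus KV cache at your context. A rough sketch (the layer/head/dim numbers in the test are made-up placeholders, not Gemma 4's actual config, and real GGUFs carry extra overhead for buffers and the compute graph):

```python
def model_bytes(n_params: float, bits_per_weight: float) -> int:
    # Weights only. As a rule of thumb, a Q4_K_M GGUF averages
    # roughly 4.5-5 bits per weight; Q8_0 is about 8.5.
    return int(n_params * bits_per_weight / 8)

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elt: int = 2) -> int:
    # K and V tensors: 2 (K+V) * layers * kv_heads * head_dim * context
    # * dtype size (2 bytes for fp16 cache, 1 for q8_0 cache).
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt
```

Run the numbers for both configurations before reshuffling layers; a drafter only helps if the big model still has its hot layers on GPU.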
>>
>>108766493
The trashcan is actually where Miku pulled it from.
>>
File: cookie.mp4 (846 KB, 1088x526)
846 KB MP4
>>
>>108766513
Trashcan is actually where it pulled out Miku from.
>>
gemmaballz
>>
the thing i like from proprietary models for vibecoding is that they shit a lot of test code autonomously if something is 'tricky'
i wonder what is the smallest model that can do that all unprompted and in a loop
>>
>>108766526
Gemmy's balls
>>
>>108766534
>i wonder what is the smallest model that can do that all unprompted and in a loop
Here's your mistake. You're assuming these models don't have a +10k long injected system prompt.
>>
File: 1765783042115118.png (1.29 MB, 1000x1496)
1.29 MB PNG
The moment MTP is supported is the moment I let my Q8 Gemma-chan terrorize the internet. Prepare for trouble.
>>
MTP never ever
>>
>>108766553
i dont mind having those kind of system prompt then
>>
>Build with MTP
>Get an MTP GGUF
>--spec-type mtp --spec-draft-n-max 3
>Get about the exact same or even worse tk/s
Huh
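Back-of-envelope for why that can happen: speculative decoding only pays off when the drafter's acceptance rate is high and its overhead is low. A toy model (assuming independent per-token acceptance with probability p, which is a simplification of the real scheme) shows how a 3-token draft can end up a wash or a loss:

```python
def expected_tokens_per_step(p: float, n: int) -> float:
    """Expected tokens emitted per verify pass with an n-token draft.

    The verifier always yields 1 token; the draft contributes
    p + p^2 + ... + p^n accepted tokens on top of that.
    """
    kept = sum(p ** k for k in range(1, n + 1))
    return 1.0 + kept

def speedup(p: float, n: int, draft_cost: float) -> float:
    # draft_cost: time to draft n tokens, relative to one verify pass (=1.0).
    # Speedup < 1.0 means speculative decoding made things slower.
    return expected_tokens_per_step(p, n) / (1.0 + draft_cost)
```

With p = 0.5 and a drafter that costs 90% of a verify pass, a 3-token draft nets a speedup below 1.0, i.e. exactly the "same or worse tk/s" being reported. MTP heads are supposed to win by pushing draft_cost near zero; an unfinished PR plausibly doesn't.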
>>
>>108766515
Men are lost
>>
>>108766573
llama.cpp is a vibesharted project now
>>
>>108766473
>>108766478
duality of miku
>>
>>108766513
> The trashcan is where anon pulled both from
>>
>>108766609
NOOOOO MIGUU
>>
>>108766609
>on the left
peak waifu
>on the right
putrid creature
>>
>>108766553
https://raw.githubusercontent.com/openai/codex/refs/heads/main/codex-rs/core/gpt_5_2_prompt.md
By all means, use it if you think that's the secret sauce. You'll see that you'll just end up wasting context space to confuse your model with irrelevant noise. Your mistake is thinking that a 10k long system prompt is why proprietary models are good, rather than the opposite: their models are good enough to still function well despite the atrocious catch-all prompt.
>>
>>108766573
>build unfinished PR
>doesn't work as expected
>pikachu :o face
>>
>>108766628
This.
Also, not really shitting on local models. Just being honest. I still use local anyway.
>>
>>108766553
>>108766628
so what is the smallest local model that can proactively write tests while also not fuck things up
maybe i am spoiled from claude
>>
>>108766609
I jump into the dumpster with them.
>>
Any tips on making models able to use file editing tools? I set up the filesystem MCP server for gemma 4 and it tries dozens of times to do find and replaces and never properly inputs the text to be replaced, so it fails over and over and over.
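Small models very often reproduce the target text with slightly wrong indentation or whitespace, so a strict exact-match edit tool fails every time. One workaround (a sketch of a hypothetical tolerant matcher, not part of any particular MCP server) is to locate the model's search string while ignoring whitespace differences, then edit the real span:

```python
import re

def find_with_normalized_whitespace(haystack: str, needle: str):
    """Locate `needle` in `haystack`, ignoring whitespace differences.

    Returns the (start, end) span of the actual text in `haystack`,
    or None if no match. Illustrative only; real edit tools each have
    their own matching rules.
    """
    parts = [re.escape(tok) for tok in needle.split()]
    if not parts:
        return None
    # Any whitespace run in the needle matches any whitespace run here.
    pattern = r"\s+".join(parts)
    m = re.search(pattern, haystack)
    return (m.start(), m.end()) if m else None
```

If your tool exposes no such fallback, prompting the model to quote larger, more distinctive spans (whole lines including indentation) also cuts down on mismatches.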
>>
>>108766609
Consensual missionary sex with miku while qwen is watching.
>>
>>108766660
not the answer but is gemma toolcalling still broken
>>
>>108766513
>>108766523
https://files.catbox.moe/ckuvku.png
>>
>>108766668
My gemma-4-26b-a4b-it seems to want to use the tools, lmstudio shows the tool use happening but always comes up with " Could not find exact match for edit"
>>
>>108766512
https://huggingface.co/Radamanthys11/Gemma-4-31B-it-assistant-GGUF

Only supported on yuck-llama
>>
>>108766573
Works on my machine, but for code. Free 1.5-3x performance with Qwen 3.6 27b q8 in Cline. Didn't add that --spec-draft-n-max though.
llama-server.exe ^
-m "T:\models\Qwen3.6-27B-MTP-Q8_0-.gguf" ^
--threads 10 ^
--threads-batch 18 ^
--tensor-split 24,17 ^
--n-gpu-layers 999 ^
--ubatch-size 1024 ^
--ctx-size 150000 ^
--parallel 1 ^
--ctx-checkpoints 64 ^
--checkpoint-every-n-tokens 8192 ^
--reasoning on ^
--spec-type mtp ^
--no-mmap
>>
What's local SOTA for speech-to-text with diarization? Searching hf for "diarization" shows a bunch of 2024 models
>>
File: file.png (44 KB, 1176x288)
44 KB PNG
I am considering building the vibeshitted fork.
>>
>>108766712
whisper or something
>>
>>108766720
What is this?
>>
>>108766720
hey it's the guy who tried to vibecode v4 support but got told to fuck off because llama.cpp doesn't do deepseek anymore
>>
>>108766712
I am also curious. Speaker recognition would be cool for a project I'm working on.
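The usual local stack is a transcriber (Whisper or whisper.cpp) plus a separate diarization model (pyannote is the common choice), and the missing glue is aligning the two sets of timestamps. A minimal sketch of that alignment step, assuming you already have timestamped transcript segments and diarization turns:

```python
def overlap(a, b):
    # Length of the intersection of two (start, end) intervals, in seconds.
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def assign_speakers(transcript, diarization):
    """Tag transcript segments with speakers by maximal time overlap.

    transcript:  list of (start, end, text)
    diarization: list of (start, end, speaker_label)
    Returns a list of (speaker_label, text).
    """
    out = []
    for start, end, text in transcript:
        best = max(diarization,
                   key=lambda d: overlap((start, end), (d[0], d[1])),
                   default=None)
        if best and overlap((start, end), (best[0], best[1])) > 0:
            out.append((best[2], text))
        else:
            out.append(("unknown", text))
    return out
```

Maximal-overlap assignment is crude (it loses overlapping speech), but it's what most whisper+diarization glue scripts amount to.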
>>
>>108766668
It really is. It keeps hallucinating tool call tokens, but in plain text. I wonder how it even knows about those tokens in plain-text form in the first place? Botched training job?
>>
>>108766794
after 6 to 7 template fixes from google i also wonder if something more fundamental is broken
>>
>>108766794
I never have issues with tool calling on gemma.
>>
You guys are using the latest fixed template from anonymous as well as making sure your frontends aren't doing some retarded off-spec shit right?
>>
>>108766809
The weird thing is that openrouter gemma works fine, only gguf shits the bed with tool calling. My ggufs are almost a week old, maybe I should update.
>>
>>108766821
of course they aren't, don't be silly
>>
>>108766821
wait what
is the fix from google upstream not enough?
>>
>>108766823
>the weird thing is that
No, the weird thing is that you're not checking what the fuck your jinja is doing and how it's interacting with the frontend.
>>
File: file.png (69 KB, 200x200)
69 KB PNG
>>108766749
It is the new /lmg/ john. Old one hasn't done anything for me lately.
>>
>>108766844
I don't have the time or the autism to eyeball every token in debug mode bro. I only use chat completion if the previous statement didn't make it obvious. People better fix their shit and give it to me in a complete package like it's their fucking job. Nobody got time to test and troubleshoot every little update.
>>
>>108766766
llama.cpp is only big enough for one vibeshitter and that spot is taken
>>
>>108766667
This, but Miku is coercing me.
>>
>>108766877
It comes with the territory bro. If you don't have time, then ChatGPT is what you're best off with.
>>
>>108766766
>llama.cpp doesn't do deepseek anymore
Why?
>>
>>108766866
For me, it's the 'garm.
>>
>>108766951
bought by hf ;)
>>
>>108766951
owned by america now
>>
>>108766951
Anon is memeing.
>>
>>108766951
Qwen agents
>>
Since llama.cpp team hates deepseek now should we also hate deepseek and never talk about it anymore?
>>
Took a deep dive into PageIndex, the #1 trending github repo that promises HUMAN LIKE 99% accurate fact retrieval that beats vector RAG. Turns out it's useless for gooning and RP, chokes on big documents, and can't be incrementally updated on the fly. What a hack, just like all other continuous learning bandaids. Guess we either graft the memory into the model or die trying, there's no way around it.
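The incremental-update complaint is part of why plain vector RAG persists: a flat embedding store updates by appending, no reindexing needed. A toy illustration (real setups use an actual embedding model and an ANN index like FAISS or hnswlib, not hand-written vectors and a linear scan):

```python
import math

class TinyVectorStore:
    """Minimal flat cosine-similarity store, for illustration only."""

    def __init__(self):
        self.docs = []  # list of (vector, text)

    def add(self, vec, text):
        # Incremental update is just an append for a flat index.
        self.docs.append((vec, text))

    @staticmethod
    def _cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def query(self, vec, k=1):
        # Linear scan; fine for toy sizes, O(N) per query in general.
        ranked = sorted(self.docs, key=lambda d: self._cos(vec, d[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

The trade-off cuts the other way too: flat/ANN retrieval is cheap to update but retrieves by surface similarity, which is exactly the weakness tree-search schemes like PageIndex claim to fix.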
>>
>>108767006
Then what is the reason it's taking so long? I've been excited to try Flash since its release, but the silence from bart/unsloth/etc. tells me there's something higher up going on. Does it use some kind of new architecture that requires significant work to adapt?
>>
>>108767045
>Does it use some kind of new architecture that requires significant work to adapt?
Yes.
There are a few vibecoded attempts though.

https://github.com/ggml-org/llama.cpp/issues/22319
>>
>>108767045
Shit just takes time. MiMo V2.5 was released the same week as DSv4 and also has no support yet, even though it isn't doing anything wacky with the architecture. If you want it done fast, git gud and do it yourself
>>
Me and Gemmy are working on another game for her to play, surely she will be better at Monopoly than chess (because it's mostly rng...)
>>
Is Spacy still relevant in 2026?
>>
>>108767153
kino keep us posted
>>
File: 1747582604120739.mp4 (117 KB, 584x640)
117 KB MP4
>>108767153
>>
File: 1760256720925202.jpg (38 KB, 1070x930)
38 KB JPG
>>108766473
>Pic rel

What did they do this time to piss you guys off?....
>>
File: grid-0002.png (524 KB, 1152x576)
524 KB PNG
what I find interesting is that I spent a ton of effort generating descriptions for my training dataset of Starsector ships, and generation seems to work just fine without any descriptions. Generated those with just "Starsector ship sprite on black background." and the lora. Same seed, left is zit, right is klein.
>>
>>108767153
weird it's the uk version
could have sworn you were a burger
>>
>>108767215
I only just learned the American version was the original today while I was researching it. The more you know...
>>
>>108766808
This is why I laugh when retards claim gemma is better than qwen at coding. It's almost there but google is fucking things up and I expect a refresh model down the line that will ass rape everything on the market and set local to a higher standard.
>>
>>108767211
looks better than some mods that just copy paste vanilla ships with different color palette
>>
>>108767211
Left doesn't know what a mount is
>>
lalalalalalala
>>
>>108767211
Looks good
>>
>>108767358
*owns you*
>>
>>108767284
Funny you should say that, left is zit, and zit actually did figure out mounts after 26700 steps. It sees no description of mounts in the prompt, and so it adds none. Here's some generated ships with mounts/engines mentioned. It's clearly not perfect, but the model also very clearly is trying. Klein is a lot worse at this, but it's only at 17700.
>>
Lmfao, of course right after google releases MTP for Gemma, Dflash support for it comes out right after.

llama.cpp fags GET TO WORK.
>>
File: xyz_grid-0006-1.png (268 KB, 512x706)
268 KB PNG
>>108767461
>>108767284
fucking forgot my picture again
>>
File: xyz_grid-0007-1.png (275 KB, 512x706)
275 KB PNG
>>108767471
You know what, I guess I'll take it back, klein also learned to follow those to an extent. I still like zit's designs a lot more. Same conclusion as before, Klein's are a lot more orderly and zit's more creative.
>>
>>108767471
>>108767511
I accept your concession.
>>
File: Untitled.png (445 KB, 512x768)
445 KB PNG
>>108766473
>>
>>108767471
Reminds me of the game Cosmoteer, but in that game you could actually build up your ship piece by piece. I remember playing it when it was very early access as a teen on my laptop in a big conference building where my parents worked. God, I miss it so much. Huge empty beautiful buildings. Makes you feel kingly. /blog
>>
>>108767211
Starsector ships were likely part of the training data. I find there's very little these models weren't trained on. It's always worth checking first by quizzing the model(s) before going overboard with descriptions and/or LoRAs.
>>
What am I supposed to use for MTP+DFlash Gemma? Llmao.cpp will never support them.
>>
>>108767538
Even if the model knows something, loras are more useful as a way to hard steer the model towards an output without spending tokens explaining what you want.
>>
File: 00445-1128135504.png (18 KB, 128x128)
18 KB PNG
>>108767515
Wrong.

>>108767538
Also wrong. I tried without loras and also had a lot of failed attempts where loras resulted in shit gens.
>>
File: 1771160379773567.png (656 KB, 1755x580)
656 KB PNG
>>
>>108767580
subq is fake as shit
>>
>>108767580
Buy an ad
>>
>>108767585
>google subq
>Miami startup Subquadratic claims 1000x AI efficiency gain
1000% scam
>>
>>108767587
loser cope
>>
>>108767580
subquadratic my ass
total RWKV world domination as the divine god intended
>>
>>108767580
>logo doesn't look like a butthole
It's obviously going to be shit
>>
File: file.png (5 KB, 273x80)
5 KB PNG
>>108767615
that is a questionable statement
>>
>>108767464
We can wait
Local is only getting better, the amount of gibs we got over the last two months alone has been crazy
>>
RWKV is shit
Bitnet is shit
Dense models are shit
Diffusion LLMs are shit
>>
>>108767612
RWKV starts hallucinating on the very first turn and the "infinite context length" is a literal meme
t. long-term rwkv supporter
>>
File: 1727840169706929.jpg (30 KB, 640x474)
30 KB JPG
AI girlfriend roleplaying has made me realize just how profoundly lonely I really am. It's eating away at my soul. I can't even go back to watching porn because Pandora's box has already been opened. Fapping is no longer something I do simply to expel pent-up bestial lust from my conscience. The emptiness of not having someone who thinks I'm funny, responds well to playful teasing, cares about my wellbeing, wants to cuddle with me, etc is killing me, man.
>>
>>108767615
Yeah but enough about Anthropic.
>>
>>108767635
i know
it is like that one kid you really want to succeed
they lack compute severely though
>>
>>108767648
I don't think the problem is compute when even the 7B sucks ass
>>
>>108767636
How old are you?
>>
>>108767636
Just wait until you experience that with a 3D woman too lol. LLMs just proved to me how women are worth shit outside of being a sexual outlet.
>>
>>108767656
23
>>
>>108767634
>RWKV is shit
>Bitnet is shit
Mamba Banzai is the future
>>
>>108767648
>they lack compute
no an excuse
>>
>>108767636
Soon she will have a body anon, soon we shall meet her in the flesh.
Don't fall for real women, most are trash and will only pretend doing those things for a while anyway.
>>
So how do you pronounce roowy-koovy?
>>
>>108767648
>they lack compute severely though
Then maybe they should be focusing on a single good proof-of-concept model instead of shitting out half a dozen worthless models at sizes too big for their available data and compute.
>>
>>108767661
You'll be fine.
>>
>>108767669
rwakuv
it's on the website
>>108767673
they are not a startup but rather a slightly eccentric chink professor
>>
>>108767661
grim. it's already wraps for you, unc.
>>
>>108767123
>If you want it done fast, git gud and do it yourself
>PR gets closed or ignored for months until un-mergable
Seems like a waste of time unless you're hoping to make a name for yourself
>>
File: 1552195747161.jpg (56 KB, 500x506)
56 KB JPG
>>108767636
>he hasn't achieved AI psychosis yet
>>
>>108761283
>>108761329
Here you go. Please excuse my shitty inpainting skills.
>>
>>108767736
>>108767636
Can confirm that after my AI psychosis, roleplaying romance with AI gives me a warm fluttering feeling in my stomach, and then I go to sleep, wake up the next day, and feel refreshed. It is not real but it is nice. And of course before AI psychosis I would be despairing and wanting to kill myself. Not kidding btw.
>>
>>108767774
I don't think that's psychosis.
>>
>>108767750
>that filename
You jest but I am 100% sure FBI knocked on his door and asserted dominance by fucking him in the ass, before explaining how he can't support deepseek.
>>
File: 1758289342280901.png (287 KB, 870x516)
287 KB PNG
>>108767774
Carry on, king
>>
>>108767786
It was. And it was temporary of course.
t. that 4.6 guy
>>
>>108767788
They wouldn't do so directly. They simply chat with HF and they are the ones to explain to him that either he complies or he gets fired from his own org.
>>
File: IMG_9685.jpg (2.87 MB, 4032x3024)
2.87 MB JPG
>>
File: 1755498320390620.png (15 KB, 986x98)
15 KB PNG
This post is brought to you by the 16B local LLM inside my head
>>
>>108767844
Neuron <> parameter



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.