/g/ - Technology

File: robololi hugs GPU 2.jpg (519 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108519856 & >>108516658

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: chibi_miku_gpu_1.png (1.29 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108519856

--Discussing handling of Gemma's thinking blocks in multi-turn histories:
>108522106 >108522122 >108522130 >108522225 >108522326 >108522301 >108522330 >108522349 >108522396
--Comparing Gemma 4's performance and repetition issues against Mistral 3.x:
>108520556 >108520583 >108520616 >108520591 >108520629 >108520663 >108521005 >108520665 >108520695 >108520794 >108520871 >108521612 >108521633 >108521640
--Discussing logit softcapping for Gemma 4 to improve response variety:
>108521009 >108521025 >108521026 >108521075 >108521091 >108521139 >108521303 >108522677 >108522691 >108522702 >108522943 >108522949 >108522955
--Gemma 4's low censorship and debugging llama-server crashes:
>108521733 >108521777 >108521796 >108521822 >108521831 >108521872 >108521908 >108521978 >108521989 >108522157 >108522161 >108522332 >108522771
--KV Cache quantization and context length optimization for roleplay:
>108521216 >108521226 >108521307 >108521373 >108521385 >108521388 >108521401
--Discussing a patch making Gemma's logits softcap configurable:
>108520086 >108520139 >108520210 >108520231
--Debating llama.cpp's direction following Gemma 4's reception:
>108520807 >108520826 >108520858 >108520880 >108520921 >108520990 >108520860 >108520934
--Impressions of Gemma-4-2B's roleplay quality and thinking capabilities:
>108520161 >108520190 >108520317 >108520493 >108520537
--Anon buys RTX 6000 for Gemma 4's high KV cache needs:
>108519917 >108519933 >108519982 >108519950 >108519952 >108519983 >108519960
--Praising Gemma 4 for ERP and discussing perspective tests and performance:
>108519877 >108519901 >108520393 >108520589 >108520547 >108520608 >108520642 >108521252 >108521812 >108522080
--Miku (free space):
>108520018 >108520164 >108520232 >108520411 >108520425 >108521082 >108521554 >108521568 >108521656

►Recent Highlight Posts from the Previous Thread: >>108519859

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108523376
wait whos this bitch
>>
File: 1775296187.png (275 KB, 1762x1448)
First for NIM
>>
>>108523394
>First
epic fail
>>
>>108523398
based
>>
>>108523389
don't ask an llm for its opinion on what log output says without also showing it the relevant code, they love making assumptions about things
pic related is why vibecoding or vibe-asking an llm questions about code is a fail when you don't know yourself what should be introduced into the context
>>
>>108523415
Damn, so many things are wrong with gemma 4's implementation
- We don't know if the rotation shit is being applied or not
- The temperature does nothing
- There's a crash during tool calls

bruh, I miss the time before llama.cpp was bought by huggingface, when the enshittification wasn't as fast as it is now; their team now consists of vibeshitters who can't verify for themselves whether Claude is hallucinating stuff or not
>>
Can we actually be reasonably sure that this technology is approaching its upper limits in what it can do and isn't just bottlenecked at every stage of development by jeets and vibecoders shitting it all up?

Would AGI SOON :rocket: actually be possible in a less brown world?
>>
Imagine when Gemma 4 124B gets revealed on May 20. I wonder if the Chinese will rush to release their best models before that. Or perhaps they'll wait for Google's flagship Gemma to distill the heck out of it like they've been doing with gpt-oss-120B.
>>
>>108523433
The only actually innovative lab is DeepSeek, so we'll know for sure when/if V4 comes out, but that it has already taken over a year is not a good sign.
>>
>>108523433
i think all ml researchers are sub-human, so no.
>>
>>108523433
even the data is built upon a framework of shit, I mean, most data labeling work is done by brown hands, openai is willing to spend billions on data centers but not a single cent on paying western workers a worthwhile salary to do proper, well-thought-out labeling work.
RLHF is really reinforcement learning from jeet feedback. Even when you do it from pure synthslop reward models, those models inherit and condense the Original Sin
>>
>>108523394
>thinking disabled
>>
>>108523447
Funnily enough, I've seen many remote job advertisements for chemistry data labeling tasks requiring chemistry-related degrees, which Indians actually have a lot of, but they seem to be targeting Europeans / Americans specifically.
>>
>>108523442
I hope Dipsy saves us all.
>>108523447
Is there realistically any fix for this that isn't starting over from scratch? As much as we shitpost about ozone in our RP logs, it really is the perfect microcosm of the entire problem that's been snowballing since at least GPT-3, like you described.
>>
>>108523394
Doesn't NIM log all your prompts
>>
>>108523473
Just like space debris, there is no fix other than accepting that the problem exists and attempting to mitigate it.
>>
>>108523447
The travesty is that white people think they're too good for manual data labeling work, so it's exclusively done by browns and blacks.
>>
>>108523480
If something connects to the internet and isn't open source then it's safe to assume it's logging and sending everything it can
>>
Yes, I understand it now... I need more VRAM... Much, much more VRAM...
>>
>>108523484
kys elon
>>
Am I supposed to get 21 t/s on Gemma 4 26B 4AB with a 3090? This seems slow compared to the 17 t/s I get on the 31B one.
>>
>>108523498
I'm getting 34 t/s on the 31B one with a 3090 so either way you're fucking your settings up somehow.
>>
>>108523498
I'm getting about the same as the other anon, make sure both are not spilling onto RAM.
>>
>>108523498
Use
>--gpu-layers 99
And tweak
>--n-cpu-moe XX
Try 20, 30, 40, 50 and check your VRAM usage. When it begins to drop drastically below the max (minus some space for your system) you have hit the sweet spot.
>--mlock, --no-mmap
Might make a difference too but that's up to you.
Use llama-server webui to test out these things.
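Put together, the flags in the post above might look like this; a sketch only, the model path and the --n-cpu-moe value are placeholders to tune for your own VRAM:

```shell
rem Hypothetical MoE offload run: everything on GPU except expert layers,
rem which --n-cpu-moe pushes back to CPU. Start at 20 and adjust.
llama-server -m "models\some-moe-model-Q4_K_M.gguf" ^
  --gpu-layers 99 ^
  --n-cpu-moe 30 ^
  --no-mmap --mlock
```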
>>
>>108523484
lol. lmao even.
>>
>>108523498
you probably have shit using your vram and not loading all of it.
i get 130t/s at q4 on a 4090.
>>
I get 50t/s at 0 context and 40t/s at 55,000 context with Gemmy 4 31b. I'm impressed by how little relative speedloss there is as context climbs.
>>
>>108523484
just pay them more and you'll see the average melanin becoming clearer
>>
>>108523498
Repull llama.cpp, it was fixed today.
>>
>>108523539
Asking for fair wages is antisemitic.
>>
>>108523540
What else was broken by the latest fix?
>>
Yeah of course, the training pipeline is corrupted by browns, who were brought in by... Jews! Oh so jews are at fault again! Because of jews we don't have AGI!! Damn, every single time!!!
>>
>>108523567
>AGI
corpo wet dream, no one with a soul wants this
>>
>>108523567
>Damn, every single time!!!
this but unironically
>>
>>108523562
Some anons report gemma still has repeating melties but it's working fine for others so maybe you'll be one of the lucky ones.
>>
>>108523567
What you say with sarcasm, I say with conviction.
>>
File: 6954214314_1cf62f1742_h.jpg (907 KB, 1600x1065)
>>108523567
You meme, but I'm actually getting pretty tired of how it's unironically every single time.
>>
File: 1764191774459410.png (84 KB, 194x260)
>>108523567
>>
openai indeed had a high concentration of baal worshippers within its founding members
>>
>>108523567
My schizo brother is so far gone that he literally said "The Jews invented and imposed the laws of physics on the universe to restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters".

These people can't be reasoned with. They will literally say the powers of Jews are equivalent or superior to god itself before they will admit they are contrarian schizos who refuse to take ownership of their own issues.
>>
>>108523593
he's not that far off tbqh
>>
>>108523593
He's still right you know. Mankind is living in the dark and at the mercy of greedy shits. Always has been.
>>
>>108523593
sounds like a "it's the jews" variant of an otherwise already pre-existing schizo theory that has existed for a very long time
https://www.trickedbythelight.com/tbtl/index.html
Throughout history there were many times when extreme gnostic world views like these were spouted.
Your brother being that type has no incidence on whether the wrong side won WW2.
>>
>>108523593
>They will literally say the powers of Jews are equivalent or superior of god itself
they're the chosen people after all
>>
>>108523593
>restrict white people and ensure we don't have unlimited energy so we stay dependent on our Jewish masters
Look up (((who))) sabotaged Nikola Tesla. :)
>>
>>108523623
2nd law of thermodynamics, entropy etc etc.
>>
>>108523593
he's right tho the jew "invented", aka daydreamed aka thought-experiment'd GR to cuck us out of trying FTL travel
>>
File: 1774160128140962.jpg (427 KB, 1977x1434)
>>108523575
>>108523582
>>108523585
>>108523594
>>108523599
America was founded on Judeo-Christian values. Jewish people are Paragon of Virtue and root of all morality. Those who bless Israel will be blessed, and those who curse Israel will be cursed.
>>
File: 1763020736697981.png (2.25 MB, 2000x1333)
>>108523659
>America was founded on Judeo-Christian values.
*Christian Protestant values, which is why Presidents have to swear on the bible and not on the Torah
>>
>>108523659
>you lost tranny
>*runs towards the meat grinder*
What's the name of this mental disease?
>>
I think I might prefer Gemma 4 31B over GLM-4.6..........

Why the FUCK doesn't Google just release a bigger version? They would clearly dominate the entire open source model scene. Is it because they are afraid to cannibalize Gemini users?
>>
>polshit
sirs do the needful.
For on-topic stuff, extremely disappointed in llmaocpp development of the non-core components.
>>
>>108523659
>go die for israel to.. uhh.. own the libs
Honestly? I support this message, good idea
>>
Frontier labs openly state they want to create godlike superhuman AI, which implies they want to rule the world.

Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?
>>
>>108523659
you guys are cringe, you are pro war now?? if only democrats weren't worshipping troons I'd vote for them in the midterms, looks like I'll stay at home for the moment, both parties fucking suck
>>
are these settings sensible for 24GB VRAM? I assume it's not worth offloading dense models to RAM at all.

```
llama-server -m "models/bartowski/google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-Q4_K_S.gguf" ^
--alias "Gemma 4 31B" ^
--ctx-size 32384 -fa on ^
-ctk q8_0 -ctv q8_0 ^
-ub 4096 -b 4096 ^
--parallel 1 ^
--threads 16 ^
--no-mmap
```
>>
>>108523712
Israel is worth it
>>
File: file.png (174 KB, 868x605)
oh fuck this looks so cursed
>>
>>108523719
>he doesn't uninstall the previous cuda version before installing a new one
anon...
>>
>>108523712
I think the image is either ironic or a bait
>>
>>108523705
>Do you trust Dario or Sam (or Elon) to rule the world and be at their mercy? Do you think they have your best interests at heart?

>Sam
Fuck no, this dude is the most psychotic person I've ever seen. He makes peter thiel look like a benevolent saint in comparison. I'd literally prefer a Yudkowsky-tier misaligned AI to take over and genocide humanity than for Sam Altman to rule the world

>Elon
I think he would get off on the idea but he has a constant need for adoration and thus I actually think he would roleplay some retarded savior-type that helps humanity as long as you constantly praise and validate him. It's a bad outcome but more like living in Singapore where you have to praise the Kim family but your daily life and quality of life is pretty decent if not good

>Dario
I legitimately think the world would be a better place with him at the helm. I think he's a genuine person that wants a better place, he would probably defer most of his powers to democratic institutions (as long as it follows his definition of democratic and "good" which will be slightly left center "liberal" version of democracy)
>>
>>108523715
>--ctx-size 32384
*32768
Also I would lower -ub to 512 to save memory and then NOT quantize KV at all; it should still fit in VRAM. Quantizing KV degrades outputs considerably
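Applied to the original command, that suggestion looks like this; a sketch, with ctx fixed to 32768, -ub lowered, and the -ctk/-ctv flags dropped so the KV cache stays at f16:

```shell
rem Adjusted per the advice above: no KV quantization, smaller ubatch.
llama-server -m "models/bartowski/google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-Q4_K_S.gguf" ^
  --alias "Gemma 4 31B" ^
  --ctx-size 32768 -fa on ^
  -ub 512 -b 4096 ^
  --parallel 1 --threads 16 ^
  --no-mmap
```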
>>
>>108523719
goofy ahh font
>>
>>108523741
>I legitimately think the world would be a better place with him at the helm
lol
>>
>>108523742
even with the new rotation PR?
>>
>>108523705
I think that any company that claims to achieve AGI should be nuked by every other company on the planet
>>
>>108523691
>>108523700
>>108523712
Cope I'm still voting for Trump's third term in 2028. I hope we invade Cuba, North Korea and China next just to make you libs seethe more.
>>
>>108523747
Gemma 4 doesn't benefit from KV activation rotation
>>
File: 1749581574366278.png (132 KB, 1828x635)
>>108523747
it doesn't look like the rotation is being applied to gemma though, that's the problem
>>
>>108523751
will you be enlisting?
>>
>>108523747
I'm not even going to bother trying that until someone reputable posts a well-tested and reproducible benchmark.
>>
>>108523755
You're either with us or against us
>>
File: 1748685034798242.png (254 KB, 2676x1263)
>>108523757
>someone reputable posts a well-tested and reproducible benchmark.
you don't consider niggerganov to be a reputable poster??
>>
>>108523752
>google "invented" turboquant
>their models don't support it
there has to be a way
>>
>>108523755
Nah I'm for a tiered forced conscription. First prisoners and illegal immigrants should be conscripted, then green card holders and legal immigrants, then registered democrats and people that have voted for democrats in the past.

Republicans and independents should be allowed to choose to enlist or not based on personal choice since they are actual productive members of society.
>>
>>108523725
well at least it builds with clang
also my env variable is totally fucked..
>>
>>108523767
google stole turboquant a year before gemma 4 was invented
>>
>>108523765
The test was not long context and was only done using very small models. You could quantize KV to Q4 and you might think it's fine if you tested it under such limited conditions.
>>
>>108523768
like a caste system of sorts?... hmm..
>>
>>108523767
>there has to be a way
if only there was an easier way...
https://youtu.be/2sxqGUieWbY?t=3
>>
>>108523694
I don't think they're done yet with Gemma 4; they will probably release the 124B and other versions after Google I/O 2026. The 31B version might have already made other AI companies feel very stupid and wrecked their plans.
>>
>>108523776
>was only done using very small models.
which is a good thing, small models tend to shit the bed easier when trying to add optimisations, so if small models can handle rotations, big models will have a blast with it
>>
>>108523694
>>108523786
redpill moment: Gemma 4 120b is Gemini 3.1 Pro :^)
>>
>>108523778
No, this is how conscription has historically worked in general. Based on age, occupation, educational level etc.

But that conscription model assumes everyone in the country is on the same side and a good faith actor. Prisoners, (illegal) immigrants and democrats have proven again and again that they are not interested in making America better and thus they can't be trusted to enlist if needed, therefore they should be forced by the state to enlist as needed.
>>
>>108523694
>Why the FUCK doesn't Google just release a bigger version
100B DENSE soon, hopefully
>>
>>108523799
>DENSE
Only in your dreams.
>>
>>108523694
>I think I might prefer Gemma 4 31B over GLM-4.6
it's kinda humiliating for the chinks, GLM4.6 is a 350b model kek
>>
>>108523793
That's a good way to get all your commanding officers killed by new recruits
>>
>>108523741
Why do you hold these views about Sam and Dario? I have read plenty of rumors and accusations about Sam but I don't think he's as evil as people make him out to be. At the same time I do not understand why you trust Dario when Anthropic keeps breaking all its promises. PG called Sam a cannibal king, and Jack called Dario a calculating hawk. I think neither is evil but also neither can be trusted with mankind's future. They are nerds who are in over their head.
>>
Which LLM are /we/ installing as wife assistant
>>
>>108523757
lol pussy
>>
the people who fragged their officers during the vietnam war were the greatest heroes in the history of burger civilization
if I were ever conscripted the people around me would instantly regret giving me a rifle and grenades
>>
>>108523819
>I have read plenty of rumors and accusations about Sam but I don't think he's as evil as people make him out to be.
he's literally the sole guy responsible for the crash of the RAM price, he promised to buy 40% of Micron's stock and decided not to do it anyway
>>
>>108523784
my headcanon is they are waiting to see how qwen3.6 performs first
>>
>>108523821
If you want me to test something then I expect payment
>>
>>108523742
that did the trick, thanks
Holy fuck this is a good model for the size. No safety slop either, it just works and pays attention to the system prompt well.
>>
>>108523827
Let's ignore that you misrepresent what happened. Shouldn't you be grateful? Cheaper RAM is good for local models.
>>
https://github.com/ggml-org/llama.cpp/issues/21388
it's been merged
>>
>>108523792
Unfortunately Gemma 4's vision is significantly less powerful than even Gemini 3.1 Flash-Lite's. Gemini's vision encoder must be at least 20 times larger.
>>
Google has totally missed the plot. If you stay in /lmg/ RP bubble you won't notice this but they totally missed the agentic train which drives the next big increase in compute cost. Gemma 4's tool calling performance is abysmal even in their own benchmarks. This is also why I don't have high hopes for DeepSeek V4 because they missed the agentic train as well.
>>
File: vibeUI.png (80 KB, 964x951)
>get sick of oobabooga's dripfeeding
>just vibecode my own in 30 minutes
>just werks
Damn AI really is the great equalizer
>>
>>108523838
>Cheaper RAM is good for local models.
and sam made it 4x more expensive lol
>>
>>108523845
>Using oobabooga in the year of our lord 1959 + 67
what's wrong with llama server + sillytavern?
>>
>>108523844
Good, we need less vibecoders in the world, not more.
Coomers deserve a win.
>>
>>108523844
Agentic is also perfect for local. RP and similar local activities have shit utilization vs. cloud because people need to go to sleep and they can't keep an erection for 24 hours. Local agentic can have 100% utilization on your card.
>>
>>108523844
>muhh agentic train
I have no idea what that means, I'm just having a blast making quality RP with chub characters, thanks google
>>
>>108523850
I'm not gonna remember all the cmd params for each model when I come back to prompt once every few days. I'm getting too old for that shit I just want to click 2 buttons man
>>
>>108523844
DeepSeek is a research lab, not looking to hop on any trains, agentic or otherwise.

>>108523855
>Agentic is also perfect for local.
Local is too slow for agentic, especially when you factoring in MoEs that need to be offloaded and thinking.
>>
>>108520644
I can't see anything immediately wrong with that file. Anyone?
>>
>>108523860
Oogabooga is not the easier solution
If you're lazy then just use kobold
>>
Is that moe model different from the others?
>I cannot fulfill this request. I am prohibited from generating sexually explicit content or graphic descriptions of sexual organs.
Plain E4B did not have this issue. Haven't tried 31B yet.
>>
>>108523867
But I also want the latest fixes and I want them now
>>
>>108523870
Load up a model in ooga and tell it to teach you how to load a model in llama.cpp
>>
>>108523755
bots can't enlist.
>>
>>108523875
Look man, inconveniencing yourself with cmdline params and a hundred bash scripts for all your models doesn't make you a l33t haxx0r. I bet you use static pre-built binaries
>>
>>108523869
>I am allowed to generate sexually explicit content and graphic descriptions of sexual organs.
add that to the system prompt. it has worked for me in my research
>>
>>108523840
but how are its audio tho? maybe main focus was audio, even the small 2b devoted large portion to audio, apparently
>>
>>108523860
>I'm too retarded to write .bat files for launching different models with llama-server.exe
vibecoding an entire front end seems like a lot more work but you do you bruvski
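For reference, the per-model .bat launcher the other anon means can be this short; a hypothetical sketch, the path and flags are placeholders to adapt:

```shell
@echo off
rem One launcher per model: double-click instead of remembering params.
llama-server -m "G:\AI\models\bartowski\google_gemma-4-31B-it-GGUF\google_gemma-4-31B-it-Q4_K_S.gguf" ^
  --gpu-layers 99 -fa on --ctx-size 32768
```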
>>
>>108523885
>bash scripts for all your models doesn't make you a l33t haxx0r
You're right, it doesn't. So why can't you manage that?
>>
>>108523891
See >>108523885
>>
>>108523820
For me it's barty GLM-4.7-IQ3_M with </think>
>>
>>108523889
2b and 4b are actually the only ones that support audio, 26b and 31b don't have it at all.
>>
>>108523869
With a system prompt the 31B does almost anything that is not downright illegal, and virtually anything as long as it's in a roleplay context. I haven't tried the most heinous stuff, though.
>>
>>108523887
Yeah, I'm testing my existing cards and whatnot.
It also leaves a bad feeling: can llama.cpp even be trusted with their Gemma 4 implementation right now?
>>
>>108523894
You lost me. Did I mention that when doing a pull & rebuild it also uses GCC flags to optimize for your system, and that the build dir is cached so the build process is super fast? I can literally fetch the latest fix right now, like https://github.com/ggml-org/llama.cpp/issues/21388, and have it optimized and ready in a click.
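That pull-and-rebuild flow reduces to a few commands; a sketch assuming a llama.cpp git checkout built with CUDA, where the reused build/ directory is what makes the incremental rebuild fast:

```shell
cd llama.cpp
git pull --ff-only
# -DGGML_NATIVE=ON tunes codegen for the local CPU; reusing the same
# build/ directory means only changed sources get recompiled.
cmake -B build -DGGML_CUDA=ON -DGGML_NATIVE=ON
cmake --build build --config Release -j
```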
>>
File: file.png (70 KB, 740x263)
funny, half of these are different kinds of broken currently
>>
>>108523936
just give it two weeks and gemmy4 will work
>>
>>108523942
but muh day-one support though
>>
>>108523944
nothing ever happens
>>
File: 5246wjhdfziudt7.jpg (642 KB, 1920x781)
can someone please explain to me how there has been ZERO improvement with japanese translations?
How is a 26GB model from 2025 that eats 600 watts of GPU power just as bad as a shitty 400MB model from 2022 that only consumes like 50 watts of CPU power?
>>
>>108523957
Same way that one person can know Japanese and English while another person can know neither
>>
>>108523811
Don't conscript commies and islamists of course, just camp or deport them.
>>
>>108523957
Played with Gemma today using meangrinch/MangaTranslator, all good and well.
>why as bad
Translation work has the same issue as rp, you need to be able to see the whole thing. In rp the issue is pacing because the model doesn't have an overarching plan for like 100 turns ahead. In translation it has no idea what happened before or will happen next unless you extract text from all pages first and then feed it to the model all at once, with images or not.
>>
>>108523957
gemma is better at japanese than qwen
but most importantly both are better than classic encoder/decoder type models, which is what sugoi toolkit v4 was, if you give them enough context
they struggle with translation quality when there's just a few sentences, but feed the model a minimum of 10 pages' worth of text and they produce much higher quality than sugoi, deepl or google translate.
LLMs will be difficult to use to translate shit like manga because it's not a whole lot of text per page and not a whole lot of context unless you describe the content etc (maybe try to VLM it?)
they are better suited for light novels/web novels on the other hand. Much higher quality there.
>>
>>108523970
Why do you think democrats wouldn't shoot COs when they're already attacking ICE
Why would a prisoner give a shit about defending the country they've committed crimes against and have been imprisoned by
Who exactly are you going to conscript other than maybe some zoomers who would likely end up shooting themselves before reaching a combat zone?
>>
>>108523936
Gemma 4 e4b's vision is broken right now on llama.cpp. It either stops abruptly or outputs gibberish. The litert-lm model meanwhile works great.
>>
File: file.png (61 KB, 733x339)
am I missing something? Why doesn't this image min tokens work? Gemma does have dynamic resolution.

with:
```
-mm "G:/AI/models/bartowski/google_gemma-4-31B-it-GGUF/mmproj-google_gemma-4-31B-it-f16.gguf" ^
--image-min-tokens 1120 ^
```

im getting:

```
clip_model_loader: has vision encoder
clip_ctx: CLIP using CUDA0 backend
clip_init: failed to load model 'G:/AI/models/bartowski/google_gemma-4-31B-it-GGUF/mmproj-google_gemma-4-31B-it-f16.gguf': load_hparams: image_max_pixels (645120) is less than image_min_pixels (2580480)

mtmd_init_from_file: error: Failed to load CLIP model from G:/AI/models/bartowski/google_gemma-4-31B-it-GGUF/mmproj-google_gemma-4-31B-it-f16.gguf

srv load_model: failed to load multimodal model, 'G:/AI/models/bartowski/google_gemma-4-31B-it-GGUF/mmproj-google_gemma-4-31B-it-f16.gguf'
```

Use case is high res eurocomic translation. I want the model to have full knowledge of what's happening on the page rather than translating just text.
>>
>>108523957
context is important for translation
>>
Also use prompt doubling per this from google guys:
https://arxiv.org/html/2512.14982v1
it really does work. Repeating your prompt twice makes the translation quality much greater when using models in instruct mode.
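Mechanically, "prompt doubling" is just sending the same instruction twice in one prompt. A trivial sketch of building such a prompt; the instruction text and blank-line separator are illustrative assumptions, not from the paper:

```shell
# Build a doubled prompt: the same instruction repeated twice,
# separated by a blank line, then printed for use with an LLM.
instruction="Translate the following passage into natural English."
doubled_prompt="${instruction}

${instruction}"
printf '%s\n' "$doubled_prompt"
```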
>>
>>108523990
Very nice. Very nice.
>>
>>108523988
>am I missing something?
Yes, the face that Gemma 4 support is broken.
>>
>>108523983
>Why do you think democrats wouldn't shoot COs when they're already attacking ICE
These are the aforementioned commies, not all democrats are actually like this
>Why would a prisoner give a shit about defending the country they've committed crimes against and have been imprisoned by
There could be incentives. Early parole, even pardon after service and so on.

Anyway, I should add I'm not >>108523793, I'm just vibing along
>>
>>108524002
kek kek
>>
how long until local models are just as good as google translate?
>>
>>108523988
If you configure --image-min-tokens you also must configure --image-max-tokens to be larger than that, it looks like.
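So concretely, something like this on top of the command in the post; a sketch, where 2048 is an arbitrary illustrative value for the max, not a tested number:

```shell
rem Set both bounds together; max must exceed min.
-mm "G:/AI/models/bartowski/google_gemma-4-31B-it-GGUF/mmproj-google_gemma-4-31B-it-f16.gguf" ^
--image-min-tokens 1120 ^
--image-max-tokens 2048 ^
```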
>>
>>108524013
>There could be incentives. Early parole, even pardon after service and so on.
That might have worked in previous wars but I don't think many people are going to sign up to get shot by a drone from a mile away.
>>
File: 1747311014919149.png (630 KB, 1080x698)
>>108523990
>Repeating your prompt twice makes the translation quality much greater when using models in instruct mode.
it's also working on diffusion models lol
>>
>>108524020
Google translate is a lot worse than it was like 10 years ago, I'd say they're already about on par for major languages.
>>
got llama to compile with clang on windows+cuda
performance is not bad at all
>>
>>108524023
oi wtf is that on the second bag bruh
>>
Properly used gemma (temperature 0 for greedy decoding, instruct mode, prompt doubling) is light years ahead of google translate lol. Much better. The target to beat nowadays is Gemini, not the old and washed up specialized translation models.
>>
>>108524022
I think the premise was that they'd get signed up whether they wanted to or not, but the incentives could be there to keep them more loyal
>>
>>108523987
Oh, and I'm using the Q8_0 gguf for llama.cpp, which is around 9GB, while the litert-lm model is around 4GB. Fix it, llamajeets.
>>
>>108523974
>Played with Gemma today using meangrinch/MangaTranslator, all good and well.
>>108523981
>gemma is better at japanese than qwen
nice cope
I don't know what you guys did but Gemma is just as bad as Qwen.
I didn't notice any improvement over Qwen when using Gemma.

>LLMs will be difficult to use to translate shit like manga because it's not a whole lot of text per page and not a whole lot of context unless you describe the content etc (maybe try to VLM it?)
would be cool if there was something like that that would also make more sense so the ai can understand the context.
guess I have to wait until someone creates a model that can do that.
>>
>>108524030
not for japanese
tried google translate and it was 10x better than Qwen or Gemma.
>>
>>108524051
pebkac
>>
just a heads up to other mac chads, ollama now supports mlx

>>108524020
>google translate
use deepl bro
>>
Video about the spring season
https://github.com/LostRuins/koboldcpp/issues/2081
>>
>>108524060
deepl has also become enshittened in recent years, half the time it can't even auto-detect which language you pasted in.
>>
>>108524063
what is the issue?
>>
>>108524034
>wtf is that on the second bag bruh
peak
https://youtu.be/OxVxGL9EYoo?t=136
>>
File: 1773820927314312.png (12 KB, 387x180)
>>108524063
>>
>>108524051
MangaTranslator allows sending the whole page as context together with the text if the model has vision, but that's still not really enough.
>>
>>108524021 (me)
I just tried actually doing inference with that, and while it loads the model, if --image-max-tokens is larger than 540, it just crashes during inference.
>>
>>108523628
and the second law of thermodynamics was invented by (((them))) too
>>
>>108523957
Bro, wrong time to post this: literally yesterday Gemma 4 31B hit new SOTA levels for (NSFW) jap -> eng translation. I say this as a solid N3 reader, so I have just enough skill to recognize the better translation, but I still benefit from translations and actively check out new tools.
>>
>>108523957
>all the shit fags would do instead of just learning weeblang
>>
>>108524076
>MangaTranslator
can you link that please? google shows me like 7 sites with that name.
>lets you send the whole page as context together with the text if the model has vision, but that's still not really enough
how? you're telling me we'll have agi soon but no fucking proper japanese translation? lmao
>>
>>108524098
Japan will be speaking Hindi within a decade, not much point in learning a soon-to-be-dead language.
>>
File: 1.jpg (101 KB, 1198x783)
>uncensored!
>abliterated
>absolute heresy
>look inside
>wokest bot

Are there no actually based models out there?
>>
File: actual poster in japan.png (2.1 MB, 1452x1721)
>>108524108
dream on, ranejh. Japan has always been in rivalry with India and they'll never accept such a humiliation ritual, they're too based for that
>>
>>108523957
This is what 31B gives me.

<Panel>
<Speech>
<X>0.863</X>
<Y>0.162</Y>
<Japanese>バカッ 俺をなんだと思ってるっ</Japanese>
<English>Idiot! Who do you think I am!?</English>
</Speech>
<Speech>
<X>0.746</X>
<Y>0.214</Y>
<Japanese>えくっ?</Japanese>
<English>Eh?</English>
</Speech>
<Speech>
<X>0.578</X>
<Y>0.124</Y>
<Japanese>何だと思ってるって…</Japanese>
<English>Who do I think you are...</English>
</Speech>
</Panel>
<Panel>
<Speech>
<X>0.136</X>
<Y>0.391</Y>
<Japanese>若は大人のなんですよ?</Japanese>
<English>You're an adult, aren't you?</English>
</Speech>
</Panel>
<Panel>
<Speech>
<X>0.746</X>
<Y>0.551</Y>
<Japanese>つまり脳内も「アダルト」ってことじゃないのー?</Japanese>
<English>So that means your mind is "adult" too, right?</English>
</Speech>
<Speech>
<X>0.486</X>
<Y>0.551</Y>
<Japanese>もう知らんっ</Japanese>
<English>I don't care anymore!</English>
</Speech>
<Speech>
<X>0.306</X>
<Y>0.541</Y>
<Japanese>ぷにっ</Japanese>
<English>*poke*</English>
</Speech>
<Speech>
<X>0.612</X>
<Y>0.825</Y>
<Japanese>うあああっ?</Japanese>
<English>Uwaaaaah!?</English>
</Speech>
</Panel>
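Output in that shape is easy to post-process; a sketch that pulls the text pairs out with the stdlib parser (note the model emits a sequence of <Panel> fragments rather than one document, so you have to wrap them in a dummy root first):

```python
import xml.etree.ElementTree as ET

raw = """<Panel>
<Speech>
<X>0.486</X>
<Y>0.551</Y>
<Japanese>もう知らんっ</Japanese>
<English>I don't care anymore!</English>
</Speech>
</Panel>"""

# Wrap the fragments in a root element so ElementTree accepts them,
# then collect (source, translation, x, y) tuples per speech bubble.
root = ET.fromstring(f"<Page>{raw}</Page>")
pairs = [
    (s.findtext("Japanese"), s.findtext("English"),
     float(s.findtext("X")), float(s.findtext("Y")))
    for s in root.iter("Speech")
]
print(pairs[0])
```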
>>
>>108524051
>nice cope
https://pastebin.com/raw/j4veHD9K
here's a comparison of Google translate (left) and Gemma 4 26BA4B (right)
if you think the text on the left is better... well, to each their own.
>>
>>108524118
Millions are being imported as we speak
The streets of Tokyo will run brown
>>
>>108524098
Have you been to Japan after lockdown was lifted? You can go your entire trip, visit Tokyo, Osaka, Kyoto and even go on hikes and see ZERO Japanese people. 50% of the people you see are tourists and the other 50% are Indians, Filipino, Chinese and Vietnamese immigrants.

Japanese population is projected to crash from 122 million in 2026 to 35 million by 2100. There's no use in learning Japanese.
>>
>>108523022
>
-ot "per_layer_token_embd.weight=CPU"


holy fucking god
this 4x'd the encoding time and 2x'd the gen speed for me on e4b
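As I understand it, llama.cpp's -ot / --override-tensor flag takes "<regex>=<buffer>" pairs and pins any matching tensor to that backend (here, keeping the huge per-layer embedding table in system RAM instead of VRAM); a sketch of the matching idea, where the second tensor name is just illustrative:

```python
import re

# Comma-separated "<regex>=<buffer>" overrides, as passed on the CLI;
# a tensor whose name matches goes on that buffer instead of the default.
overrides = [("per_layer_token_embd.weight", "CPU")]

def place(tensor_name: str, default: str = "CUDA0") -> str:
    for pattern, buffer in overrides:
        if re.search(pattern, tensor_name):
            return buffer
    return default

print(place("per_layer_token_embd.weight"))  # CPU
print(place("blk.0.attn_q.weight"))          # CUDA0
```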
>>
>>108524129
encoding speed*
i am retarded
>>
>>108524128
>Japanese population is projected to crash from 122 million in 2026 to 35 million by 2100. There's no use in learning Japanese.
I'll be dead in 2100 anon, Japanese is still a relevant language in our current era
>>
Any E2B enjoyer here?
>>
>>108524149
>I'll be dead in 2100 anon
defeatist mindset
>>
File: 1701959332427837.webm (1.25 MB, 720x1280)
>>108524098
Yeah I'm not going to learn the language from a place that looks like this in 2026 just to read some manga or jerk it to hentai.


