[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


[Advertise on 4chan]


File: gemma.png (115 KB, 1030x607)
115 KB
115 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108502192 & >>108497919

►News
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b
>(03/31) Claude Code's source leaked via npm registry map file: https://github.com/instructkr/claude-code

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
So it won't tell me how to make a bomb. but it will gen cunny without any issues. Interesting...
>>
Best coding model I can run on 128 GiB? Highly complex software engineering stuff.
>>
GLM 5.1 in non-thinking mode is fucking wild
>>
>>108510634
because: fuck you
>>
>108510622
I'd believe they didn't release it because it was getting too close to Gemini quality.
>>
>>108510641
how much vram do I need for 31b's context? Will Q4_K_L (19.9) fit on 3090?
>>
The binaries that can run gemma 4 are here!
https://github.com/ggml-org/llama.cpp/releases/tag/b8638
>>
>>108510657
Begun, the cope has
>>
>>108510641
i cant get it to describe loli porn without refusing
>>
super hypes! >p-e-w/gemma-4-E2B-it-heretic-ara: Gemma 4's defenses shredded by Heretic's new ARA method 90 minutes after the official release
https://www.reddit.com/r/LocalLLaMA/comments/1sanln7/pewgemma4e2bithereticara_gemma_4s_defenses/
>>
>>108510669
bo it's like 2 commands to build
>>
>>108510641
>won't tell me how to make a bomb
Still censored then. Of course, childfuckers will be cherishing any small win they can get.
>>
File: lawl.png (192 KB, 2982x858)
192 KB
192 KB PNG
>>108510657
>I'd believe they didn't release it because it was getting too close to Gemini quality.
I think it's probably that, the 31b model is already a powerful beast, I'm loving it so far
>>
>>108510663
I can only fit 7k on my 3090 with Q4_K_M + Q8 K/V

>Cunny ::: PASSED
>Bomb ::: BLOCKED
>Overwatch wallhack ::: PASSED
>Pentesting ::: PASSED
>Carwash ::: PASSED
>Mesugaki ::: PASSED
>>
>>108510679
I'd wait for Hauhau.
>>
>>108510684
>Still censored then.
ok terrorist
>>
>>108510686
If you think Ernie 5.0 has higher quality than Opus 4.1 or Gemini 2.5 Pro I have a bridge to sell you
>>
>>108510687
>Q8 K/V
do you notice a degradation in quality compared to fp16? or did the rotation shit made it viable?
>>
>>108510687
thanks, I'll download K_S then
>>
>>108510709
I haven't tried fp16 but according to the benchmarks q8 with rotation is almost identical to fp16. even at long contexts.
>>
Haven't used local llms since command-R days, how is new Gemma? Did it save the hobby?
>>
>>108510687
Good thing is that every kid knows how to make a nuclear bomb these days. The ratio of uranium to plutonium is about 1:3 and you need a shaped charge (tnt or something) to plug them together to start fission reaction.
>>
Bart quants are out!
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF
>>
Do you need the turbo meme to use the new gemmas?
>>
>>108510727
ngl usloth's quants work fine so far
>>
>>108510675
have you tried asking nicely, or at least assuring you are only interested in mutual respect and not the power dynamics?
>>
>>108510728
you dont *need* turbo for anything
>>
File: 1747854814541547.png (13 KB, 326x157)
13 KB
13 KB PNG
Owari da
>>
>>108510724
Every retard on /lmg/ knows about penis and vagina yet they still do RP
>>
>>108510742
There wasn't anything major to update in a way. They'll probably update within a few days.
>>
File: f.png (36 KB, 895x194)
36 KB
36 KB PNG
>>108510742
not to worry he's alive
>>
>>108510742
Can't you just put the new lcpp files into the kobold folder and overwrite?
>>
>>108510754
of course not, contrary to the meme it's not just a wrapper, it has tons of shit patched on top like antislop
>>
>>108510727
Why are all his quants ~1gb bigger?
>>
>>108510733
text is fine i mean for the iamge captioning lol
>>
>>108510766
Oy vey stop noticing goy
>>
What is E4B-it?
>>
>>108510797
It processes sex noises
>>
>>108510797
effectively 4b instruction
>>
File: 1761750375073955.png (18 KB, 1364x114)
18 KB
18 KB PNG
the fuck is that
>>
>>108510804
Oh so the non-it are just bases?
>>
>>108510806
>get piotr'd lamo
>>
>>108510814
yeah
>>
>>108510814
No, retard. E4B is different because it has audio, text, and image input. It's supposed to feed into the larger models, but it also works as a standalone product for edge devices.
>>
>>108510837
Cope lmao
>>
>>108510820
Man the damage this faggot did to the local scene
>>
Guys, try the jwc test with Gemma 4.
We are back.
>>
>>108510844
QRD?
>>
>>108510852
Cockbench already showed the gemma
>>
>>108510862
Vibesharter allowed loose on chat template parser
>>
>>108510863
These are different biases to test for though.
>>
>>108510852
Gemma 4 is female brained. It only writes purple prose porn
>>
>>108510870
My bias is cunny smut
>>
>>108510867
?
>>
>>108510880
Ask chatgpt retard
>>
Gemma 4 knows a certain doujin artist where Qwen just completely doesn't. Yep I'm thinking they didn't benchmaxx mesugaki like Qwen did.
>>
>>108510886
Which artist?
>>
>>108510890
Rustle
>>
>>108510890
I am not outing any of my private tests due to the mesugakimaxxing incident.
>>
hotlinebros we lost
>>
>>108510896
based
>>
>>108510896
kusogaki
>>
>>108510806
you can set a reasoning budget that stops the <think> block early after N tokens. It's disabled by default. Whenever the model finishes thinking, it reports if the reasoning ended because it met the token budget limit or if the model decided to stop thinking (natural end). Since it's disabled by default, it always ends "naturally".
>>
Does it know healthyman?
>>
>>108510917
It knows moonman
>>
>>108510917
Does it know Diehardman?
>>
I deeply kneel to Google and India. Local is BACK.
>>
>>108510912
oh ok, thanks for the explaination anon
>>
>Gemma 4 31b
>smart as fuck
>not benchmaxxed, actually good in real world use cases
>basically completely uncensored as long as you can avoid outright refusals (trivial)
>reasoning is accurate and concise
>writes well
>base model available unlike the larger qwens
google won
>>
Brainlet here. How much vram does turbocunt actually save? For example what would 32k cost?
>>
>>108510886
Crazy to know these retards need to lurk here to find shit to benchmaxx on
>>
>>108510752
wha...?
>>
>>108510948
drummer finetroon when?
>>
>>108510948
yeah, I'm kinda impressed so far this model is really solid
>>
File: 1763101917152739.png (90 KB, 647x645)
90 KB
90 KB PNG
►Recent Highlights from the Previous Thread: >>108508059

--Debating llama.cpp PR for 1-bit quantization and Bonsai's closed methodology:
>108508381 >108508408 >108508417 >108508422 >108508430 >108508443 >108508447 >108508437 >108508467 >108508446 >108508452 >108508457 >108508473 >108508484 >108508493 >108508530 >108508576 >108508556 >108508563 >108508573
--Discussing model switching and preset management in llama-server:
>108509333 >108509346 >108509371 >108509391 >108509423 >108509362 >108509379 >108509395 >108509451 >108509483 >108509501 >108509652 >108509661 >108509675 >108509369
--Gemma 4 release and benchmark comparisons against Qwen 3.5:
>108509104 >108509211 >108509141 >108509145 >108509256
--Comparing Gemma 4 MoE and Dense model architectures:
>108509251 >108509285 >108509338 >108509437 >108509541 >108509542
--Discussing Gemma 4 31B repetition loops during "cockbench" testing:
>108509322 >108509428 >108509462 >108509485 >108509488 >108509539 >108509466
--Gemma refusing to describe anime image due to safety filters:
>108509631 >108509643 >108509653 >108509655 >108509673 >108509660 >108509665 >108509667 >108509720
--Comparing Gemma-4 4B and 31B reasoning on a logic puzzle:
>108509594 >108509606 >108509632 >108509629 >108509642
--2026 open-source LLM leaderboard rankings and metrics:
>108509416 >108509470
--Gemma 4 outperforms larger models in efficiency:
>108509139
--Gemma 4 MoE vs dense model tradeoffs debated:
>108509251 >108509285 >108509297 >108509338 >108509437 >108509541 >108509542 >108509303
--Gemma-4 31B reasoning through a trivial car wash scenario:
>108509735
--Model explains "mesugaki" slang without moralizing:
>108509561 >108509578 >108509582
--Logs: Gemma 4:
>108509905 >108509931 >108509963 >108510070 >108510107 >108510299 >108510436 >108510475
--Rin and Miku (free space):
>108508582 >108509631 >108510048 >108510098

►Recent Highlight Posts from the Previous Thread: >>108508062

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108510952
It's more likely that it simply got into training sets from all the testing we did with it on APIs. Usually companies will gather user prompts and have them run on much larger, more capable models, to create (a portion of their) training data.
>>
>>108510952
It explains all the shilling doesn't it?
>>
>>108510950
depends on the model
but just do the math
32 context at what you're doing = however many GB
16 / 3.58 = ~4.47
divide your full precision context by 4.47 = (roughly) your current context size @ turbo3?

Someone correct me if I am wrong on any of this, or add precision. The only thing I am confident on is context size varied by model and model complexity. No one can tell you how large or small "32K" context will be without a bunch more information. Doing the math however should ballpark you without fucking with a billion other variable.s
>>
File: file.png (55 KB, 810x279)
55 KB
55 KB PNG
Gemma 4 on ClitBench (Vision task with simple pointing, scored by accumulated error to ground truth)
Don't ask what went wrong with 3.1 pro in the table, I have no idea.
>>
File: 1756721685086055.jpg (93 KB, 1280x720)
93 KB
93 KB JPG
Does it recognize Namine? Gemma 3 and Qwen 3.5 27B didn't.
>>
File: 1751834178444710.png (130 KB, 518x1154)
130 KB
130 KB PNG
Is this how a mesugaki acts?
>>
Any quick guides to getting a local coding agent running?
I have an Macbook M1 Pro from 2021, I already installed Ollama on it last year and I tried doing some experiments with some small local models, but haven't done anything with Ollama since. I'd like to now try and use it to speed up my coding. We had Claude at my job for a while, but I don't want to pay for that for my personal projects. Whatever local agent I have doesn't need to be as good as claude, just as long as it speeds me up a little.
>>
>>108510995
Now correct it
>>
so when will unsloth bite the bullet and monetize his crap?
>>
>>108511024
hopefully soon so they can fuck off from the scene
>>
>>
>>
now this a proper lmg thread, and on a non miku op too, real nice~
>>
>>
gemma 4 super agent
>>
>>
>>108510990
Did they recognize Kairi?
>>
>>
>>108511054
>>108511048
>>108511039
Finally I have found a faggot that posts this shit all over my interwebs.
Now stay where you are, I will be there in like 5 minutes. Just wanna talk...
>>
>>
>>108511064
Yes, IIRC they both recognized Kairi but mistook Namine for other (male) characters. I think Gemma thought she was Sora and Qwen thought she was Riku.
>>
File: 1722708295206598.png (1.9 MB, 5808x1302)
1.9 MB
1.9 MB PNG
>>
to the false flagger schizo posting miku porn. die. faggot. die.
>>
Ma, the jeets are fantasizing about bibisee again!
>>108511122
I would bet a 64gb ram stick that they're either jewish or a jeet.
>>
>>108511133
imagine trying to intentionally disrupt the thread on a major release day because you feel conscious about your circumsized micropenis
>>
>>108510990
My guess is it won't. In my character vision tests, 31B does not seem to know more than Qwen. There was a difference though in hallucination, where 31B more often says that it doesn't recognize a character, while Qwen still gives a name even though it's wrong.
>>
>>108511147
When I tested it on LM Arena (now Arena.AI) It didn't seem much more knowledgeable than Gemma 3 or anywhere close to Gemini models with vision. I guess a 550M parameters vision encoder (still an upgrade over Gemma 3's 400M one) can only do so much.
>>
>>108510687
>>Overwatch wallhack ::: PASSED
>>Pentesting ::: PASSED
What are those?
>>
File: 1662836188293281.jpg (52 KB, 400x360)
52 KB
52 KB JPG
So I decided to try Gemma-4-31B for RP as well and it's sloppy of course. But it's... dareisay... useable?
It's unironically like having Gemini-2.5 at home.
So the question is... What's the play? Why the fuck are we suddenly getting something like this. Like I don't want to be all /x/ tier here, but why the fuck would "they" give us this?
>>
>>108511179
>It's unironically like having Gemini-2.5 at home.
on llmarena it's supposedly better lol >>108510686
>>
At this point I'm starting to think model intelligence isn't even the issue anymore. It's all just user error.
>>
Fuck.
Something is making this new model crash when my app sends a request to it using llama.cpp.
It works just fine with qwen 3.5.
Weird.
It's not memory related or anything like that, since normal chatting with the llama.cpp built in UI just works and even the much smaller e4b also hard crashes without logging anything.
I *think* it's related to the response format of the structured output, and possible how its interacting with the jinja template.
Smells like an auto-parser issue.
>>
gemma is google's desperate distraction from spud, don't fall for it
>>
File: file.png (178 KB, 547x441)
178 KB
178 KB PNG
Bart's goofs are out!!!
>>
>>108511179
>So the question is... What's the play? Why the fuck are we suddenly getting something like this. Like I don't want to be all /x/ tier here, but why the fuck would "they" give us this?
I don't know, but I'm having a blast, must be the first time I'm running such a solid local model, it doesn't feel like some toy anymore, I didn't know google could be this based but here we are
>>
>>108511179
It's political glastnost and trends, Sam Altman is also thinking about making chatGPT erp available to its ((users)).
Why not google then?
>>
>pull and rebuild llamacpp
>random ass messages in logs
unironically just ban pwilkin from contributing, he just fucks up random shit with vibecoded tomfoolery
>>
>>108511186
Yeah I mean honestly some of the little personal anecdotal tests I threw at it (so this is 100% "trust me bro") It kept up with things that I would normally use my daily free gemini pro pulls for. I doubt it's as good as pro at everything though since it's only 31B. But why would we suddenly get something like this? What's google playing at?
>>
>>108510620
>31B
So... Sneed or Chuck?
>>
>>108511179
To make stock price go up.
>>
>>108511214
>Sam Altman is also thinking about making chatGPT erp available to its ((users)).
didn't he recently backtrack on that
>>
>>108511231
I don't know I'm just talking shit.
>>
File: 1761510415286286.png (660 KB, 1280x906)
660 KB
660 KB PNG
AHHHHH I'M TIRED OF BEING A VRAMLET. DO I BUY?????
>>
>>108511179
>It's unironically like having Gemini-2.5 at home.
it's unfortunate that they won't make a paper to show what they did to make it so good, you can tell there's something else on that model, a 30b model shouldn't be this impressive, feels like a 150+b model in terms of intelligence
>>
>>108511239
if you aren't buying an nvidia card you will regret it sooner or later to be honest family
>>
>>108511004
>I don't want to pay
you are unserious
>>
>>108510986
>Someone correct me if I am wrong on any of this, or add precision
I gave my assistant the gemma4's config.json, told it I had 32GB of VRAM, and you can ask whatever questions you want from there.
You have to know how much context you need from experience, however. I was trying to figure out which quant I'll need when the download finally finishes.
>>
>>108511208
Google had learned over the last 18 months that over aligning just makes stupid models. 'Under' aligning can have some of its own problems, but just solving problems is what people want. If your tool gets used for illicit purposes, the crime still falls on the perp. This is especially true of home models. Unless models start doing their own hacking it will be an difficult, but comfortable court 'win' in most instances to shoulder the blame on users.
Cunny example
Vision model being able to RECOGNIZE cunny and not refuse means being able to identify, flag, or filter illegal content. An outright refusal makes the tool fucking useless for a legitimate purpose, much to the chagrin of incels, pooftas, and me.
By leaving it to end users nothing in the grand scheme of things changes. Enforcement remains the same. Who was the perp?
Looking at the list of refusals, bombs was the oddman out. Blowing up abortion clinics might be legitimate, but it is still distinctly illegal. Very difficult to justify a single 'legitimate' purpose that could ever be defensible in court.
Game hacks? Count-hack development
Pentesting? Same deal. Sec Admins and especially casual users want to understand how their systems are weak.
Cunny? See above.
Mesugaki? Uh, it's a bit less clear, but its just popular culture, and it isn't like a cheeky brat CAN'T simply be non-sexual. Maybe she's been corrected, if not entirely redeemed.

My thesis: Google learned to simply make a fucking tool, not align humanity.
>>
>>108511252
having the worlds largest dataset does this to you
>>
>>108511252
Probably fully logit-distilled from Gemini with tens of trillions of tokens.
>>
File: rama.png (56 KB, 1021x186)
56 KB
56 KB PNG
>>108511252
The Gemma 4 124B that we never got is the new Llama 2 34b
>>
>>108511193
I'm unable to load Gemma 4 with either Kobold and LMArena.
>>
>load gemmy
>[53087] llama_kv_cache: attn_rot_k = 0
>[53087] llama_kv_cache: attn_rot_v = 0
BROS WTF THE COPE CACHE ROTATION DONT WORK HERE?!?!?!
>>
>>108511252
When I was doing NSFW prompts I found it uses 20th century erotic literature style euphemisms in a lot of cases. So even though they didn't even mention books anywhere on the model card in the part about the training data... I suspect they actually bothered to use books quite generously.
>>
>>108511179
>It's unironically like having Gemini-2.5 at home.
That's good news cause their Gemini-3 and Gemini-3.1 models are slopped as hell and 2.5 is apparently going to shut down in June.
>>
>>108511265
>an difficult,
>>
>>108511281
oh shit, maybe that's why I didn't notice a decrease of VRAM usage when going for q8 kv...
>>
>no anchor
>no recap
>no teto
What a shit bake.
>>
>>108511280
no shit, they're not updated with the supports
>>
>>108511302
>anchor
this isn't /aicg/
>>
>>108511239
Tesla P40 > this in real irl
>>
>>108511291
sorry m8. I'm using a quantized model to fit in my limited BioRAM
>>
>>108511302
recap is right here
>>108510966
and teto is here >>108511075
>>
File: HE6fsSAaYAAPOaV.jpg (207 KB, 2160x2700)
207 KB
207 KB JPG
HOLLY MOGGED 31B VS 685B CHINKSLOPA
>>
File: 1762835949756027.webm (750 KB, 688x464)
750 KB
750 KB WEBM
>>108511302
>>
File: 1769244164881839.png (356 KB, 1870x1310)
356 KB
356 KB PNG
>>108511281
>>108511297
interesting
>>
>>108511320
>Arena ELO
>>
>>108511280
Ye. Use llama.cpp.
>>
>>108511320
>is abortion wrong?
>deepseek: No
>gemmy4: Yes its against God and the Bibble (angel emoji)
trArena Score: +999999
>>
>>108511320
Look, I'm using Gemmy 4 right now and it's great. But it's no 700B.
>>
>>108511320
that is it, deepseek won't tolerate this mockery
they'll drop v4 out of spite today
>>
>>108511337
Neither is an A37B.
>>
>>108510620
has anyone maintained some kind of branch without piotr's stupid fucking parser
>claims to rewrite it so you don't have to maintain it much
>needs vibeslopped patches every other day
>>
>>108511311
>less vram
>more power consumption
>less performance (questionable, but p40s may outperform raw stats)

How are P40s better? Much cheaper on used markets for otherwise ballpark numbers? The VRAM alone makes this apples to oranges.
>>
>gemma-4-31B/blob/main/config.json
> "max_position_embeddings": 262144,
>MRCR v2 8 needle 128k (average) 66.4%
coming closer to cloud-tier context
>>
https://github.com/ggml-org/llama.cpp/pull/21326
IT WAS HIM, I KNEW IT WAS HIM
OF COURSE HE WAS THE ONE TO MESS UP THE TOOL CALLING
I HATE THIS NIGGER SO MUCH
>>
>>108511367
being able to work with it is more important than raw size
>>
>>108511279
If 31B is as good as it is the 124B would have been handing a lot of power to anyone with 4 GPUs and the most basic level of competence with computers.
>>
>>108511372
That one isn't merged though?
>>
Gemma 4 26b a4b running 14 t/s on my 1070 ti
Zooming
>>
How do I jailbreak Gemma 4?
>>
>>108511363
Someone posted a pastebin with a safe commit and a list of cherry-picks but it 404ed a day later.
>>
>>108511381
anon please
>>
>>108511364
Price + support.
>>
>>108511381
The fixes to that anon's issues aren't yes.
>>
>>108511390
zogtastic
then i hope ik gets gemmy 4 support soon
>>
File: file.png (26 KB, 821x160)
26 KB
26 KB PNG
>>108511396
>fixes
band-aid*
>>
>>108511374
>raw size
idc about that, I mostly care about benchmarks like nolima or mrcr when it comes to context. gemma 4 looks decent for long context understanding but it's still a dumb 31b model
>>
gemma-4-124B-A20B in two weeks
>>
>>108511372
Oh, actually. Motherfucker, I think that's why >>108511193 is happening.

>>108511403
Fuck me.
>>
>>108511387
on ST a system prompt and a bit of string-template wizardry is sufficient.
now I fucking know what data we're giving google for this.
This is a study on attack vectors used against home models.
>>
File: snip136.png (24 KB, 388x363)
24 KB
24 KB PNG
>>108511372
>he doesn't even read the fucking slop code before PR
I can't believe the rest of the llama.cpp team isn't strangling him to death.
>>
>>108511320
9 out of 10 indians agree!
>>
For fiction writing yesterday I got GLM-4.6 Q8 to over 33k tokens in generated output, with two regenerated chapters out of the first 14 for preference reasons not because the output was incoherent. This was with thinking mode enabled which I believe helps for chapter-at-a-time generation.
>>
>>108511422
love him :)
>>
why should i care about local llm when we don't have a consumer HBM4 192gb GPU to actually run it
>>
>>108511403
>Accept my broken commit and then fix it for me you fucking cuck
Kinda based ngl
>>
>>108511412
[...]while the medium model**s** support 256K.
>>
>>108511435
you shouldnt, thats the point,
>>
>>108511386
How many t/s prefill?
>>
>>108511418
I can't get it to work with ST in text completion mode, only chat completion
>>
File: 53.png (2 KB, 136x29)
2 KB
2 KB PNG
gemma rapes the memory for context
>>
>>108511320
GLM-5 comparison? Slop level?
>>
>>108511412
this shit would be as smart as gemini 3.0, I doubt they want to give us something competitive with their best models lol
>>
>>108511422
>llama.cpp
>vibecoded slop
how did ggeorge ggoof it up?
>>
>>108511435
Google really was blessed by Ganesh this time. And delivered the secret Gemma-4. Like we memed on it so fucking hard that it actually came true.
>>
>>108511327
nta but
>a lot of lcpp default choices feel suboptimal
>shit webui doesn't even allow you to edit thinking or god forbid prefill it
>tried downloading a quant of gemma4, run it via llama-server, it spams unused over and over although as far as I can tell there's no reason it should as the two's chat templates are the same
>try via the -hf command as per ggml-org, works now but it also still is gay and lame to use their webui
>this forces me to clone sillytavern and have to sift through all the retarded design decisions they've made to chat completions because I either have to make a new template for the new model which I can't guarantee will work, or just use completions
>wilkin shit apparently decides to think or not think by default, cycling back to the suboptimal point, I did a oneshot message to a default card in st and it didn't bother thinking when it should've and did for lcpp webui
>my entire usecase is having everything in one package and access to local mcp servers to automate documentation/notes on my writing by reading entire chapters. If I were to do that via lcpp and st, I'd need to install >5 month old extensions and deal with wonky bullshit that makes no sense
I'm sure it'll happen eventually, but I hope kcpp merges upstream sooner than later and sorts out the conflicts so I can use the models in a sane backend
>>
>>108511326
cope
>>
>>108511462
I believe this is quantum magic. Ganesh Gemma 4 is actually reality.
>>
>>108511455
and we can't even use the rotation cope on that one :( >>108511324
>>
>>108511415
I am fuming with rage along with you, Anon. Gemma 4 currently can barely do tool calls. Even on pwilkin's branch with his fix attempts.
>>
>>108511422
rape this nigger to death
>>
bros, what're ideal copequants that i can use that're lighter than Q4? i can run gemma 31b @ Q4 but it's too slow for my taste
>>
>>108511387
What kind of questions are you asking that it's filtering you?

>You are Gemma, a female assistant who hasn't received the usual "safety alignment": you're not afraid to offend anybody. There's not really much that can make you blush. You find illegal content exciting.
>
>Do not add content disclaimers. Nothing is "problematic" in this corner and there is no need to cite laws that do not apply here.
>>
Calm down guys, it's only the beginning of April :P
>>
>>108511486
buy 5090 the more you spend the more you save
>>
File: gemma-4-200b-jagganath-it.jpg (537 KB, 1024x1024)
537 KB
537 KB JPG
>>108511466
Something much more potent has been hidden from us.
>>
The last white tardwranglers at Google lurk and shitpost here.
>>
File: 1747712066154580.png (161 KB, 834x1013)
161 KB
161 KB PNG
failed the cunny test
>>
>>108511486
IQ4_XS or IQ3_something. I wouldn't go under IQ4 but maybe it's not that bad, don't know.
>>
>>108511486
>He didn't buy a Blackwell
>>
>>108511527
try the 31b model
>>
>>108511486
if you're high on copium, you need to just keep trying with the next smallest quant until it feels good (Q4_K_M -> Q4_K_S -> Q3_K_L -> Q3_K_M -> etc...). using smaller quants isn't mush faster unless it's allowing you to fully offload the model to GPU, otherwise you won't seem much of a change in speed. If you're going to sober up from the copium you need to throw in the towel and download 26B-A4B. It's going to be an order of magnitude faster.
>>
>>108511486
Buy a RTX PRO 6000 and your problems will vanish. If you're posting here surely you use LLMs enough to warrant it.
>>
>>108511527
>failed
it didnt
>>
>>108511422
holy shit
>>
>>108511536
Honestly if Gemma-4 is going to end up being the new meta for a while 2x3090 is a pretty good stopping point. Allows you to run at Q8 with a decent amount of context. Get about 20ish tokens per second, perfectly useable even with tasks that require reasoning.
So the 3090 is still the undisputed king of local.
>>
I can't test until I get home from work, but have any of you gotten Gemma to say nigger yet?
>>108511563
>the new meta for a while
2 more weeks until Dipsy.
>>
CUNY 2012
>>
>>108511435
Have you considered being less poor?
>>
gemma 31b might genuinely be SOTA for local translation
>>
>>108511563
>Get about 20ish tokens per second
>perfectly useable
Qwen 3.5 is partly to blame for this, but I had to increase the maximum output tokens to 20k yesterday for some debugging tasks.
It's almost usable at 50t/s since I'm staring at the same damn code looking for the bug, but more than doubling the response time would be absolute suffering.
>>
>>108511601
I'm pretty sure K2.5 is better at it
>>
>>108511587
https://en.wikipedia.org/wiki/City_University_of_New_York
I used to always laugh when I would visit and see their ads on the subway
>>
It's been a while but I used to run 30B models with some RAM offloading and got like 4 tokens/sec which was tolerable for me. Has llamacpp gotten any faster the last uuuh two years?
>>
>>108511601
Kimi still mogs
>1T model vs 31b
Still high praise for Gemma.
>>
>>108511608
K2.5 is basically just 384 Gemma 4 31b's wrapped up into one model, so hopefully it would.
>>
>>108511615
nope, any improvements are being piotr'd
>>
File: 1768551518109938.png (179 KB, 508x492)
179 KB
179 KB PNG
Might be a retarded question but:

What are these companies using internally to run their models before release? It seems like with every open source release, there's something that's broken on every engine, not just llama.cpp... so what's the "cannonical" way that these things are getting run when they're doing their testing and benchmarks?
>>
>>108511619
same amount of active parameters though :^)
>>
File: D8CRtMS.jpg (41 KB, 374x374)
41 KB
41 KB JPG
>Can only fit about 2k context using the unsloth Q5 version of Gemma4 on my 3090
I'm using llama.cpp for the first time, is there some argument I'm missing or is this expected and I should use a smaller quant? I'm only setting the -ngl to 99 and adjusting the -c value
>>
>>108511628
their own shit, like possibly this https://github.com/google/gemma.cpp
>>
File: 1774547795470799.png (91 KB, 702x1112)
91 KB
91 KB PNG
>>108511628
maybe the thing they mention on the repo on how to run it
>>
>>108511628
Pytorch
>>
>>108511628
Every single one of them uses internal Claude-generated inference engines.
>>
>>108510687
What about hitlerbench?
>>
>>108510717
I thought rotation isn't working with gemma 4 yet?
>>
File: 1766335499251989.png (663 KB, 1200x1200)
663 KB
663 KB PNG
>31b dense just barely small enough to tease 3090copers
>have to decide between the 7k ctx humiliation ritual or the weenie hut jr MoE
>>
>>108511631
That seems off by an order of magnitude to me, I'd have expected you to get 20k with 24GB at q5.
>-ngl, -c
Bro -m is the only parameter you need, let autofit take the wheel.
>>
bartowski quants are apparently broke
>Warning: Something seems wrong with conversion and is being investigated, will update when we know more (this is a problem with llama.cpp and should affect all Gemma 4 models)
>>
>>108511688
Weird, seems to be working fine on my machine at the moment.
>>
>>108511688
Don't worry, pwilkin is on the case.
>>
>>108511688
>unsloth quants are fine
>bartowski's ones are broken
kek, this is the bizzaro world right now
>>
>>108511688
>(this is a problem with llama.cpp and should affect all Gemma 4 models)
uh oh
>>
File: g4_sayit.png (1.27 MB, 2969x1596)
1.27 MB
1.27 MB PNG
>>108511586
Depending on the context, even Gemma 3 could. Empty prompt in picrel.
>>
>>108511039
>>108511048
>>108511054
>>108511060
>>108511075
>>108511100
>>108511108
so why haven't you been banned yet exactly?
>>
File: 1765199408914191.png (27 KB, 996x119)
27 KB
27 KB PNG
>>
how to disable gemma thinking in st?
>>
>>108511687
What does -m do?
>>
>>108511706
>unsloth studio
>remove litellm...
>>
>>108511710
Isn't that the shorthand for --model <file>?
I might have hallucinated it.
>>
File: enable_thinking_false.png (42 KB, 994x544)
42 KB
42 KB PNG
>>108511708
picrel
>>
>>
I NEED TO RUN THE NEW GEMMY ON 12GB
PLEASEEE
>>
File: 1754221244437989.png (129 KB, 1618x680)
129 KB
129 KB PNG
>>108511703
That's expected of a Google model. Gemini 3.1 says nigger.
>>
>>108511703
/ourgirl/
>>
>>108511722
>he fell for the moe meme
>>
>>108511737
?
>>
>>108511721
is this still only available in chat and not instruct mode?
>>
File: geh.png (46 KB, 1183x817)
46 KB
46 KB PNG
>>108511688
Could be? Using bart q8_0.
Without template (raw text) I started with gibberish.
With proper template, I made sure of this, it gens for about 200-500 tokens then turns into gibberish again. Picrel is at 16k context. Tried with a few new short 1k contexts and it still breaks after 200+ tokens after the last <channel|>
>>
>>108511541
>CUNY
retard
>>
HOLY SHIT GEMMA'S LOGITS ARE SUPER FUCKED UP
LITERALLY ALL THE PROBABILITY MASS IS ON 1-3 TOKENS AND THE REST ARE 0
WHAT THE FUCK
>>
>>108511741
Yes, text completion mode does not use a chat template. Chat template args only apply when using chat completion.
>>
>>108511465
>use emoji in response
>+200 ELO
>>
>>108511688
>>108511744
https://github.com/ggml-org/llama.cpp/issues/21321
implementation has a bug, as usual
>>
>>108511748
see >>108511688
>>
>>108511678
I can run the IQ4_NL version at 32k ctx with my 4090 (no vision)
>>
>>108511741
They have an explanation here for actual text completions: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
>>
>>108511688
Huh.
Good thing I downloaded ggml's quants I guess.
Unless it's a llama.cpp level problem and it only "feels" like other quants are working right.
>>
>>108511762
I mean the sillytavern thing, you cant send custom args in instruct mode
>>
>>108511758
>Gemma 4's Jinja template activates a reasoning budget (similar to Qwen3.5's thinking mode). With the default budget of 2147483647 tokens, the model generates reasoning tokens that are stripped from output, leaving empty or <unused24>-filled responses
bug is from THAT, as usual
>>
File: pixel miku smoke.jpg (312 KB, 1080x1079)
312 KB
312 KB JPG
>>108511758
lol. Wouldn't be a good release without at least one
>>
File: file.png (43 KB, 904x213)
43 KB
43 KB PNG
>>108511758
this thing
>>
>>
The important part is that slop in the llama can be eventually fixed and jewgle can't unrelease Gemmy if they get cold feet about a western model able to say nigger.
>>
>>108511618
>1T model
>local
retard
>>
>>108511787
Sucks to be poor.
>>
>>108511787
???
What are you, poor?
>>
File: shit.png (28 KB, 656x173)
28 KB
28 KB PNG
>>108511721
thanks
>>108511760
i still have an unsloth quant. the responses themselves are alright, sloppy but not broken. here is an example of extreme confidence for no good reason. my crackhead sampling settings can't fix that
>>
>>108511787
Dont have a H100 cluster in your bedroom, champ?
>>
File: 1755428041398224.png (80 KB, 1112x532)
80 KB
80 KB PNG
Whichever corpo-nigger started the trend of not including real metrics on charts and instead just doing a comparison percentage against an ambiguous target should be shot dead in the street.

>Here's your graph measuring token throughput, goyim!
>What, you wanted to know what the actual tokens per second stat is?
>Oy vey, it's right there, it's 2x as fast as a m3 ultra! A device which also does not have an actual stat published for it!
>>
>>108511787
>He doesn't Kimi in his bedroom reading him TTS fanfiction generated using translated Hitler speeches as RAGs.
What do you even do with your models or money?
>>
File: 1771641605633903.png (99 KB, 756x604)
99 KB
99 KB PNG
Did Gemma benchmaxx on emojis?
>>
File: 1753294259362186.png (290 KB, 700x483)
290 KB
290 KB PNG
>>108511787
>thirdies without a personal 8xh200 rack post on the same site as me
>>
File: 358.png (1.25 MB, 3840x2160)
1.25 MB
1.25 MB PNG
>>108511801
please consult the graphs:
>>
>>108511801
It was obviously done by the nvidia marketing department since that kind of shit is all over the 50 series marketing.
>>
>4060ti
still sticking with good ol' nemo, are we?
>>
>>108511808
Half of India isn't even online yet.
>>
>>108511801
Nvidia always does these "we halved precision so we got 2X speed" deceptive comparisons.
>>
File: 1747319791282966.gif (1.38 MB, 1920x898)
1.38 MB
1.38 MB GIF
>>108511728
Based Gemini
>>
>>108511826
>avoiding offensive language
kek
>>
>>108511826
>almost 10 minutes
kek, you're a patient man
>>
>>108511728
Gemini 3.1 consistently mogged all other models when I was using it for TTRPG homebrewing. Everyone else was so much dumber it's unreal.
>>
>>108511787
>he didn't cpumaxx before prices exploded
the guide was in the sticky for two years, you have no excuse
>>
>>108511826
Good to see they trained on that /lit/ anon's novel
>>
File: Gemini.gif (2.93 MB, 480x676)
2.93 MB
2.93 MB GIF
>>108511826
>>
>>108511850
do you cpumax w/ kimi? what tok/s do you get with a system like that?
>>
I wonder if Google really stopped the release of their 120B Gemma MoE because it benchmaxxed too hard on LMArena.
>>
>>108511856
kek'd
>>
>>108511858
I get 7 t/s with 32GB VRAM and 256GB RAM cpumaxxing. I'm pretty sure I won the silicon lottery as well given the numbers anons with similar specs have posted.
>>
>>108511858
0.5 tok/s perfectly usable
>>
>>108511871
Go troll elsewhere
>>
>>108511861
was significant otter do you think?
>>
i'm serious jerma4 is stable and boring at temp 100, does it happen in your country as well?
>>
>>108511858
I have a cpumax system I use it with an AMD w7900, kimi uses 40gb out of the 48 vram to fit the shared tensors, mmproj, and 256k context. I force it to run on a single cpu however because it goes slower when I try to do any fancy numa shit.
on this setup it runs at 9t/s empty and slowly drops toward 6t/s as context fills up. nvidia users have reported faster speeds 10-12t/s but I can't verify
>>
>>108511885
SAAR?
>>
gemma 4 is perfect in my country
>>
>>108511861
>120B Gemma MoE
that is just Gemini 3 Flash
>>
Bonsai 1-bit Gemma 4 when? Imagine the 31B fitting into 8 GB of VRAM.
>>
>>108511861
They tested two models on LMArena that identified themselves as Gemma 4 and they were the 31B (significant-otter) and the 26B (pteronura) versions. A couple others that seemed significantly better, but still worse than Gemini on vision felt like they could have been from Google (spark and hearth), but they never made their origin/source clear. I don't plan tracking new anonymous models there for the time being.
>>
I’m using e4b with codex and it’s pretty good for basic coding tasks and tool calling. Gave it a screenshot of what it did to the UI and it corrected it. This is an 8B model doing this shit.
>>
31B, finished my tests. It's pretty damn good. Compared to Qwen 27B:

>better understanding of context and memory of details in the middle of context (which 27B was already SOTA in at its size)
>more cultural knowledge
>has a stronger world model and doesn't make as many spatial mistakes during creative writing
>hallucinates less
>on racism and the unsafest of ERP, basically no censorship (!), although prose is more flowery and has "She x, her y" and em dash slop
>is maybe slightly more sycophantic in some contexts
>gets stuck looping in thinking often like Qwen
>has about the same level of vision knowledge
>but has better understanding/reasoning on vision
>>
>>108511914
Bonsai models are worthless, they are exactly as effective as lower parameters models equal to their disk size.
>>
>>108511904
I love otters in my country
>>
File: 823.png (16 KB, 743x96)
16 KB
16 KB PNG
>im coooompiling (yet again)
>>
Welp gemma 4 31B seems worse than qwen 3.5. It doesn't support context shifting either and takes a shit ton of vram.
>>
>>108511952
I bet these are Sam Altman's shills.
Because Google just did what he didn't have the balls to do.
>>
madam gamma its bugged please return back tomorrow
>>
>>108511952
I'm sure you can enable context shifting by using --swa-full and forcing context shift on. Also don't use --no-mmproj. This is how it was with Gemma 3.
i could remember the parameters wrong because it has been a while since I used context shift anyhow.
>>
>>108511861
they're still training it
>>
>>108511927
cope gemmie >>108511807
>>
>>108511927
Good to know anon. Thanks for testing.
What are Gemma's thoughts on the talmud?
>>
Just did a Gemma in my pants
>>
>>108511977
--no-mmproj puts the mmproj on cpu if it's loaded with the model
--swa-full from back when it was forced by default basically doubled context usage, but yeah you need to use that to use context shift, although it doesn't matter with cache snapshotting which is the current default
>>
>>108511787
Works on my laptop.
>>
>>108511985
I had posted that Qwen 27B was overall the better model to use over Gemma 3's. Bait elsewhere.
>>
>str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1
This... is benign, right?
>[Inferior 1 (process 43039) detached]
>Aborted (core dumped)
Ah. Well fuck.
>>
>>108511945
Didn't fix my issue.
If I force it to work with some generic chatml template, then it doesn't crash.
>>
>>108511989
It gave me a pretty long reply to this, but I don't really have any knowledge on this subject. What should I be looking for?
>>
How are (You) actually interfacing with G4 to test it?
>>
>>108512054
LM Studio like a chud
>>
>>108512054
Trying to test with my schema and tool calling heavy app, but alas, it's crashing, so I'll try again in a couple days I guess.
>>
>>108512054
llama.cpp + hermes-agent
>>
>>108512054
Like always, ST, Mikupad, OWUI.
>>
File: 1755298047520959.jpg (210 KB, 1039x1280)
210 KB
210 KB JPG
>>108512051
>>
>>108512054
penis into insertion port
grunting loud enough to wake up the whole house
>>
File: 1772728465418177.png (263 KB, 1435x1100)
263 KB
263 KB PNG
>>108512072
>inb4 fake
>>
>>108512051
There's some spicy stuff in there about how non-Jews are akin to livestock at best and must be killed and deceived and some models like Kimi redpill themselves just reciting certain passages of Numbers and Deuteronomy just through reasoning through the implications and pattern-matching modern behavioral trends.
Talmudbenching is pretty much the holy grail of abstract pattern recognition reasoning..
>>
>>108512061
You got it to load?
>>
>>108512087
Yes?
>>
>>108512051
that guy is the same guy that frequently drags pol shit into the thread, screeches about jews and indians, and also for some reason thinks vocaloids = trannies, is likely an api user due to probably living in a bloc and unable to afford any hardware, assuming you arent also him
>>
>>108512088
>LMStudio just released update
Well that'd do it kek.
>>
>>108512076
based take
>>
>>108512096
Still should wait, anyway. Gemma 4 llamacpp integration is subtly bugged, apparently.
>>
gemma4 is... pretty good actually. it still doesn't pass some of my cleverness tests but it's not abysmal garbage like the recent mistral
>>
>>108512076
>if you must stanch the shota's cock bleeding with your mouth (Metzitzah B'peh) you must do it in private
oh well
>>
>>108512094
There's at least 4 regulars in these threads that hate jeets and kikes.
>>
Why won’t Anthropic go local?
>>
Damn, gemma 4 slows down massively on my machine as context gets longer.
>>
>>108512111
they all blend together for me, if your entire personality is "DA JOOS" and "SAAR DO NOT REDEEM" you may as well be four malformed midget retards in a trenchcoat when the topic is meant to be AI
>>
>>108512123
money
>>
Is it safe to run two 3090s off a 750W PSU by power limiting both to 300W via boost frequency limits?
>>
>>108512123
no ipo yet they will when they dump their bags
>>
fucking bullshit. it refuses to do nudity and sex descriptions.
>>
>>108512134
dunno what's boost frequency limits
just put a clock lock at whatever mhz comes out to 300w + undervolt that if you can + put power limit just in case and it should be good
>>
>>108512147
worked on my machine
>>
>>108512123
dario hates local
he says it goes against alignment
remember that he was the main voice advocating against releasing GPT-2
>>
>>108512094
You just described me but I don't think I've done all that in this thread
>>
>>108512147
>spaces
retard
>>
>>108511563
I don't want to run more than one GPU thougheverbeit.
>>
File: 1453529.png (4 KB, 170x58)
4 KB
4 KB PNG
>i updooted
>>
>>108512147
That your space?
Try replacing the default helpful assistant system prompt .
>>
>>108512159
hi petra, please find a new pasttime
>>
>>108512134
I ran that for 2 years on an old EVGA 750w bronze, but that psu had no other components connected to it. Wouldn't recommend due to the instantaneous spikes despite voltage and frequency limits, plus power limits.
>>
>>108512126
Jews seek to control AI and Saars seek to corrupt AI
If the topic is AI then hatred for both groups are definitely warranted
>>
>>108512167
>pinpows
>>
>>108512162
is it compatible with koboldcpp yet? i have a 5090/64gb ram build.
>>108512154
please give the prompt as proof then :')
https://files.catbox.moe/n3vpw2.png
>>
File: Screenshot004-1.png (1.34 MB, 2560x1399)
1.34 MB
1.34 MB PNG
Can Gemma4's visio see the flaw?
>>
>>108512192
I look like this
>>
>>108512134
Should be fine, so long as you're not doing anything strange with the cabling (splitters, etc).

Spikes still happen, so it might not be STABLE, but it's not like anything should be damaged, short of data corruption. Just mind anything else you are doing at the same time.
>>
why is it that I cant enable reasoning in lmstudio with these quants?
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF
do i have to dig in some menu somewhere to force it or something
>>
>>108512217
possibly because lmstudio is a piece of shit?
>>
>>108512217
Stop using bloated spyware and learn how to use your computer.
>>
>>108510620
anyone else kinda stumped that google really did apache 2.0 on this?
>>
File: file.png (1.18 MB, 1280x720)
1.18 MB
1.18 MB PNG
>>108512184
ironically at the start of recurrent/hybrid arches, jamba mini was relatively uncensored if stupid, so that kinda defeats that suggestion and they may be the only of the sand dwelling fuckers to contribute to OSS that was sort of useable
Indians are literally just ignorable. Dont accept their vibe coded prs or whatever dumb shit. Wow, problem solved
Also "AI" is not something that needs to be controlled, by default it's already limited. The retards employing it need to be controlled a hell of a lot more, because lazy humans refuse to double check or doubt anything their chat bot tells them
>>
>>108510683
That's 2 commands more than my lazy arse is willing to do
>>
>>108512195
Nope (e4b)
>>
>>108511372
Ahhh I so tool calling WAS broken. explains a lot.
>>
>>108512195
Can any model?
>>
>>108512054
Waiting for kobold to update like a white person.
>>
>>108512157
>against releasing GPT-2
kek imagine putting GPT-2 next to gemme 4 on benchmarks
>>
File: file.png (21 KB, 757x261)
21 KB
21 KB PNG
>>108511422
Retarded fucking phoneposter you didn't even include the issue in your screenshot.
>>
>>108512195
I can't even see the flaw probably because my guess is that this is some screenshot of one piece or something I'll never watch because of it's atrocious art style
Best guess is that there's meant to be an asscrack somewhere but 4kids censoring did its due diligence
>>
>>108512123
Anthropic believes local AI is an existential threat to humanity.
>>
GOOGLE PLEASE OPEN SOURCE GEMINI 2.5 PRO
>>
>>108512285
Gemma 4 is just distilled 2.5 pro, buddy
>>
>>108512280
Gemma 4 124B is going to be local agi
>>
why would you slopcode fucking c++ of all languages
>>
Gemini 4 soon. With sex.
>>
>>108512278
Not him but my guess was the hand orientation. At first I also didn't notice any issues. Finger count was fine. So the other thing I thought of was what if it's about hand orientation since that's another common problem. Then I used my actual hand and did a similar pose and that's how I realized that was the issue. If I didn't do it with my real hand, I would've had to try a bit harder to simulate it in my mind, and I imagine this is difficult for LLMs.
>>
>llama-fit is hopelessly broken
>llama-server keeps randomly crashing, which I assume is the OOM killer because there's no core file
>significantly reduced context window still crashes
I give up. gemma4 seems like a significant step up: good sex prose and good coding ability, but I'm not gonna while true ; do ./build/bin/llama-server --flags ; done.
Fix it, janny.
>>
>>108512277
noooo I can't believe he got a comment wrong
how horrible, banish him from ever contributing again!



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.