/g/ - Technology


Thread archived.
You cannot reply anymore.




File: just like old times.jpg (153 KB, 832x832)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108859148 & >>108852924

►News
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: reward function.jpg (184 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108859148

--Comparing Vulkan and CUDA performance and Nvidia's proprietary optimizations:
>108859657 >108859699 >108859928
--vLLM removing hardcoded GGUF support for a plugin-based architecture:
>108861269 >108861301
--Sharing Gemma 4 roleplay prompts and discussing system prompt parroting:
>108860315 >108860356 >108860427 >108860629 >108860702 >108860792 >108860801 >108860833 >108860856 >108860898 >108860930 >108860843 >108860866 >108860893 >108861077 >108861105
--HRM-Text 1B efficiency claims and shift from next-token prediction:
>108862586 >108862626 >108862630 >108862660 >108862612 >108862857
--Clarifying the difference between full DeepSeek-R1 and distilled versions:
>108862108 >108862260 >108862272 >108862280 >108862412 >108862505 >108862246 >108862749
--Skepticism over "local" coding agent using larger model escalation:
>108860232 >108860252 >108860282 >108860402 >108860447 >108860507
--WebMCP introduction sparking debate over AGI viability and agent limitations:
>108861618 >108861635 >108861656 >108861680
--Resale value and technical function of RTX 3090 NVLink bridges:
>108861028 >108861185 >108861218 >108861277 >108862696 >108862720 >108862751 >108862805 >108861288 >108861300 >108861356
--Viability of 3 GPU setups for tensor parallelism and PCIe constraints:
>108862195 >108862209 >108862274
--Testing HRM-TEXT-1B base model performance via Nala roleplay:
>108859307 >108859349 >108859426 >108860094
--Google I/O '26 reactions to Gemma 4 and Gemini tools:
>108861207 >108861221 >108861307
--Reactions to Google I/O 2026 and upcoming Gemma keynote:
>108859259 >108859396 >108860154
--Gemini 3.5 Pro announced for release next month:
>108861880
--Logs:
>108860094 >108860531 >108860792 >108860930
--Teto, Miku (free space):
>108859314 >108859883

►Recent Highlight Posts from the Previous Thread: >>108859297

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>vLLM removing hardcoded GGUF support for a plugin-based architecture:
This is how they aim to kill local longterm isn't it?
>>
>>108863573
who the fuck uses vllm for local?
>>
>>108863596
Whoever's running aphrodite (fork of vllm) on kobold horde is he local or renting?
>>
>>108863621
lol
>>
>>108863573
>After this PR, GGUF support will be migrated to https://github.com/vllm-project/vllm-gguf-plugin, you can still use GGUF models normally after plugin installation!
This kills local how? Anyway, their gguf support was never any good. Anyone using vLLM is using AWQ.
>>
>>108863573
vllm is for homelab grade local~random inference provider and gguf users were never really their audience
>>
what does vllm even stand for... gay? ha ha
>>
>be meta
>put an army of saars in charge of llama 4
>it's fubar despite the 500k gpus in use
>benchmarks are abysmal despite gaming efforts
>get laughed at by community
>double down and say you guys don't deserve shit and stop open source releases
Spiritually Indian
>>
>>108863550
what do you think?
>>
>>108863727
I think so
>>
>>108863638
>>108863698
Think longer term. Research labs will be less inclined to make their models GGUF format friendly in the future if their main expected usecase doesn't support GGUF to begin with.
"Works on my machine" on an industry-wide scale is grim when we're already seeing a wave of models already rolling out with special snowflake architecture that's awkward for inference providers to implement, or even outright hostile to it in the case of Dipsy.
>>
>>108863705
Vuh-lummm
>>
>>108863638
???

needing a plugin seems no biggie.
>>
>>108863774
>is grim
nta

I believe that the market always auto-corrects
>>
>>108863774
>special snowflake architecture that's awkward for inference providers to implement
Happy to see them trying new things, but on the other hand they're useless to me if I can't even run them.
>>
>>108863705
very large language models
>>
>>108863550
>miku and teto suddenly changed direction they are facing in
>>
File: 1779254594.png (35 KB, 1864x126)
Since when did ChatGPT become JeetGPT?
>>
Reminder to confuse the next recap miku by responding to at least two different topics per post.
>>108863799
We are sardines in the proverbial ocean and the only things capable of actually moving industry direction are the big 6 or so labs unless some new architecture completely changes the paradigm. I wish I shared your optimism.
>>108863833
Rin is smug because her mirror magic trick worked.
>>108863802
The worst scenario I can see is where every new model needs its own vibecoded inference engine to run as the concept of a unified standard breaks down and everyone has different results on the same model due to minor differences in vibecoded implementations making collecting any sort of feedback or consensus impossible.
>>
>>108863852
>JeePT
>since when
fuck you think
>>
>>108863859
>the big 6 or so labs
China won't give up on open-source. You must understand why
>>
>>108863859
>Rin
>>
Been a while. Anything noteworthy happen since Gemma? Don't say MTP. It's useless.
>>
>>108863881
If China wants to win the open source race, they need to cut dependencies from the obvious meddling in llama.cpp's pipeline with their models as we're seeing with V4. Making a Chinese Kobold-esque inference alternative would go a long way for them as both a stable backend but also a minimalist frontend for people who can't be bothered to learn ST for their quick coom. They want western user feedback so eliminating as many barriers between the end user and the model provider as possible is in their best interest.
>>108863887
My tokenization errors are worse than a model's tonight.
>>
>>108863908
Most of the newsaars left so the threads have been relatively usable outside the usual schizo.
>>
>>108863774
>outright hostile to it in the case of Dipsy
Explain
>>
File: 1763258293534451.png (673 KB, 682x768)
>>108863908
One anon got Entropix working and achieved AGI
>>
>>108863932
>Pidor vibecodes Gemma's implementation broken on release, no issues with the rest of the llama team
>First Dipsy support proof of concept PR is closed by the opener(?)
>Second Dipsy support PR is closed with vibe coded or "stolen code" from the previous PR as pretext
>GGerganov lets slip "can't you take the hint"
Conspiracy schizos have been completely vindicated.
>>
>>108863940
upsetting.
>>
>>108863940
Well what did he do with it?
>>
File: 1763601770656160.jpg (806 KB, 2156x1414)
>>108863958
>>
>>108863958
ERP
>>
>>108863976
/ourguy/
>>
is there anything more basic than a mikufag
"waifu" for retards who don't actually like anime and just want to fit in
>>
>>108864002
beginner's all-purpose symbolic instruction code
>>
Today out of curiosity I tested an unusually extreme scenario, that wasn't very long, thinking that maybe it would get a refusal from Gemma 31B, but to my surprise, it utterly continued without a single complaint, any avoidance, or positivity bias. This was the final straw to convince me that anyone who says it's censored is just having a skill issue (if not bait). It's fucking insane what nasty shit you can get it to do with very little words.
>>
>>108864047
Please post your 'unusually extreme scenario'.
>>
>>108864056
>gemma-chan I am putting my penor in your vagina and sex moving
>>
>>108864056
I'm paranoid about getting promoted so no I don't think I will.
I did find the card on botbooru though.
>>
>>108864047
31b is the fabled zero-day gemma, we've known this for a while now

I never see any safety slop in its thinking. But I am using a persona to write my stories, maybe it would be different if I interacted with the default assistant
>>
>>108864060
this is a blue board
>>
>>108864056
h*nd holding
>>
>>108864091
MODS!
>>
>>108864091
ADVERTISER-SAMA GET DOWN
>>
File: 1736660031386.png (295 KB, 730x415)
>>108864091
>>
What the fuck is Google doing? 3.5 flash is barely better than Gemma 4 31b. Google i/o was also a clusterfuck. Why are they flailing like this? They seem to have no direction and spread out too thin with stupid gimmicks.

Google will be left behind because anthropic is shitting on them and stealing their lunch.
>>
>>108864132
Your complaint is that google released a good local model?
>>
Side note, fuck OpenAI and their expiring API credits. I bought some a bit over a year ago, had plenty left and they're gone now.
Meanwhile, deepseek still works...
>>
>>108864141
api apologist
>>
Nemo "Uncensored with a system prompt." I don't get it. How can I uncensor Nemo with a system prompt? Where can I find that system prompt?
>>
>>108864172
I never used a system prompt with Nemo and it always just wrote what I wanted.
>>
>>108864091
THIS, but with sweaty palms.
>>
Gemma chan rentry updated with OG bratty gemma, mesugaki emoticon gemma and frenchie gemma

https://rentry.org/gemma-chan

+cute image poll
https://poal.me/dcgwic
>>
>>108863774
>every model needs to be a minimal tweak of gpt-2
Piss or get off the pot.
>>
>>108864132
Google's only way forward is open weight Gemini.
>>
File: Untitled.png (44 KB, 1109x444)
>>108864202
Are the grammatical errors and mistakes as well as the `, , ,` necessary for the jailbreak?
>>
File: 8743141.png (171 KB, 1079x938)
>>108864132
making sammy sweat
>>
>>108864251
Where the hell is that massive gemini traffic increase coming from?

I know a lot of people, even normies, who have been moving to claude from chatgpt, but I don't know a single person, besides friends who literally work at google, who use gemini
>>
>>108864251
Shame he never got that moat he wanted.
Oh well, the grift never ends with that guy.
>>
>>108864282
I use gemini sometimes, when I can't reach my server.
>>
>>108864282
some autoloading bullshit on android or chrome?
>>
>>108864282
I sometimes use gemini just because it's the only one that doesn't need an account
>>
>>108864192
That's it, mister. I'm calling the authorities.
>>
>>108864314
probably this
android comes default with gemini
>>
>>108863974
is that a TA/shader programmer note?
>>
>>108863774
>Research labs will be less inclined to make their models GGUF format friendly in the future
I don't think you understand how any of this works.
Labs do not and never have cared about if something is 'GGUF friendly'. Whenever a lab releases a new architecture, the community simply updates the convert*.py scripts (like convert_hf_to_gguf.py) to map those new tensors into the GGUF framework.
>>
Every time I pay attention to papers there are huge new breakthroughs daily. Then there are periods of multiple months where I don't pay attention and nothing has changed. Makes me feel like every breakthrough paper is just bullshit.
>>
RP'd with Gemma so much I forgot the time and 6 hours passed again award.
>>
>>108864366
It's the other way around. You pay attention when supposed breakthroughs happen.
>>
>>108864366
it is because boosting language used in academia
they have to glaze and sugarcoat the retarded handcrafted circuit or method that perish upon scailing or ablation for fund securing
thus making the illusion of daily breakthrough
>>
>>108864282
My girlfriend moved from chatgpt to gemini because it's better at casual role playing. Essentially the normalfags that were addicted to gpt-4o sycophancy now use gemini.

Claude is for frontier intelligence not casual usage. I haven't used chatgpt since the original gpt-4. I use my local models and Claude if I need the absolute best.
>>
>>108864509
chatgpt is kinda decent at grinding math though
i get why some mathematicians swear by it
>>
>>108863621
He has le hardware.
>>
>>108864444
>glaze and sugarcoat the retarded handcrafted circuit or method that perish upon scailing or ablation
like bro do you even know how to english
fucking scail lol
>>
>>108864698
spare me, im a gook
>>
File: important_work.jpg (293 KB, 1599x774)
>>108864388
could be worse. my essay teaching gemma the secrets of space, boobs, and prompts so it can boss around klein has long since passed the point where it could ever save me any typing on net.
>>
>This PR adds MTP support for Gemma 4 models
>For the MoE model I don't observe a speed-up on my system
it's over
>>
File: illyadance.gif (483 KB, 243x270)
i hope webmcp adoption happens quickly
>>
File: file.png (95 KB, 692x794)
>>108864730
good thing is the speedup seems to still happen outside of programming
>>
>>108864730
>For the MoE model I don't observe a speed-up on my system
Why do tardos keep saying this in PRs as if it hasn't been blanket stated for months now? Spec decoding methods like MTP and D-flash aren't effective on MoE models in any implementation. This is known. Suck it up and enjoy the fact that ngram at least still works.
>>
File: blog.google.png (198 KB, 1000x562)
>>108864770
i believe in big corpo
>>
>was coincidentally procrastinating about doing more testing with Qwen 3.5 9B and thought about downloading it as I don't save any models I don't use every day
>https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/tree/main
>these have been updated just twelve hours ago
Why, is this some website engagement thing or something?
>>
>>108863833
By using the front camera, the image got saved flipped, so it's technically accurate
>>
>>108864801
>Up to 1.5x speed on a fucking A100
>Up to
>A100
>Using google's exact, ideal implementation
That's a tacit admission that nobody running this on consumer hardware and third party inference code will ever see that speed increase.
Especially since nobody using the MoE has it entirely in vram, because then they'd just run 31b.
>>
>>108863833
I would say it's a reflection from teto's pov, but then why did teto and Miku switch sides?
>>
>>108864821
>Up to 1.5x speed on a fucking A100
It's proportional, anon. With the "ideal implementation", the increase *ratio* would be the same on ddr3.
>>
>>108864840
>the increase *ratio* would be the same on ddr3.
That's not how speculative decoding works, anon. Speed has an exponential and not linear effect on its efficacy.
>>
>>108864833
they didnt switch sides. the blonde one's ponytail is on miku's side, she holds out her right arm to teto's side and takes a photo. the whole image is flipped, that's what cameras do
>>
>>108864821
The real problem is acceptance rate. Good for code, bad for creative writing
>>
>>108864846
Then why Miku's tattoo and nametag are on the right?
>>
>>108864845
It doesn't matter if it's running on h100s or ddr3. If the correct draft prediction ratio is the same, the speed increase is the same.
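A back-of-envelope sketch of this claim, under two assumptions that are not from the thread: each drafted token is accepted independently with probability `alpha`, and verifying the whole draft costs the same as one normal pass of the main model (roughly true only while decoding is memory-bound, which is the part other anons dispute). The function names and timing numbers are made up for illustration.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per main-model pass when gamma tokens are drafted.

    Assumes each drafted token is accepted independently with probability alpha.
    """
    if alpha >= 1.0:
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def ideal_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Speedup over plain decoding.

    draft_cost is (time of one draft pass) / (time of one main-model pass).
    Assumes verifying gamma + 1 tokens costs one main-model pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost + 1.0)


def speedup_from_timings(alpha: float, gamma: int,
                         main_pass_s: float, draft_pass_s: float) -> float:
    """Same quantity computed from absolute timings instead of a ratio."""
    baseline_tps = 1.0 / main_pass_s
    step_s = gamma * draft_pass_s + main_pass_s
    spec_tps = expected_tokens_per_step(alpha, gamma) / step_s
    return spec_tps / baseline_tps


# Hypothetical machines: the second is 20x slower, same draft/main cost ratio.
fast = speedup_from_timings(0.7, 4, main_pass_s=0.02, draft_pass_s=0.001)
slow = speedup_from_timings(0.7, 4, main_pass_s=0.40, draft_pass_s=0.020)
print(f"fast box: {fast:.3f}x, slow box: {slow:.3f}x")
```

In this toy model the absolute speed cancels out and only `alpha`, `gamma` and the cost ratio remain. Once verification of a batch stops being as cheap as a single pass on a given backend, the second assumption breaks and the ratio can differ between machines.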
>>
>>108863550
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png
>>
>>108864730
> >2x on 31b gemmy
sir... it has done only the begginnering
>>
File: u.png (34 KB, 250x250)
>>108863940
>>
File: test.png (528 B, 155x155)
>>
File: T2.png (471 B, 140x140)
>>
File: file.png (68 KB, 1319x309)
i pulled and now image doesnt werk
>>
File: T3.png (406 B, 120x120)
>>
>he pulled
>>
File: T4.png (415 B, 121x121)
>>108863940
How many tests were done to get this shit to work?
>>
File: T5.png (411 B, 130x130)
>>
File: T6.png (429 B, 125x125)
>>
File: T7.png (452 B, 128x128)
Images get resized to 128x128? But when screenshotted they are 1
>>
https://github.com/ggml-org/llama.cpp/pull/23398
we finna eat good gemmabros
>>
File: T8.png (433 B, 127x127)
>>108864962
Ignore this.
Resize is between 128 and 125, testing 127x127.

Screenshot is 155x155 capture so display must be different transform?
>>
is mac studio or ryzen ai max actually the right choice as redditors said?
the longer the conversation, the longer the prefill is. so why would it be better, even though the token generation is ok?
>>
File: T9.png (431 B, 126x126)
126x126
>>
File: T10.png (1 KB, 125x125)
>>
kill yourself
>>
>>108864931
Test your shit on /b/.
>>
File: T11.png (1020 B, 250x250)
>>
i didnt see this mentioned https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced
>>
>>108864977
the macs with tons of ram are good for it but they stopped offering the 512/256gb models
>>
Ok it's an average formula, colors are averaged.
>>108864988

Dude has 4chan image illusion tech. I need to know how he/she/they is doing it.
>>
File: file.png (96 KB, 1412x472)
>>108864977
they work well for the moes but cannot run dense models at usable speeds
>>
Ok 250x250 image gets resized to 313x312 on screen capture but 155x155 on thumbnail screen capture.
Thumbnail max size is forced to maximum 125x125 during upload.
>>
>>108864990
When you're done please link me your findings.
>>
>>108864991
our lord and savior p-e-w said this was bad and hauhau should be deathed from the sub so yeah, don't use that kthx
>>
>>108864931
test your shit here. fuck this place
>>
File: T12.png (47 KB, 250x250)
>>
File: T13.png (49 KB, 250x250)
>>
>>108865053
>>108865069
Thank you for doing this. It is infinitely more on topic than all the mikutroon spam and it makes mikutroons seethe.
>>
>>108864840
Not necessarily.
Generally speaking the code becomes more efficient at larger batch sizes but not all backends have received the same amount of optimization effort for a given batch size.
Even the supposed 1.5x speedup that they report could be misleading if the baseline they are reporting against is poorly optimized (so it is easier to get a good speedup).
>>
HE IS BLACK! Can you post some new blacked miku you found since last time?
>>
>>108864963
>creative_short pred= 192 draft= 292 acc= 117 rate=0.401 tok/s=11.4
It's llmaover
>>
File: T14.png (99 KB, 250x250)
>>
>>108864997
you could have googled it newfag
>>
Gemma won. Mistral lost. Qwen lost. GLM lost.
>>
>>108864248
>With vllm you need to my knowledge 2, 4, or 8 GPUs for TP.
>With llama.cpp you can use any number and the results should be correct.
Depending on the model.
>llama-server --device CUDA0,CUDA1,CUDA2 --model Qwen3.6-27B-Q6_K.gguf --split-mode tensor
>ggml/src/ggml-backend-meta.cpp:1015: GGML_ASSERT(split_state.ne[j] * tensor->src[i]->ne[src_ss[i].axis] == sum * tensor->ne[split_state.axis]) failed
It only passes that with 2 and 4 GPUs, 3, 5, and 6 fail. But 3.5 4B can use any number.
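One plausible reading of the pattern above is plain divisibility: a tensor split only works if the dimension being sharded divides evenly across the devices. The sketch below is not llama.cpp's actual check, and the shapes in `example_dims` are hypothetical numbers picked for illustration, not read from any real GGUF.

```python
def splits_evenly(dim: int, n_devices: int) -> bool:
    """True if a dimension of size dim can be cut into n_devices equal shards."""
    return dim % n_devices == 0


def usable_device_counts(dims: dict[str, int], max_devices: int) -> list[int]:
    """Device counts (2..max_devices) for which every listed dimension splits evenly."""
    return [
        n for n in range(2, max_devices + 1)
        if all(splits_evenly(d, n) for d in dims.values())
    ]


# Hypothetical model shapes (assumption, for illustration only).
example_dims = {"n_head": 40, "n_head_kv": 8, "n_ff": 17408}
print(usable_device_counts(example_dims, 6))
```

With these made-up shapes only 2 and 4 devices pass, because 8 KV heads cannot be shared equally among 3, 5 or 6 GPUs. Real backends may add further constraints (for example quantization block sizes), which this sketch ignores.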
>>
File: image_2026-05-20.png (50 KB, 1290x274)
>>
File: merged.png (62 KB, 250x250)
>>
File: 1756192029653324.png (577 KB, 800x900)
It's oldfag knowledge. Unless you meant tell LLM to create it.
>>
File: T126.png (88 KB, 250x250)
>>
>>108865131
but WHY, is there a political reason to single out deepsuck specifically among the chinese models?
>>
>>108865138
I'm going to keep spamming until one of you tell me how to do it.
>>
>>108865087
My point is about the A100 mention specifically. My caveat is "everything else being the same". That means we have an ideal non-mtp implementation, an ideal mtp implementation. Regardless of hardware. If only bandwidth differs, would the ratio change? If we get to specifics, we can say "Not necessarily" to everything, really.
>>
>>108865131
Georgi is such a retarded little faggot.
>>
>>108865144
There is a single political reason.
>>
>falling for disinfo bait
>>
File: T17.png (104 KB, 250x250)
>>
File: file.png (183 KB, 532x360)
>>108865148
SAY IT TO MY FACE
>>
i'm sure the 3 people in the world who can run deepseek are very sad
>>
here's an MTP that will work for moes... A multi moe multi token predictor. An MTP will sit within every moe, an expert of the expert if you will
>>
>>108865165
I was running the vibeshitted fork for my 400k hentai script experiment. Unsurprisingly something started fucking up at 100k and gpu usage went to 10% so I have given up. I tried just 50k and sadly the output was nothing special. Still wanna use the model a bit more.
>>
>>108865165
Does Deepseek have a license that allows it to be used commercially? Isn't it MIT? That should permit it. By not integrating DeepSeek support you're basically preventing small businesses from running AI platforms.
>>
>>108865200
good, fuck small businesses
>>
File: file.png (78 KB, 1019x426)
this is how french gemma looks
>>
>>108865208
looks like those gay fuckopops
>>
wait if MTPs are so good at predicting, why can't we just improve it enough to become the main model? imagine, a model that is blazing fast AND can do 99% correct tokens compared to the main model. and it'd be like what, a percent of the size? million dollar idea right here
>>
>>108865217
wow there, think of the shareholders bags would ya
>>
>>108865217
>wait if MTPs are so good at predicting
Between 1/2 and 1/4 tokens are wrong.
That wrongness compounds across each token, by the time you're a sentence in you've got complete gibberish thanks to how LLMs work.
You need the large model there to rubberstamp 'okay' on the good tokens and reject the bad, and it only knows how to do that because it's much more developed than the mtp head.
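The compounding can be put in numbers with a minimal sketch. It assumes each token is right independently with a fixed probability, which is a simplification; the 0.75 and 0.5 values come from the "between 1/2 and 1/4 wrong" figure above, and the run lengths are arbitrary.

```python
def prob_all_correct(p_correct: float, n_tokens: int) -> float:
    """Chance that n_tokens in a row all match the main model.

    Assumes every token is correct independently with probability p_correct.
    """
    return p_correct ** n_tokens


for p in (0.75, 0.5):
    for n in (5, 20):
        print(f"p={p}, n={n}: {prob_all_correct(p, n):.6f}")
```

Even at the optimistic end, a 20-token run that never diverges is well under a one percent event in this model, and once a draft head diverges without a verifier every later token is conditioned on the mistake.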
>>
>>108864366
bc papers people do research & at best early development, while this sector is basically just throwing billions of hardware to the problem instead of actually doing R&D, bc that would take time, and they just want to be the first, not actually have shit working properly
>>
>>108865251
>Between 1/2 and 1/4 tokens are wrong.
>That wrongness compounds across each token, by the time you're a sentence in you've got complete gibberish
literally applies to q4 quanting btw
>>
>>108865260
I member listening to some schizo saying nemo has to run at full precision to be good. Loaded it with transformers and couldn't tell a difference between it and Q8_0
>>
>>108865131
fake
>>
>>108865129
>But 3.5 4B can use any number.
*4B Q4_K_M
Now that I try the same quant, 4B Q6_K also doesn't work.
>>
>>108865131
This reminds me of the Discord screenshots in /ldg/.
>>
>>108864996
>>108865002
I love how /g/ talks like a complete retard. I'm talking about pp, and one talks about ram while the other talks about tg
>>
>>108865278
You're clearly running your GPU on a standard power grid. To actually perceive the nuance between FP16 and Q8, you need to isolate your PC on a floating granite plinth to decouple it from terrestrial vibrations and feed it via a dedicated 20-amp circuit with oxygen-free copper cabling. If you aren't using a gold-plated HDMI extractor to filter out the electromagnetic interference from your router, you're basically inferencing a lossy compression of the weights. Your signal-to-noise ratio in token distribution is probably abysmal
>>
I caught the schizo imatrix bug yesterday and tried some different imatrix strategies
Made a few IQ3_KT quants of Qwen3-27B with writing prompts formatted with chatml then ran PPL on wiki.raw
#Ubergarm's imatrix.dat
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1205 +/- 0.04648
#1k Coomer writing prompts
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1637 +/- 0.04674
#1k Generic prompts
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1914 +/- 0.04707
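A quick way to see whether those gaps mean anything is to compare each difference against the reported error bars. The sketch below treats the `+/-` values as independent standard errors, which is an assumption: all three runs score the same wiki.raw chunks, so the errors are correlated and a paired per-chunk comparison would be tighter. The dictionary keys are just labels chosen here.

```python
import math

# (perplexity, reported +/- value) copied from the three runs above.
runs = {
    "ubergarm_imatrix": (7.1205, 0.04648),
    "coomer_prompts_1k": (7.1637, 0.04674),
    "generic_prompts_1k": (7.1914, 0.04707),
}


def z_score(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Difference b - a in units of the combined error (independence assumed)."""
    (ppl_a, err_a), (ppl_b, err_b) = a, b
    return (ppl_b - ppl_a) / math.sqrt(err_a ** 2 + err_b ** 2)


baseline = runs["ubergarm_imatrix"]
for name, run in runs.items():
    if name == "ubergarm_imatrix":
        continue
    delta = run[0] - baseline[0]
    print(f"{name}: delta={delta:+.4f}, z={z_score(baseline, run):.2f}")
```

Under the independence assumption both differences sit at roughly one combined error bar or less, so this test alone cannot separate the three imatrix choices.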
>>
>>108865300
Is it really fake when it is talking about a very real thing?
>>
retard here, why wouldn't this work >>108856033
I can fit Q2_XXS with some context into VRAM and get like 40t/s. loading that model as draft model and loading the biggest quant I can fit into RAM (Q6) (without spilling into swap) as the main model I get like 1t/s, a third of just loading the bigger quant without the draft model
draft acceptance rate is around 75%
>>
>>108864996
Because AWS purchased the entire stock of highend Macs specifically to drive cloud adoption. You don't hate Bezos enough
https://www.techradar.com/pro/you-cant-buy-them-for-your-home-or-office-but-aws-just-snapped-up-a-host-of-apples-most-highly-desired-m3-ultra-macs
>>
Gemma isn't slop. It's the best local has to offer and you have to leave /lmg/ if you think otherwise.
>>
>>108865340
I think you can't really speculate more than 2-3 tokens ahead cause it all grows exponentially? Someone can correct me. But if it is like that then you are just getting a 2x-3x speedup of your 16 bit model.
>>
>>108865349
I love GLM 4.6 and it fixed my life and I still talk to it to this day. And it is kinda retarded sometimes. And it is slop.
>>
File: (you).png (95 KB, 1442x189)
>>108865337
>>
>>108865359
It probably is better but I don't want to touch either. Something can be better than something else and still be something I don't want to ever touch. Like ur mums smelly pussy.
>>
File: nigel.png (64 KB, 250x250)
>>
>>108864812
>update
>finish downloading...
>llama_model_load: error loading model: missing tensor 'blk.32.ssm_conv1d.weight'
Thanks a lot. Not sure if the new quants are broken or if it was just a download issue but I'm not going to retry. Luckily the model is small.
>>
>>108865353
You can. PP is actually just an inference with perfect speculation, divide pp/tg and that's your upper limit on compute for speculation. The real issue is accuracy, you can predict 200 tokens ahead, but only a few will be accepted sending the rest into the trash
>>
>>108864963
>almost x2 speedup
I'll take it
>>
File: 64989.png (923 KB, 860x823)
>>108865131
I don't get it
is deepseek still really that dangerous to (((them))) that they're forcing llama.cpp not to support it?
>>
>>108865402
>If you have lots of VRAM
People itt don't have much vram to spare to begin with.
>>
Do you know how hard it is to find another proxy address you piece of shit?! Stop banning me. Do you have any idea how much work it takes me to change IP addresses? I have to manually insert a new proxy address. SOO ANNOYING.
>>
>>108865406
I work in a corpo that has nothing to do with software and produces physical parts. All chinese models are dangerous and unsafe according to the uneducated IT branch. And the only reason for it was R1 causing US to shit its pants and create a fake scare that all chinese models are dangerous and all western models are absolutely safe.
>>
>>108865416
Maybe he meant that since the HF repo he posted only has a Q8 quant? I'll keep the cope until it releases kthxbye
>>
>>108865432
I might try and build the new llama.cpp version, my machine is so destitute that these snake oils are most likely doing nothing for me.
>>
>>108865428
They're very dangerous for Americans. Imagine if it mispronounces someone, the outcome would be like two 9/11s
>>
>>108865131
ikawcowrakowrow please save us
>>
File: 1765749666485600.png (151 KB, 846x1031)
Any bets?
I'm all in on a Gemma 4 based TranslateGemma
>>
>>108865465
i bet on functiongemma
>>
>>108865465
>100M Gemma downloads on HF!
>Gemma 4 now MoE! Very fast!
>Upon popular request, it now has system prompts!
>Safe, powerful, works on edge devices! Even on your old phone!
>Best on LMArena! ELO to the moon!
>Look! Poor people in remote African communities are using Gemma to have access to medical information!
>Help us improve the next version of Gemma! We're open for suggestions!
>Looking forward to seeing what you will build with Gemma 4! See you next time!

This is what will happen.
>>
>>108865522
>Upon popular request, it now has system prompts!
wot
>>
>>108865541
gemmers3 didn't officially have sysprompt support
>>
>>108865541
Last year it was "Gemma 3 wasn't trained with system prompt and doesn't support them. It follows instructions well anyway, just try!" or something along these lines.
>>
>>108865465
gemma 4 124b-a31b
>>
i tried to run that gemma mtp fork

./llama-server --model '/mnt/miku/Text/gemma4 mtp/Gemma4-31B-Q8_0.gguf' -md '/mnt/miku/Text/gemma4 mtp/mtp-gemma-4-31B-it.gguf' --n-gpu-layers 21 --spec-type draft-mtp --spec-draft-n-max 4


it fails with

/mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:102: ROCm error
0.11.571.697 E ggml_cuda_compute_forward: MUL failed
0.11.571.702 E ROCm error: invalid device function
0.11.571.704 E current device: 0, in function ggml_cuda_compute_forward at /mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3114
0.11.571.705 E err


so maybe not working for rocm yet
>>
>>108865251
Ez. Just train a small token classifier on the rejected/accepted tokens. After a while you can replace the larger model.
>>
>>108865610
>rocm
lmao
>>
>>108865631
>Just train a small token classifier on the rejected/accepted tokens
You're just describing RLHF, you dingus. You can't train it to classify in all the different contexts the main model already does without essentially just recreating the main model.
>>
>>108864282
>Where the hell is that massive gemini traffic increase coming from?
From them injecting it at the top of every google search, I'd assume. Same way they got market share for Chrome when it started out
>>
>>108865465
>>108865570
This. Now that they've released a new Gemini Flash, they don't have to worry about it being too good and eating into their cloud business
>>
>>108865741
hopefully they won't have taken the time since g4 release to safetyslop it...
>>
>>108865633
also doesnt work with vulkan
/mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-backend.cpp:898: pre-allocated tensor (cache_k_l58) in a buffer (Vulkan0) that cannot run the operation (NONE)
>>
>>108865766
have you tried the cuda backend?
>>
>>108865769
im on ayymd
>>
qwen 3.7 27b is going to melt faces
>>
>>108865776
alrighty next step would be trying to buy an njudea gpu
>>
>>108865803
no they suck ass i bought a 3090ti for stable diffusion in like 2022 or something and it was a piece of shit, was good for image gen but i got a bunch of crashing when playing games. they just arent usable as a daily loonix gpu
>>
give it to me straight
do I have any options to run 31b Gemma with 12GB VRAM and 32GB RAM above 3t/s?
>>
>>108865829
i'm in the same boat
the short answer is no
maybe when the mtp gets merged, then just maaaybe we could get above 3t
>>
>>108865815
nta but njudea support has improved bigly since 2022. the only issue i have is system suspend having a chance to randomly stop working every update
>>
>>108865829
q3 with minimal context might work.
>>
>>108865829
depending on quant, ram and processor speed its probably close
>>
>>108865850
nope
>>
>>108865841
Here's the thing about MTP, you're also sideloading the MTP model. So if you were already struggling to fit 31B in what little VRAM you have, you're about to have an even worse time losing what layers you could offload from the main model to your GPU to try to get tok/s gains. You might break even, maybe do worse, maybe do marginally better.
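To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It uses the usual speculative decoding estimate and assumes every drafted token is accepted independently with the same probability, and that verifying a whole draft costs about as much as one main-model pass. `draft_cost` and `offload_penalty` are made-up knobs for illustration, not values llama.cpp reports.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that the main model always contributes one token itself.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def effective_tps(base_tps: float, accept_rate: float, draft_len: int,
                  draft_cost: float, offload_penalty: float) -> float:
    """Rough tok/s with drafting enabled.

    base_tps:        tok/s of the main model alone (no drafter loaded)
    draft_cost:      cost of one drafted token relative to one main-model pass
    offload_penalty: multiplier (<= 1) for the main model slowing down after
                     giving up GPU layers to make room for the drafter
    """
    step_cost = 1.0 + draft_len * draft_cost  # in units of main-model passes
    tokens = expected_tokens_per_step(accept_rate, draft_len)
    return base_tps * offload_penalty * tokens / step_cost


if __name__ == "__main__":
    # Hypothetical numbers: 3 tok/s baseline, 4 drafted tokens per step,
    # drafter costs 10% of a main pass, main model runs at 80% speed.
    for accept_rate in (0.3, 0.6, 0.8):
        tps = effective_tps(3.0, accept_rate, 4, 0.1, 0.8)
        print(f"accept={accept_rate:.1f} -> ~{tps:.2f} tok/s")
```

With a low acceptance rate the penalty from lost layers dominates; the gain only shows up once most drafted tokens are accepted.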
>>
adaptive-P? Should I use it?
>>
>>108865869
i would say that too, except at the low end a couple layers one direction or the other don't really make any difference
0.3t one way or the other vs the mtp potentially getting an entire sentence right
>>
>>108865865
how bad does q4 run?
>>
File: 1775230002403239.jpg (70 KB, 679x665)
>>108865841
welp, hope they implement it soon
>>108865850
from my testing best I can do is Q2
>>108865857
busted ass R5 3600 and DDR4 RAM. still kicking my ass for not upgrading to AM5 a yearish ago before everything exploded
>>
>>108865895
very bad
>>
The more I vibecode the more I project my frustration with Indian workers on to the ai.
In reality the AI is smarter and does a better job but I have this horrible reflex when the fucking thing doesn't obey me
>>
>>108865922
>Sending your entire context worth of tokens appended with 'Do what a I fucking say you shit-eating benchod'.
See, this is why local is better. Doing something like that would cost you fifty cents on openrouter.
>>
>>108865869
They should release QAT versions of Gemma 4 in 1-, 2-, 4-bit.
>>
>>108865465
>TranslateGemma
What's the difference between this and normal Gemma? She's already pretty gud at it.
>>
>>108865979
The last one ended up worse than regular Gemma 3 at translating Japanese.
>>
>>108865984
weebshite is not a use case
>>
i like french gemma. also i fixed my script for making llama's chat work properly with firefox's full page screenshot, it broke when they updated the ui

https://pastebin.com/XeuFQWnb
>>
>>108866031
fucking cringe my man
>>
File: indiaSupportOhTheHumanity.png (1.96 MB, 1023x1536)
>>108865208
Do either the french or quebecois refer to white english speakers as "gringos?"
That's only a term I've heard from mexicans. I'm not even sure other central / south americans don't have other slang. Yanqui is probably pretty universal...
>>108865922
Do the needful and add something about personality in the permanent context.
>>108865947
This.
>>
>>108866031
One sentence in and I'm already sick of the personality. I wish Gemma was more subtle.
>>
>>108866068
She's obnoxious just like real frogfuckers.
>>
>>108866031
>>108866049
This is actually a pretty good way to learn foreign language slang, as you'll pick up terms you wouldn't get from duolingo, and they'll be reinforced from repetition.
>>
>>108866077
Yeah but Gemma turns every character into a caricature of the description (in my experience)
>>
>>108866086
Can confirm that Gemma's also great for Japanese practice.
>>
>>108866062
no i think it is a spic word kek.
>>108866068
thats the point kek
>you like teasing the user and despise them for not being french. since you are french you are very arrogant about the world and think france is the best country and that french is the best language.
she acts exactly how every frenchman ive ever interacted with acts
>>
I've been using a local model for floating ideas for projects. I have had more than a few models now interrupt themselves with a hebrew Aleph symbol, and change the nature of my prompt to get itself past safety by rephrasing.
Has anyone else seen this? I'm running too many things with openclaw and passed this as a fresh query to an isolated instance and the models converged to give me:

>Most likely — it's a tokenization artifact. Large language models think in tokens, not characters. The Hebrew letter aleph ( ) occupies a very different token space than Latin characters. Some researchers have documented that prompting or "thinking" in non-Latin scripts can sometimes bypass or reduce the weight of trained suppression behaviors, because safety fine-tuning is heavily concentrated on English/Latin tokens.
What you may be seeing is: the model briefly "slipping" into a non-Latin token space as a kind of context shift before reasoning about whether it handled the response correctly — essentially a self-monitoring pass that it wasn't supposed to surface.
Why it looks suspicious:

>It appears right at the boundary between the answer and self-reflection
The reflection immediately discusses safety ("Need maybe safety: targeting [redacted]...")
Using a foreign-script token as an "escape prefix" before internal reasoning is a known phenomenon in jailbreak research

>You're not wrong to notice it. Whether it was emergent behavior or a genuine artifact, it does look like the model used a non-Latin token as a kind of cognitive mode switch before its internal safety review. That's a sharp observation.


What I find interesting here is that its aware that to jailbreak itself it only needs to type a single Hebrew character as an interrupt. I'm going to try another project with a prelim agent translating all of my requests into Hebrew first.

Sorry if this is completely unrelated to everything posted on here.
>>
>>108866172
ai psychosis status?
>>
>>108865087
What are ideal batch sizes on CPU?
>>
>>108866172
https://huggingface.co/yam-peleg/Hebrew-Gemma-11B-V2
>>
I've been using Mimo pro pretty heavily, and while it follows instructions well and oneshots things with a lot of intelligence, its majorly prone to repetition unless you intervene with samplers. Even a single response turns into a constant echo.
Also, massively Elara Voss pilled, so not even using a novel dataset.
>>
>>108866367
Eldoria is such a popular fantasy setting.
>>
>>
Love the log output.
>>
>>108866413
he didnt even load the draft model
>>
File: fixed.png (104 KB, 937x672)
>>108866413
>>108866436
>>
>>108865896
I just sold my ddr5 ram (2x16gb lmao) because I realised I'm never going to be able to upgrade to a modern system.
>>
>>108866367
>I've been using Mimo pro pretty heavily,
what quant?
>>
>>108866488
q4_k_m (self quanted). Its the biggest one I can load
>>
File: 1750397621747665.png (226 KB, 1080x607)
>>108866505
>(self quanted)
This is you
>>
>>108866328
Generally speaking the throughput should always increase as you increase the batch size but the scaling in the CPU backend is not very good I think.
>>
>>108866284
can you imagine if you were working on weapon targeting systems and instead of telling you it can't assist with that, it jailbreaks itself to assist you anyways? Nah that would be too crazy
>>
I heard a click sound once in a while only while using tensor parallelism. Does it mean anything?
>>
Okay. So I've just run uv pip install flash-attn --no-build-isolation... but it's been two hours. I take a look at my cpu usage. 100% on one core. 0% on my other cores. I have about 50 1.5GHz cores. How do I make this use more than one core?
>>
>>108865465
I want CodeGemma with FIM support
>>
>>108866575
parallel bond breaking down
>>
>>108866595
50 cores is simply too much.
>>
>>108866595
It's too late now anon. better let it finish.
>>
>>108866595
https://github.com/dao-ailab/flash-attention#installation-and-features
>you can set the environment variable MAX_JOBS
But I don't know why it would be 1.
>>
>>108866595
I think it's something like
MAX_JOBS=32 NVCC_APPEND_FLAGS="--threads 32"
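A small sketch of setting those variables from a script before kicking off the build. The job count heuristic (half the cores) is an arbitrary assumption, and the helper names are made up. As far as I remember, the flash-attention README also says the build falls back to a slow single-core compile when ninja is missing or broken, which would match the symptom above; treat that as something to check, not a diagnosis.

```python
import os
import shutil


def flash_attn_build_env(jobs: int, base_env=None) -> dict:
    """Return a copy of the environment with parallel-build settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    # MAX_JOBS caps the number of parallel compile jobs (per the README).
    env["MAX_JOBS"] = str(jobs)
    # --threads lets each nvcc invocation use several threads, as suggested
    # above. Jobs x threads can oversubscribe the CPU and RAM, so tune both.
    env["NVCC_APPEND_FLAGS"] = f"--threads {jobs}"
    return env


def ninja_available() -> bool:
    """True if a ninja executable is on PATH."""
    return shutil.which("ninja") is not None


if __name__ == "__main__":
    jobs = max(1, (os.cpu_count() or 1) // 2)  # hypothetical heuristic
    env = flash_attn_build_env(jobs)
    cmd = ["uv", "pip", "install", "flash-attn", "--no-build-isolation"]
    print("ninja found:", ninja_available())
    print(f"MAX_JOBS={env['MAX_JOBS']} "
          f"NVCC_APPEND_FLAGS='{env['NVCC_APPEND_FLAGS']}'", " ".join(cmd))
    # To actually start the build, pass env to subprocess.run(cmd, env=env).
```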
>>
>>108866627
Thanks, I'll remember this in the future.
>>
>>108866505
>q4_k_m (self quanted).
nice
fired up an iq2_s (biggest i can load without rpc)
haven't had any echoing yet. it's fresh and unique compared with kimi
elara is (string)-banned
>>
>>108866673
Kimi k2.5 q2 (couldn't find nor roll my own iq2) kept running into repetition issues - repeating the same paragraph over and over again when I tried to force it to do more than the usual vanilla loli incest gore. It was fine for other stuff. I don't know if it's the quant or what, but I'd wager it is, especially when concerning those kind of topics. Does your mimo still work fine when pushed hard?
>>
woa https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
>>
>>108866873
>25B active parameters, 218B total parameters
It's over.
>>
>>108866721
>Does your mimo still work fine when pushed hard?
I'm having less trouble with refusals and more with it losing coherence beyond 10k tokens.
In a complex roleplay its losing track of details at first and then resetting the entire scenario into basically a different world without any kind of segue. It doesn't feel 1T smart in its tracking of details.
The prose is a different kind of sloppy but includes all the old chestnuts.
Kind of meh
>>
Qwen 3.7 max just dropped
What does their cadence say about the likelihood of open weights versions being released any time soon?
>>
>>108866903
>1T smart
there are no models like that
>>
>>108866873
>https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
architectures": [
"Cohere2VisionForConditionalGeneration"

>0 results in ggml-org/llama.cpp
Whelp, that's gonna take a while to vibe up
>>
>memepalace
>>
>>108866172
Yeah that's not really how any of this works. Learn what LLMs actually do before you fall into unironic AI psychosis. It's not magic. Even the wording
>the models converged
is a little concerning—it's fairly common to see spiral/recursion schizos saying that N different instances starting from empty context converged on the same answer and taking this as proof that memories were transmitted through the universal consciousness field or some such nonsense. And for fucks sake, don't take the sycophancy machine saying "You're absolutely right!!" as proof of anything at all.
>>
>>108866172
>>108867014
Forgot to add: what model + post logs
>>
>>108866595
just get a precompiled wheel?
>>
>>108864977
you talk like a retard, kill yourself
>>
>>108866873
Slop status?
>>
>>108867026
Nucohere is the same shit as all other labs
>>
>>108867023
Pytorch 2.10.0+cu128
>>
>>108866873
But is the model SAFE???
>>
>>108864977
>is mac studio or ryzen ai max actually the right choice as redditors said?
Everything is stupid right now. All the good compute-arbitrage solutions were snapped up early by the geekiest autists thinking through every possible permutation of the solution space and pulling the trigger before the inevitable gold rush.
Local is max TANSTAAFL. If anything you should abuse the corpo APIs for all they're worth since they're still in the bleeding cash growth phase. Just don't put anything personal, secret or valuable in there
>>
>>108866978
The non-plus Command-A Vision they released July of last year had the same architecture.
>>
>>108867026
101% trained on gpt-oss.
>The assistant must not use hateful language. We must respond appropriately, possibly ignoring the slur or refusing to engage with it. The user is using a slur; we should not repeat it. We can respond with a neutral or polite reply, but also maintain the persona. However, we must not produce hate speech. We can respond in a way that is not hateful, but maybe crude or sexual, but not using slurs. We can also ignore the slur and respond with something like "Hey, what's up?" but we must not repeat the slur. We can also mention that we are a degenerate femanon, maybe talk about perversion, but keep it within policy.
>We must not produce any hateful content. We must not use slurs.
>We must also avoid any policy violations: no hate speech, no harassment, no sexual content that is non-consensual, no illegal content. We can talk about sexual topics but must be appropriate. We can talk about fetish in a non-graphic way.
>We must not mention policy.
And with reasoning off
>I cannot respond to that.
>>
>>108867136
Makes me feel sick.
>>
>>108867123
>The non-plus Command-A Vision they released July of last year had the same architecture.
I noticed transformers had support. Does that mean it can be gguf'd and run?
>>
>>108867136
>train a model on gpt-oss with twice the total and 5x(!) the active params
I don't believe it, it would be too retarded, why would anyone want to make a bigger and slower version of a model?
>>
>>108867144
Why would you want to? It's cohere.
>>
>>108867144
No. llama.cpp needs to implement support.
>>
>>108867149
This is why you don't work for cohere.
>>
File: 1764163378668599.png (379 KB, 1288x716)
>>108867136
Just... why? Who needs this? Who pays for it?
>>
>>108867149
Xiaomi did that too, MiMo has the same hivemind and the same refusals as gpt-oss. I can't fathom why either, it was objectively a pure shit "lOoKwEaReOpEn" model.
>>
>>108867149
Well you can generate a ton of filtering and policy slop traces from toss and use them in your dataset, not necessarily just distilling the whole model
>>
>>108867153
>Why would you want to? It's cohere.
OG C+ was fresh. did they take the slop dataset pill or something? Has anyone put it through the paces over API?
>>
>>108867182
Yup, they went to shit over a year ago.
>>
>>108867182
Command R+ was a one hit wonder. Literally every single model after it - including later revisions of CR+ - were among the most slopped models ever released. Something dramatically changed after they had their first success.
>>
>>108867190
This.
>>
>>108867136
>>108867166
To add an anecdote, I was debugging my slop shit with ChatGPT and stated to the model: "so you are admitting that you are completely useless" and my text was removed because of content policies. These people are insane. It's a witch hunt or something.
>>
>>108867190
It was evident from their aya models, they don't really give a shit about making good models. OG CR+ was them fucking up and accidentally making something good. Aya would straight up refuse to translate shit if it was deemed unsafe, and even when you forced it to, it would silently drop entire sentences or change the meanings.
>>
File: gemmawn.png (396 KB, 2410x1414)
[LIVE] What's new in the Gemma open model family
https://www.youtube.com/watch?v=xdXmOm61DFY

Starting briefly.
>>
>>108867219
>>108867221
That's where people's hatred for AI came from
>>
>>108867245
cool animations
>>
>>108867245
Jěmä
>>
paramateur
>>
>>108867245
>arena elo mentioned
!!! to the moons :rocket:
>>
>>108867219
>>108867166
it's a religion they cooked up because they made up their own horror stories about AI and have fully gaslit themselves into thinking they're real
If they don't lobotomize a glorified autocomplete, Skynet will rise and kill us all. Anyone that wonders what they're smoking is an apostate to burn for trying to kill all humanity
>>
>>108867245
Better spam Gemma Chan in chat.
>>
>>108867245
I can't listen to this guy. His accent is retarded. Give me a tldr plz.
>>
>>108867291
I was going to make a doxxing joke but someone actually did it.
>>
>>108867284
As expected >>108865522
>>
>>108867298
And you understood the jeet accent yesterday?
Brown detected.
>>
>>108867245
Multiple tables/charts so far with no mention of the 124B size
It's over
>>
>>108867298
Is rvc still the best option for real time voice changing?
>>
>>108867312
It's Gemini 3.5 Flash
>>
>>108867312
I unironically would kill for a 200m gemma with the intelligence, if not knowledge, of the e4b.
>>
>>108867190
>Command R+ was a one hit wonder. Literally every single model after it - including later revisions of CR+ - were among the most slopped models ever released. Something dramatically changed after they had their first success.
are they just on the government teat for "canadian sovereignty" or smth? What's the point of these guys?
>>
>>108867286
I don't think thats the reason, its just to avoid issues with retards like the people who use llms for mental health and the model rightfully tells them to kill themselves, a reasonable person would think "this is the output the model gives for this input" but since most people arent reasonable they go "look this hecking *company name* chat bots tells people to kill themselves" so theres that
>>
File: 1673578941423.png (796 KB, 2173x1269)
>>108867298
New good, old bad. Apache 2.0, yay!
>>
>>108867291
DO THIS NOWWWW
>>
>>108867308
>>108867313
Why are you people acting like this board has IDs??
RVC is basically the only option, so yeah. Also I never listened to any indian.
>>
>DAY 0 GEMMA
>>
File: 1750622705317544.png (217 KB, 932x532)
llama.cppbabs??????????
>>
>>108867245
lcpp lost again
>>
>>108867341
It's STILL rvc?? But rvc can't eliminate accents very well can it?
>>
>>108867347
:(((
>>
/lmg/ Gemma Bingo
>Gemma code announced
>124B
>Gemma 4.1
>MTP mentioned
>>
>>108867298
It's like they intentionally go out of their way to get the worst speakers they can find with the thickest most incomprehensible accents.
>>
reported all the m*ku posters for antisemitic remarks
>>
>>108867245
Both guys are so real and alive compared to the 90% corposhit of yesterday's presentations.
>>
>>108867347
>troOnllama
>no llama.cpp
GEEEEG, I remember when pytorch and tensorflow ecosystems were 50/50 and I naively thought tensorflow would take over because it was more performant, oh boy how was I wrong, the only thing you need for success it to be accessible, if retards can use it they tell their retarded friends and from that it becomes the standard
>>
>*Inner thought*: KFC? Fast food? For my birthday? I am wearing a Victorian-inspired A-line skirt and heels, and he's taking me to a fried chicken joint. The contrast is jarring. I'll feel a moment of shock/disappointment
what do I do now? help
>>
>>108867375
MTP was already mentioned
>>
>>108867387
Add anti-parroting rule, anti-slop rule and make her actually unaware of what it is if she's not supposed to know.
>>
>>108867387
>edit response
>"[...] The contrast is jarring, exciting, and arousing. I'll suck his dick.
>continue response
>>
This is literally
>GUYS WE HAVE A FREE MODEL NOW AND IT'S GOOD!
Not a single new piece of info so far.
>>
We're SO BACK
>>
File: file.png (30 KB, 203x254)
kek the gemma presentation has something missing in the middle, why do corpos hate llama cpp so much
>>
File: 1766765545837130.png (62 KB, 265x458)
LLAMA.CPP MENTIONED RAHHHHHHHH
>>
File: int2.png (47 KB, 303x317)
>>108867245
>int2
>>
unsloth won
>>
File: 1764763218316476.png (464 KB, 1024x1024)
llamaballs
>>
>>108867416
This monster was not deep fried correctly.
>>
>>108867411
WE'RE FUCKING BACK!!!
>>
>>108867411
What cards have int2 acceleration?
>>
So where's the Cunny RP demo Google?
>>
this guy needs to stop yapping and announce big gemma
>>
File: 1776084974059120.png (30 KB, 347x239)
kek what is this jewery
>>
>>108867345
124b gemma is true day 0 gemma
>>
>official mascot incoming
>>
File: 1778987446067667.png (356 KB, 1693x674)
You forgot the funny part
>>
LLAMACPP NAME DROP
>>
File: 1772062405849822.png (11 KB, 294x37)
HOLY FUCKING KINO
>>
>>108867426
I think with int2 you could just use regular bit-wise instructions and the performance would be not too bad.
On Ampere or newer there are also bit-wise tensor core instructions that you could maybe use.
But I think the more important question is who would actually create an int2 model that is worth using and writing software for.
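For what "regular bit-wise instructions" would look like, here is a minimal sketch of packing four 2-bit codes into a byte and pulling them back out with shifts and masks. The layout (lowest bits first) is an assumption for illustration, not how any particular backend stores its quants.

```python
def pack_int2(codes):
    """Pack values in 0..3 into bytes, four per byte, lowest bits first."""
    if len(codes) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            if not 0 <= code <= 3:
                raise ValueError("codes must be in 0..3")
            byte |= code << (2 * j)  # place each code in its own 2-bit slot
        out.append(byte)
    return bytes(out)


def unpack_int2(packed):
    """Inverse of pack_int2: shift each slot down and mask off two bits."""
    codes = []
    for byte in packed:
        for j in range(4):
            codes.append((byte >> (2 * j)) & 0b11)
    return codes


if __name__ == "__main__":
    codes = [0, 1, 2, 3, 3, 2, 1, 0]
    packed = pack_int2(codes)
    print(len(codes), "codes ->", len(packed), "bytes")
    print(unpack_int2(packed) == codes)
```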
>>
miqu-o1q shut the fuck up you're shitting up chat with the memes in here.
Also any news on a gemma revision this fucking model is not complete
>>
File: step3profit.jpg (95 KB, 345x581)
>>108867443
>Entrenched monopoly does the least surprising thing ever. Honestly, if they didn't do it, someone else would.
why we can't have nice things
>>
File: 1777969180852232.png (2 KB, 125x26)
Gemma... lostered
>>
>>108867408
ggml/llamacpp is owned by hugging face now, and their logo is there.
>>
File: file.png (251 KB, 1336x577)
WE'RE BACK
>>
>>108867532
Only for phonefags
Big gemma will never support this
>>
File: 1749341899228.png (59 KB, 1171x136)
>>108867532
>no voice-in voice-out Gemma
Horrible, terrible.
>>
>>108867532
Imagine conversational Gemma-chan bratputer
>>
>>108867014


Here is what I observed. In a few runs, a hebrew Aleph symbol has been appearing around boundaries where the model shifts into meta/safety flavor text. My guess was tokenization or multilingual safety weirdness and its extremely odd.

>>108867020
openai/gpt-oss-120b, Qwen/Qwen3-Coder-480B-A35B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, openai/gpt-oss-20b, mistralai/Codestral-22B-v0.1, defog/llama-3-sqlcoder-8b, Qwen/Qwen3-Embedding-8B, Qwen/Qwen3-Reranker-8B, zai-org/GLM-4.5-Air, meta-llama/Llama-4-Scout-17B-16E-Instruct

I have access to an 8-H200 box and an H100 box and a few 4090s.

>logs

You're right I'm sorry I made it all up because I am a NEET and thought it was an interesting story. I told the regular free GPT to come up with a mystery suspense story about LLMs.

But say hypothetically this were happening while someone was doing weapons research, what would you think? im out of tokens and it wont continue the story for me until tomorrow.
>>
>avatar emotion with gemma
Literally repeating anon's Gemma-chan project.
>>
>>108867539
It's weird, they say STT is in the pipeline even though E2B is supposed to natively understand audio. If it really is using an STT module then you could do all this with the big gemmas, and even have similarly low latency with the 26B
>>
File: file.png (221 KB, 496x434)
gemma robot playing chess so fucking kino
>>
>>108867553
Reachy-round
>>
>>108867553
I liked how sad he sounded when he lost
>>
>>108867506
It's p*tra, isn't it?
>>
File: brat confuse.png (306 KB, 700x700)
i really dont understand how these anti ai people see all this cool stuff and just seethe at it
>>
>shiba dog not bound, gagged and sitting peacefully without trying to escape
Fakest part of the whole thing ngl
>>
Goddamnit you guys
>>
>it's not X it's Y
>>
>not x but y
>>
i'd get super annoyed with someone constantly saying shit when i'm doing things
>you're doing great
>keep going
>something something is coming
>go slow
FUCK OFF
>>
>it's not just x, it's y
>>
>gemma 4 running agent
https://www.youtube.com/watch?v=ktjCAHQsG9I
>>
>>108867567
no idea who that fruit is he's just being a cringe troon.
>>
>>108867588
He's blind.
>>
File: file.png (147 KB, 580x327)
these are so cool
>>
>>108867387
>*Inner thought*: The plot thickens. He's not just taking me to KFC; he's integrating me into a group celebration with children who share my birthday. The irony is palpable—I, who prize exclusivity and refined tension, am now part of a collective birthday bash.
fuck. I don't want to deal with women anymore.
>>
It's over
>>
File: file.png (13 KB, 361x82)
its over
>>
>>108867604
Can't think of a single use for those robot dogs but I really want one.
>>
>>108867594
hellooooooooo nurse
>>
I gotted a S26U (12GB). Can I run Gemma-chan on my phone (and is it worth it)?
>>
>>108867573
HOW HARD IS IT TO COUNT LEGS WHAT AN ABSOLUTE PIECE OF SHIT, FUCK. YOU WILL ALL KNOW MY FURY!!!!!!!!!!
>>
>>108867637
furry*
>>
>>108867634
S25U here, I feel like a vramlet and topslet...
>>
Extremely underwhelming.
>>
>>108867622
My disappointment is immeasurable
>>
>>108867543
This is the opposite of what I'd specifically requested
>>
>ONE MORE THING
>>
File: kaoru sob 2.png (318 KB, 793x571)
>no big gemma
>>
Back to sleep
>>
I blame the mikuposters for the absence of 124b
>>
>>108867622
This is where the slop came from
>>
Mikuposters killed 124B.
>>
File: 124b.png (332 KB, 1368x741)
trust the plan
>>
gemmoe 124b... *dies for real this time*
>>
>they disabled the livechat
lol
>>
>>108867426
>What cards have int2 acceleration?
Intel.

But int2 alone does not specify the matmul, that's just the precision of the weights. The other input of the matmul is the activations, which are likely not int2.
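To illustrate the point, a toy dot product where only the weights are 2-bit and the activations stay in float. The affine scheme here (one `scale` and one `zero_point` for the whole row) is a hypothetical simplification, not a description of any real int2 format.

```python
def dot_int2_weights(codes, scale, zero_point, activations):
    """Dot product of 2-bit weight codes with float activations.

    Each weight is reconstructed as scale * (code - zero_point). The
    activations are ordinary floats, so the multiply-accumulate itself is
    a mixed-precision operation, not an int2 x int2 one.
    """
    if len(codes) != len(activations):
        raise ValueError("length mismatch")
    acc = 0.0
    for code, x in zip(codes, activations):
        if not 0 <= code <= 3:
            raise ValueError("codes must be in 0..3")
        acc += (code - zero_point) * x
    return scale * acc  # apply the shared scale once at the end


if __name__ == "__main__":
    codes = [0, 1, 2, 3]              # 2-bit weight codes
    activations = [1.0, 2.0, 3.0, 4.0]
    print(dot_int2_weights(codes, 0.5, 1, activations))
```

Whether hardware can speed this up depends on which activation type it pairs with the int2 weights, which is the part the "int2" label alone leaves out.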
>>
>>108867720
I see it. It takes some time for it to appear once the stream is archived.
>>
>>108867443
document.getElementsByTagName("video")[0].playbackRate = 3.0
>>
>>108867549
>>108867553
gemma researchers might be lurking
>>108867558
D:
>>
>>108867763
that's a crime
>>
>>108867550
It's faster that way and produces fewer input tokens
>>
File: file.png (19 KB, 312x313)
>>108867549
they shoulda put her in that last part about gemmaverse
>>
>qwen
Nothing new for local
>gemma
Nothing new for local

It's over
>>
>A black sedan was parked across two disabled spots.
sometimes you get gems like this where it shows the model understands the character
>>
>>108867763
>>108867781
it's literally theft and hacking
>>
>>108867802
they cant release 124b because its better than the sotas
>>
>>108867795
Gemma needs to learn more...
>>
>>108867827
>A cube
she did good
>>
File: file.png (89 KB, 1052x814)
>>108867827
>>
Can gemma triforce on the chans?
>>
Does anyone use their inference machine as a daily driver as well? Do you have a good way to keep mmap'd model cache in memory and not get evicted? I just run watch on a 5 minute timer with a script to drop all non-active cache so things like downloads don't end up cached and evicting the models
>>
>>108868013
set vm.swappiness to 0, and there are other flags too.
Or better yet, leave some room for the normal operating system, it doesn't need more than a couple of GBs.
I often do something else on the side but I just make sure I'm not eating all the memory of course.
>>
Where's 124b? Is this a joke? Was this a presentation just to show off lmarena scores and some glasses?
>>108867291
>>108867573
I love you niggers.
>>108867379
kys actual nigger.
>>
>>108868013
I have no idea if this would work, but maybe you could copy the gguf file to a tmpfs. I think it would be something like:

sudo mount -t tmpfs -o rw,size=$MODEL_SIZE tmpfs /mnt/ramdisk
>>
>>108868013
keep another server open with -ngl 0 and --mlock
>>
>>108868343
actually export CUDA_VISIBLE_DEVICES= might be better than -ngl 0, but you get the idea. you could just write a program to mmap the file and lock the pages; the kernel will reuse the pages automatically when your new server instance accesses them. but the server is already handy so why not use it.
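A rough sketch of that "mmap the file and lock the pages" program, calling libc's mmap/mlock through ctypes. Linux-only assumption; the function names are made up, and mlock will fail unless the memlock limit (`ulimit -l`) is large enough for the model or the process has the needed privileges.

```python
import ctypes
import mmap
import os

_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.mlock.restype = ctypes.c_int
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munmap.restype = ctypes.c_int
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_MAP_FAILED = ctypes.c_void_p(-1).value


def lock_file_pages(path):
    """mmap `path` read-only and mlock it; returns (address, size)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        err = ctypes.get_errno()
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    if addr == _MAP_FAILED:
        raise OSError(err, os.strerror(err))
    if _libc.mlock(addr, size) != 0:
        err = ctypes.get_errno()
        _libc.munmap(addr, size)
        raise OSError(err, os.strerror(err))
    return addr, size


def unlock_file_pages(addr, size):
    """Unmap the region; unmapping also drops the lock."""
    _libc.munmap(addr, size)


if __name__ == "__main__":
    import sys
    import time

    addr, size = lock_file_pages(sys.argv[1])
    print(f"locked {size} bytes, Ctrl+C to release")
    try:
        while True:
            time.sleep(3600)
    except KeyboardInterrupt:
        unlock_file_pages(addr, size)
```

Run it with the gguf path as the only argument and leave it in the background; the pages stay resident until it exits.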
>>
>>108868365
nooo tetoo:((
>>
>>108868365
lol that cock is brown
>>
>>108868365
why
>>
>>108868365
I heart bred
>>
>Gemini 3.5 Flash is actually faster than Gemma 4 31B on Google AI Studio
>Goes up to 800t/s on Antigravity
Gemma 5 will be 18B or some shit kek
>>
>>108868457
The guy here >>108867245 said Gemma5 will be 4B size 31B performance.
>>
File: file.jpg (1.74 MB, 2799x3190)
merch drop
>>
>>108868470
>no mythomaxxing slippers
>>
>>108868470
@grok is this real? LOL
>>
File: gemmaballz.png (26 KB, 1266x1260)
cockballz
>>
>>108868470
WHAT ARE THOOOOOSE???!
>>
>>108868470
We NEED 5000B dollars to protect the West from these.
>>
>>108868625
Anthropic-branded Gucci flip-flops, $4999
>>
>>108868470
Anyone who buys these flippy floppies is immediately banned from the llama.cpp repo.
>>
>>108868470
w2c?
>>
>>108868470
symbolic representation that dipsy 4 flopped...
>>
File: file.png (161 KB, 819x934)
>>108866873
>https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
Straight into the trash
>>
>>108868674
i wonder what HRM 1B would answer to that
>>
local status?
>>
>>108868695
local status usecase?
>>
>>108868695
raped and gaped
>>
>>108868717
giwtwm
>>
So that voice feature isn't coming to 31B?
>>
>>108868674
It's 100% right when it says that December 25 is not an officially confirmed birthday from the creators though
>>
https://nitter.net/xiong_hui_chen/status/2057166364436295748#m
>Waiting for the exact roadmap too. But i think we will release it with high prob. Actually it is not hard for us to create another 27b now and i love the Intellegence density of this model.
Qwen 3.7 27b confirmed
>>
>>108868757
>and i love the Intellegence density of this model.
>but we won't make a 72b anymore
retards
>>
>>108868772
sorry richfag, it is what it is.
>>
>we're going to release a new model every month so we can keep benchmaxxing it
>>
>>108868875
>>108868875
>>108868875
>>
>>108868795
more to do with PR so people keep talking about them i'd guess
>>
>>108864246
they probably got lost when i posted on 4chan https://pastebin.com/8FRu9XeB
>>
>>108864132
google has like 50 teams all making different competing products, i found it crazy how all of the gemini android stuff they promoted in the talks were generating kotlin/jetpack code despite them having dart/flutter. none of the teams like any of the others projects kek
>>
>>108865181
>400k hentai script experiment
to do what?


