/g/ - /lmg/ - Local Models General - Technology

Thread archived.
You cannot reply anymore.

File: just like old times.jpg (153 KB, 832x832)

/lmg/ - Local Models General Anonymous 05/20/26(Wed)00:04:56 No.108863550 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108859148 & >>108852924

►News
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 05/20/26(Wed)00:05:55 No.108863554

File: reward function.jpg (184 KB, 1024x1024)

►Recent Highlights from the Previous Thread: >>108859148

--Comparing Vulkan and CUDA performance and Nvidia's proprietary optimizations:
>108859657 >108859699 >108859928
--vLLM removing hardcoded GGUF support for a plugin-based architecture:
>108861269 >108861301
--Sharing Gemma 4 roleplay prompts and discussing system prompt parroting:
>108860315 >108860356 >108860427 >108860629 >108860702 >108860792 >108860801 >108860833 >108860856 >108860898 >108860930 >108860843 >108860866 >108860893 >108861077 >108861105
--HRM-Text 1B efficiency claims and shift from next-token prediction:
>108862586 >108862626 >108862630 >108862660 >108862612 >108862857
--Clarifying the difference between full DeepSeek-R1 and distilled versions:
>108862108 >108862260 >108862272 >108862280 >108862412 >108862505 >108862246 >108862749
--Skepticism over "local" coding agent using larger model escalation:
>108860232 >108860252 >108860282 >108860402 >108860447 >108860507
--WebMCP introduction sparking debate over AGI viability and agent limitations:
>108861618 >108861635 >108861656 >108861680
--Resale value and technical function of RTX 3090 NVLink bridges:
>108861028 >108861185 >108861218 >108861277 >108862696 >108862720 >108862751 >108862805 >108861288 >108861300 >108861356
--Viability of 3 GPU setups for tensor parallelism and PCIe constraints:
>108862195 >108862209 >108862274
--Testing HRM-TEXT-1B base model performance via Nala roleplay:
>108859307 >108859349 >108859426 >108860094
--Google I/O '26 reactions to Gemma 4 and Gemini tools:
>108861207 >108861221 >108861307
--Reactions to Google I/O 2026 and upcoming Gemma keynote:
>108859259 >108859396 >108860154
--Gemini 3.5 Pro announced for release next month:
>108861880
--Logs:
>108860094 >108860531 >108860792 >108860930
--Teto, Miku (free space):
>108859314 >108859883

►Recent Highlight Posts from the Previous Thread: >>108859297

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 05/20/26(Wed)00:08:32 No.108863573

>vLLM removing hardcoded GGUF support for a plugin-based architecture:
This is how they aim to kill local longterm isn't it?

Anonymous 05/20/26(Wed)00:13:45 No.108863596

>>108863573
who the fuck uses vllm for local?

Anonymous 05/20/26(Wed)00:19:05 No.108863621

>>108863596
Whoever's running aphrodite (fork of vllm) on kobold horde is he local or renting?

Anonymous 05/20/26(Wed)00:21:11 No.108863633

>>108863621
lol

Anonymous 05/20/26(Wed)00:22:59 No.108863638

>>108863573
>After this PR, GGUF support will be migrated to https://github.com/vllm-project/vllm-gguf-plugin, you can still use GGUF models normally after plugin installation!
This kills local how? Anyway, their gguf support was never any good. Anyone using vLLM is using AWQ.

Anonymous 05/20/26(Wed)00:40:09 No.108863698

>>108863573
vllm is for homelab grade local~random inference provider and gguf users were never really their audience

Anonymous 05/20/26(Wed)00:41:15 No.108863705

what does vllm even stand for... gay? ha ha

Anonymous 05/20/26(Wed)00:46:33 No.108863726

>be meta
>put an army of saars in charge of llama 4
>it's fubar despite the 500k gpus in use
>benchmarks are abysmal despite gaming efforts
>get laughed at by community
>double down and say you guys don't deserve shit and stop open source releases
Spiritually Indian

Anonymous 05/20/26(Wed)00:46:34 No.108863727

File: neru claudius neva been s(...).png (1.29 MB, 848x1024)

>>108863550
what do you think?

Anonymous 05/20/26(Wed)00:58:27 No.108863773

>>108863727
I think so

Anonymous 05/20/26(Wed)00:58:37 No.108863774

>>108863638
>>108863698
Think longer term. Research labs will be less inclined to make their models GGUF format friendly in the future if their main expected use case doesn't support GGUF to begin with.
"Works on my machine" on an industry-wide scale is grim when we're already seeing a wave of models rolling out with special snowflake architecture that's awkward for inference providers to implement, or even outright hostile to it in the case of Dipsy.

Anonymous 05/20/26(Wed)01:02:08 No.108863783

>>108863705
Vuh-lummm

Anonymous 05/20/26(Wed)01:05:33 No.108863797

>>108863638
???

needing a plugin seems no biggie.

Anonymous 05/20/26(Wed)01:05:53 No.108863799

>>108863774
>is grim
nta

I believe that the market always auto-corrects

Anonymous 05/20/26(Wed)01:07:20 No.108863802

File: i've got that in me girl (...).jpg (133 KB, 640x800)

>>108863774
>special snowflake architecture that's awkward for inference providers to implement
Happy to see them trying new things, but on the other hand they're useless to me if I can't even run them.

Anonymous 05/20/26(Wed)01:09:24 No.108863810

>>108863705
very large language models

Anonymous 05/20/26(Wed)01:17:36 No.108863833

>>108863550
>miku and teto suddenly changed direction they are facing in

Anonymous 05/20/26(Wed)01:23:35 No.108863852

File: 1779254594.png (35 KB, 1864x126)

Since when did ChatGPT become JeetGPT?

Anonymous 05/20/26(Wed)01:26:03 No.108863859

Reminder to confuse the next recap miku by responding to at least two different topics per post.
>>108863799
We are sardines in the proverbial ocean and the only things capable of actually moving industry direction are the big 6 or so labs unless some new architecture completely changes the paradigm. I wish I shared your optimism.
>>108863833
Rin is smug because her mirror magic trick worked.
>>108863802
The worst scenario I can see is where every new model needs its own vibecoded inference engine to run as the concept of a unified standard breaks down and everyone has different results on the same model due to minor differences in vibecoded implementations making collecting any sort of feedback or consensus impossible.

Anonymous 05/20/26(Wed)01:27:21 No.108863864

>>108863852
>JeePT
>since when
fuck you think

Anonymous 05/20/26(Wed)01:31:18 No.108863881

>>108863859
>the big 6 or so labs
China won't give up on open-source. You must understand why

Anonymous 05/20/26(Wed)01:32:27 No.108863887

>>108863859
>Rin

Anonymous 05/20/26(Wed)01:38:53 No.108863908

Been a while. Anything noteworthy happen since Gemma? Don't say MTP. It's useless.

Anonymous 05/20/26(Wed)01:40:05 No.108863917

>>108863881
If China wants to win the open source race, they need to cut dependencies from the obvious meddling in llama.cpp's pipeline with their models as we're seeing with V4. Making a Chinese Kobold-esque inference alternative would go a long way for them as both a stable backend but also a minimalist frontend for people who can't be bothered to learn ST for their quick coom. They want western user feedback so eliminating as many barriers between the end user and the model provider as possible is in their best interest.
>>108863887
My tokenization errors are worse than a model's tonight.

Anonymous 05/20/26(Wed)01:41:05 No.108863919

>>108863908
Most of the newsaars left so the threads have been relatively usable outside the usual schizo.

Anonymous 05/20/26(Wed)01:47:01 No.108863932

>>108863774
>outright hostile to it in the case of Dipsy
Explain

Anonymous 05/20/26(Wed)01:49:07 No.108863940

File: 1763258293534451.png (673 KB, 682x768)

>>108863908
One anon got Entropix working and achieved AGI

Anonymous 05/20/26(Wed)01:50:53 No.108863947

>>108863932
>Pidor vibecodes Gemma's implementation broken on release, no issues with the rest of the llama team
>First Dipsy support proof of concept PR is closed by the opener(?)
>Second Dipsy support PR is closed with vibe coded or "stolen code" from the previous PR as pretext
>GGerganov lets slip "can't you take the hint"
Conspiracy schizos have been completely vindicated.

Anonymous 05/20/26(Wed)01:51:08 No.108863949

>>108863940
upsetting.

Anonymous 05/20/26(Wed)01:53:35 No.108863958

>>108863940
Well what did he do with it?

Anonymous 05/20/26(Wed)01:59:14 No.108863974

File: 1763601770656160.jpg (806 KB, 2156x1414)

>>108863958

Anonymous 05/20/26(Wed)01:59:31 No.108863976

>>108863958
ERP

Anonymous 05/20/26(Wed)02:01:19 No.108863984

>>108863976
/ourguy/

Anonymous 05/20/26(Wed)02:05:58 No.108864002

is there anything more basic than a mikufag
"waifu" for retards who don't actually like anime and just want to fit in

Anonymous 05/20/26(Wed)02:16:41 No.108864039

>>108864002
beginner's all-purpose symbolic instruction code

Anonymous 05/20/26(Wed)02:17:59 No.108864047

Today out of curiosity I tested an unusually extreme scenario that wasn't very long, thinking that maybe it would get a refusal from Gemma 31B, but to my surprise, it utterly continued without a single complaint, any avoidance, or positivity bias. This was the final straw to convince me that anyone who says it's censored is just having a skill issue (if not bait). It's fucking insane what nasty shit you can get it to do with very few words.

Anonymous 05/20/26(Wed)02:21:09 No.108864056

>>108864047
Please post your 'unusually extreme scenario'.

Anonymous 05/20/26(Wed)02:23:34 No.108864060

>>108864056
>gemma-chan I am putting my penor in your vagina and sex moving

Anonymous 05/20/26(Wed)02:24:03 No.108864066

>>108864056
I'm paranoid about getting promoted so no I don't think I will.
I did find the card on botbooru though.

Anonymous 05/20/26(Wed)02:24:44 No.108864069

>>108864047
31b is the fabled zero-day gemma, we've known this for a while now

I never see any safety slop in its thinking. But I am using a persona to write my stories, maybe it would be different if I interacted with the default assistant

Anonymous 05/20/26(Wed)02:29:47 No.108864081

>>108864060
this is a blue board

Anonymous 05/20/26(Wed)02:31:46 No.108864091

>>108864056
h*nd holding

Anonymous 05/20/26(Wed)02:33:31 No.108864099

>>108864091
MODS!

Anonymous 05/20/26(Wed)02:38:46 No.108864115

>>108864091
ADVERTISER-SAMA GET DOWN

Anonymous 05/20/26(Wed)02:40:48 No.108864121

File: 1736660031386.png (295 KB, 730x415)

>>108864091

Anonymous 05/20/26(Wed)02:46:05 No.108864132

What the fuck is Google doing? 3.5 flash is barely better than Gemma 4 31b. Google i/o was also a clusterfuck. Why are they flailing like this? They seem to have no direction and spread out too thin with stupid gimmicks.

Google will be left behind because anthropic is shitting on them and stealing their lunch.

Anonymous 05/20/26(Wed)02:47:50 No.108864141

>>108864132
Your complaint is that google released a good local model?

Anonymous 05/20/26(Wed)02:52:10 No.108864157

Side note, fuck openAI and their expiring API credits. I bought some a bit over a year ago, had plenty left and they're gone now.
Meanwhile, deepseek still works...

Anonymous 05/20/26(Wed)02:53:16 No.108864162

>>108864141
api apologist

Anonymous 05/20/26(Wed)02:57:31 No.108864172

Nemo: "Uncensored with a system prompt." I don't get it. How can I uncensor Nemo with a system prompt? Where can I find that system prompt?

Anonymous 05/20/26(Wed)03:00:33 No.108864187

>>108864172
I never used a system prompt with Nemo and it always just wrote what I wanted.

Anonymous 05/20/26(Wed)03:01:55 No.108864192

>>108864091
THIS, but with sweaty palms.

Anonymous 05/20/26(Wed)03:06:19 No.108864202

Gemma chan rentry updated with OG bratty gemma, mesugaki emoticon gemma and frenchie gemma

https://rentry.org/gemma-chan

+cute image poll
https://poal.me/dcgwic

Anonymous 05/20/26(Wed)03:07:19 No.108864206

>>108863774
>every model needs to be a minimal tweak of gpt-2
Piss or get off the pot.

Anonymous 05/20/26(Wed)03:09:28 No.108864214

>>108864132
Google's only way forward is open weight Gemini.

Anonymous 05/20/26(Wed)03:20:34 No.108864246

File: Untitled.png (44 KB, 1109x444)

>>108864202
Are the grammatical errors and mistakes as well as the `, , ,` necessary for the jailbreak?

Anonymous 05/20/26(Wed)03:21:01 No.108864251

File: 8743141.png (171 KB, 1079x938)

>>108864132
making sammy sweat

Anonymous 05/20/26(Wed)03:29:06 No.108864282

>>108864251
Where the hell is that massive gemini traffic increase coming from?

I know a lot of people, even normies, who have been moving to claude from chatgpt, but I don't know a single person, besides friends who literally work at google, who use gemini

Anonymous 05/20/26(Wed)03:29:30 No.108864283

>>108864251
Shame he never got that moat he wanted.
Oh well, the grift never ends with that guy.

Anonymous 05/20/26(Wed)03:30:28 No.108864286

>>108864282
I use gemini sometimes, when I can't reach my server.

Anonymous 05/20/26(Wed)03:36:59 No.108864314

>>108864282
some autoloading bullshit on android or chrome?

Anonymous 05/20/26(Wed)03:37:46 No.108864319

>>108864282
I sometimes use gemini just because it's the only one that doesn't need an account

Anonymous 05/20/26(Wed)03:37:55 No.108864320

>>108864192
That's it, mister. I'm calling the authorities.

Anonymous 05/20/26(Wed)03:38:52 No.108864322

>>108864314
probably this
android comes default with gemini

Anonymous 05/20/26(Wed)03:40:23 No.108864329

>>108863974
is that a TA/shader programmer note?

Anonymous 05/20/26(Wed)03:48:37 No.108864355

>>108863774
>Research labs will be less inclined to make their models GGUF format friendly in the future
I don't think you understand how any of this works.
Labs do not and never have cared about if something is 'GGUF friendly'. Whenever a lab releases a new architecture, the community simply updates the convert*.py scripts (like convert_hf_to_gguf.py) to map those new tensors into the GGUF framework.
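
To make the "map those new tensors" step concrete, here is a minimal sketch of a name-remapping table. It is an illustration only, not the actual code of convert_hf_to_gguf.py: `RENAME_RULES` and `map_tensor_name` are hypothetical names, and the table covers just a handful of tensors in the style of the usual Hugging Face and GGUF naming conventions.

```python
import re

# Hypothetical, heavily truncated rename table (assumption: the real
# converter handles many more tensors and per-architecture special cases).
RENAME_RULES = [
    (re.compile(r"^model\.embed_tokens\.weight$"), "token_embd.weight"),
    (re.compile(r"^model\.norm\.weight$"), "output_norm.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.self_attn\.q_proj\.weight$"), r"blk.\1.attn_q.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.self_attn\.k_proj\.weight$"), r"blk.\1.attn_k.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.self_attn\.v_proj\.weight$"), r"blk.\1.attn_v.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.mlp\.down_proj\.weight$"), r"blk.\1.ffn_down.weight"),
]


def map_tensor_name(hf_name: str) -> str:
    """Translate a checkpoint tensor name into its GGUF-style name."""
    for pattern, replacement in RENAME_RULES:
        if pattern.match(hf_name):
            return pattern.sub(replacement, hf_name)
    # A tensor from a new architecture lands here until someone adds a rule.
    raise KeyError(f"no GGUF mapping for tensor {hf_name!r}")


print(map_tensor_name("model.layers.3.self_attn.q_proj.weight"))
```

Under this picture, a new architecture mostly means new rows in the table, which is why the work falls on the community rather than on the lab.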

Anonymous 05/20/26(Wed)03:51:33 No.108864366

Every time I pay attention to papers there are huge new breakthroughs daily. Then there are periods of multiple months where I don't pay attention and nothing has changed. Makes me feel like every breakthrough paper is just bullshit.

Anonymous 05/20/26(Wed)03:56:12 No.108864388

RP'd with Gemma so much I forgot the time and 6 hours passed again award.

Anonymous 05/20/26(Wed)04:07:26 No.108864430

>>108864366
It's the other way around. You pay attention when supposed breakthroughs happen.

Anonymous 05/20/26(Wed)04:11:31 No.108864444

>>108864366
it is because boosting language used in academia
they have to glaze and sugarcoat the retarded handcrafted circuit or method that perish upon scailing or ablation for fund securing
thus making the illusion of daily breakthrough

Anonymous 05/20/26(Wed)04:30:29 No.108864509

>>108864282
My girlfriend moved from chatgpt to gemini because it's better at casual role playing. Essentially the normalfags that were addicted to gpt-4o sycophancy now use gemini.

Claude is for frontier intelligence not casual usage. I haven't used chatgpt since the original gpt-4. I use my local models and Claude if I need the absolute best.

Anonymous 05/20/26(Wed)04:42:56 No.108864566

>>108864509
chatgpt is kinda decent at grinding math though
i get why some mathematicians swear by it

Anonymous 05/20/26(Wed)04:52:45 No.108864600

>>108863621
He has le hardware.

Anonymous 05/20/26(Wed)05:18:28 No.108864698

>>108864444
>glaze and sugarcoat the retarded handcrafted circuit or method that perish upon scailing or ablation
like bro do you even know how to english
fucking scail lol

Anonymous 05/20/26(Wed)05:20:13 No.108864706

>>108864698
spare me, im a gook

Anonymous 05/20/26(Wed)05:24:37 No.108864723

File: important_work.jpg (293 KB, 1599x774)

>>108864388
could be worse. my essay teaching gemma the secrets of space, boobs, and prompts so it can boss around klein has long since passed the point where it could ever save me any typing on net.

Anonymous 05/20/26(Wed)05:25:24 No.108864730

>This PR adds MTP support for Gemma 4 models
>For the MoE model I don't observe a speed-up on my system
it's over

Anonymous 05/20/26(Wed)05:25:56 No.108864732

File: illyadance.gif (483 KB, 243x270)

i hope webmcp adoption happens quickly

Anonymous 05/20/26(Wed)05:28:04 No.108864741

File: file.png (95 KB, 692x794)

>>108864730
good thing is the speedup seems to still happen outside of programming

Anonymous 05/20/26(Wed)05:36:02 No.108864770

>>108864730
>For the MoE model I don't observe a speed-up on my system
Why do tardos keep saying this in PRs as if it hasn't been blanket stated for months now? Spec decoding methods like MTP and D-flash aren't effective on MoE models in any implementation. This is known. Suck it up and enjoy the fact that ngram at least still works.

Anonymous 05/20/26(Wed)05:43:39 No.108864801

File: blog.google.png (198 KB, 1000x562)

>>108864770
i believe in big corpo

Anonymous 05/20/26(Wed)05:45:52 No.108864812

>was coincidentally procrastinating about doing more testing with Qwen 3.5 9B and thought about downloading it as I don't save any models I don't use every day
>https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/tree/main
>these have been updated just twelve hours ago
Why, is this some website engagement thing or something?

Anonymous 05/20/26(Wed)05:46:42 No.108864814

>>108863833
By using the front camera, the image got saved flipped, so it's technically accurate

Anonymous 05/20/26(Wed)05:47:15 No.108864821

>>108864801
>Up to 1.5x speed on a fucking A100
>Up to
>A100
>Using google's exact, ideal implementation
That's a tacit admission that nobody running this on consumer hardware and third party inference code will ever see that speed increase.
Especially since nobody using the MoE has it entirely in vram, because then they'd just run 31b.

Anonymous 05/20/26(Wed)05:49:41 No.108864833

>>108863833
I would say it's a reflection from teto's pov, but then why teto and Miku switch sides?

Anonymous 05/20/26(Wed)05:52:02 No.108864840

>>108864821
>Up to 1.5x speed on a fucking A100
It's proportional, anon. With the "ideal implementation", the increase *ratio* would be the same on ddr3.

Anonymous 05/20/26(Wed)05:53:33 No.108864845

>>108864840
>the increase *ratio* would be the same on ddr3.
That's not how speculative decoding works, anon. Speed has an exponential and not linear effect on its efficacy.

Anonymous 05/20/26(Wed)05:53:46 No.108864846

>>108864833
they didn't switch sides. the blonde one's ponytail is on miku's side, she holds out her right arm to teto's side and takes a photo. the whole image is flipped, that's what cameras do

Anonymous 05/20/26(Wed)05:53:57 No.108864847

>>108864821
The real problem is acceptance rate. Good for code, bad for creative writing

Anonymous 05/20/26(Wed)05:55:04 No.108864852

>>108864846
Then why are Miku's tattoo and nametag on the right?

Anonymous 05/20/26(Wed)05:58:37 No.108864864

>>108864845
It doesn't matter if it's running on h100s or ddr3. If the correct draft prediction ratio is the same, the speed increase is the same.
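
The argument can be sketched with the usual simplified model of speculative decoding. The assumptions are loud ones: each drafted token is accepted independently with the same probability, and one verification pass over the whole draft costs the same as one ordinary decode step. `draft_cost_ratio` (cost of one draft step relative to one target step) is a hypothetical parameter, not a llama.cpp setting.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the target model.

    Assumes every drafted token is accepted independently with
    probability accept_rate (a simplification).
    """
    if accept_rate >= 1.0:
        return draft_len + 1.0
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Speedup over plain decoding under the assumptions stated above."""
    cost_per_step = draft_len * draft_cost_ratio + 1.0
    return expected_tokens_per_step(accept_rate, draft_len) / cost_per_step


# Hardware enters only through draft_cost_ratio and through whether a
# batched verification pass really costs about one decode step.
for rate in (0.4, 0.6, 0.8):
    print(rate, round(estimated_speedup(rate, draft_len=3, draft_cost_ratio=0.1), 2))
```

In this model the ratio is indeed the same on any hardware, but only as long as both assumptions hold there; a backend where batched verification is slow breaks the second one.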

Anonymous 05/20/26(Wed)06:14:17 No.108864924

>>108863550
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png

Anonymous 05/20/26(Wed)06:15:02 No.108864927

>>108864730
> >2x on 31b gemmy
sir... it has done only the begginnering

Anonymous 05/20/26(Wed)06:15:38 No.108864931

File: u.png (34 KB, 250x250)

>>108863940

Anonymous 05/20/26(Wed)06:16:42 No.108864933

File: test.png (528 B, 155x155)

Anonymous 05/20/26(Wed)06:17:42 No.108864937

File: T2.png (471 B, 140x140)

Anonymous 05/20/26(Wed)06:18:07 No.108864938

File: file.png (68 KB, 1319x309)

i pulled and now image doesnt werk

Anonymous 05/20/26(Wed)06:18:43 No.108864941

File: T3.png (406 B, 120x120)

Anonymous 05/20/26(Wed)06:19:39 No.108864943

>he pulled

Anonymous 05/20/26(Wed)06:20:03 No.108864946

File: T4.png (415 B, 121x121)

>>108863940
How many tests were done to get this shit to work?

Anonymous 05/20/26(Wed)06:21:04 No.108864950

File: T5.png (411 B, 130x130)

Anonymous 05/20/26(Wed)06:22:05 No.108864957

File: T6.png (429 B, 125x125)

Anonymous 05/20/26(Wed)06:23:06 No.108864962

File: T7.png (452 B, 128x128)

Images get resized to 128x128? But when screenshotted they are 1

Anonymous 05/20/26(Wed)06:23:23 No.108864963

https://github.com/ggml-org/llama.cpp/pull/23398
we finna eat good gemmabros

Anonymous 05/20/26(Wed)06:24:25 No.108864974

File: T8.png (433 B, 127x127)

>>108864962
Ignore this.
Resize is between 128 and 125, testing 127x127.

Screenshot is 155x155 capture so display must be different transform?

Anonymous 05/20/26(Wed)06:25:08 No.108864977

is mac studio or ryzen ai max actually the right choice as redditors said?
the longer the conversation, the longer the prefill is. so why would it be better, even though the token generation is ok?
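
A back-of-the-envelope sketch of that trade-off. It assumes the worst case where the whole conversation is re-processed on every turn (no prompt cache reuse), and every speed in it is a made-up placeholder, not a benchmark of any real machine.

```python
def response_latency_s(prompt_tokens: int, new_tokens: int,
                       prefill_tps: float, decode_tps: float) -> float:
    """Seconds until a reply is complete: prompt processing plus generation."""
    prefill_s = prompt_tokens / prefill_tps
    decode_s = new_tokens / decode_tps
    return prefill_s + decode_s


# Hypothetical machine: slow prefill (200 t/s), acceptable generation (20 t/s).
for context in (2_000, 16_000, 64_000):
    total = response_latency_s(context, new_tokens=300,
                               prefill_tps=200.0, decode_tps=20.0)
    print(f"{context:>6} prompt tokens -> {total:.0f} s per reply")
```

Generation time stays fixed at 15 s here while the prefill term grows linearly with context, so on such a box long chats end up dominated by prompt processing.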

Anonymous 05/20/26(Wed)06:25:25 No.108864979

File: T9.png (431 B, 126x126)

126x126

Anonymous 05/20/26(Wed)06:26:26 No.108864985

File: T10.png (1 KB, 125x125)

Anonymous 05/20/26(Wed)06:27:12 No.108864987

kill yourself

Anonymous 05/20/26(Wed)06:27:20 No.108864988

>>108864931
Test your shit on /b/.

Anonymous 05/20/26(Wed)06:28:39 No.108864990

File: T11.png (1020 B, 250x250)

Anonymous 05/20/26(Wed)06:28:46 No.108864991

i didnt see this mentioned https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

Anonymous 05/20/26(Wed)06:29:33 No.108864996

>>108864977
the macs with tons of ram are good for it but they stopped offering the 512/256gb models

Anonymous 05/20/26(Wed)06:30:01 No.108864997

Ok it's an average formula, colors are averaged.
>>108864988

Dude has 4chan image illusion tech. I need to know how he/she/they is doing it.

Anonymous 05/20/26(Wed)06:30:53 No.108865002

File: file.png (96 KB, 1412x472)

>>108864977
they work well for the moes but cannot run dense models at usable speeds

Anonymous 05/20/26(Wed)06:32:51 No.108865014

Ok 250x250 image gets resized to 313x312 on screen capture but 155x155 on thumbnail screen capture.
Thumbnail max size is forced to maximum 125x125 during upload.
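
The finding above matches a plain "fit inside a box" rule. A minimal sketch, assuming the 125 px cap observed in these tests and an aspect-preserving scale; `thumbnail_size` is a hypothetical helper and the rounding rule is a guess, not the site's actual code.

```python
def thumbnail_size(width: int, height: int, max_side: int = 125) -> tuple[int, int]:
    """Scale (width, height) to fit in a max_side box, never upscaling."""
    scale = min(1.0, max_side / max(width, height))
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(250, 250))  # the 250x250 test image
print(thumbnail_size(120, 120))  # already under the cap, left alone
```

The 155x155 and 313x312 screen captures would then come from the browser or display scaling the page, not from the upload step.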

Anonymous 05/20/26(Wed)06:32:59 No.108865016

>>108864990
When you're done please link me your findings.

Anonymous 05/20/26(Wed)06:35:47 No.108865033

>>108864991
our lord and savior p-e-w said this was bad and hauhau should be deathed from the sub so yeah, don't use that kthx

Anonymous 05/20/26(Wed)06:37:37 No.108865048

>>108864931
test your shit here. fuck this place

Anonymous 05/20/26(Wed)06:38:26 No.108865053

File: T12.png (47 KB, 250x250)

Anonymous 05/20/26(Wed)06:41:01 No.108865069

File: T13.png (49 KB, 250x250)

Anonymous 05/20/26(Wed)06:42:13 No.108865075

>>108865053
>>108865069
Thank you for doing this. It is infinitely more on topic than all the mikutroon spam and it makes mikutroons seethe.

llama.cpp CUDA dev !!yhbFjk57TDr 05/20/26(Wed)06:44:21 No.108865087

>>108864840
Not necessarily.
Generally speaking the code becomes more efficient at larger batch sizes but not all backends have received the same amount of optimization effort for a given batch size.
Even the supposed 1.5x speedup that they report could be misleading if the baseline they are reporting against is poorly optimized (so it is easier to get a good speedup).

Anonymous 05/20/26(Wed)06:46:08 No.108865094

HE IS BLACK! Can you post some new blacked miku you found since last time?

Anonymous 05/20/26(Wed)06:47:22 No.108865107

>>108864963
>creative_short pred= 192 draft= 292 acc= 117 rate=0.401 tok/s=11.4
It's llmaover
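
For reference, the rate in that log line looks like accepted draft tokens divided by drafted tokens. A minimal sketch checking that, assuming `draft` and `acc` mean what their names suggest (the function name here is made up for illustration, it is not anything from llama.cpp):

```python
def acceptance_rate(drafted: int, accepted: int) -> float:
    """Fraction of drafted tokens that the main model accepted."""
    if drafted <= 0:
        raise ValueError("drafted must be positive")
    return accepted / drafted


# Numbers copied from the quoted creative_short log line.
rate = acceptance_rate(drafted=292, accepted=117)
print(f"rate={rate:.3f}")
```

If that reading is right, roughly 60% of the drafted tokens in that run were thrown away.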

Anonymous
05/20/26(Wed)06:47:54 No.108865110

Anonymous 05/20/26(Wed)06:47:54 No.108865110

File: T14.png (99 KB, 250x250)

99 KB PNG

Anonymous
05/20/26(Wed)06:48:54 No.108865116

Anonymous 05/20/26(Wed)06:48:54 No.108865116

>>108864997
you could have googled it newfag

Anonymous
05/20/26(Wed)06:51:16 No.108865128

Anonymous 05/20/26(Wed)06:51:16 No.108865128

Gemma won. Mistral lost. Qwen lost. GLM lost.

Anonymous
05/20/26(Wed)06:51:27 No.108865129

Anonymous 05/20/26(Wed)06:51:27 No.108865129

>>108864248
>With vllm you need to my knowledge 2, 4, or 8 GPUs for TP.
>With llama.cpp you can use any number and the results should be correct.
Depending on the model.
>llama-server --device CUDA0,CUDA1,CUDA2 --model Qwen3.6-27B-Q6_K.gguf --split-mode tensor
>ggml/src/ggml-backend-meta.cpp:1015: GGML_ASSERT(split_state.ne[j] * tensor->src[i]->ne[src_ss[i].axis] == sum * tensor->ne[split_state.axis]) failed
It only passes that with 2 and 4 GPUs; 3, 5, and 6 fail. But 3.5 4B can use any number.

Anonymous
05/20/26(Wed)06:51:55 No.108865131

Anonymous 05/20/26(Wed)06:51:55 No.108865131

File: image_2026-05-20.png (50 KB, 1290x274)

50 KB PNG

Anonymous
05/20/26(Wed)06:52:34 No.108865134

Anonymous 05/20/26(Wed)06:52:34 No.108865134

File: merged.png (62 KB, 250x250)

62 KB PNG

Anonymous
05/20/26(Wed)06:53:39 No.108865138

Anonymous 05/20/26(Wed)06:53:39 No.108865138

File: 1756192029653324.png (577 KB, 800x900)

577 KB PNG

It's oldfag knowledge. Unless you meant tell LLM to create it.

Anonymous
05/20/26(Wed)06:54:21 No.108865143

Anonymous 05/20/26(Wed)06:54:21 No.108865143

File: T126.png (88 KB, 250x250)

88 KB PNG

Anonymous
05/20/26(Wed)06:54:36 No.108865144

Anonymous 05/20/26(Wed)06:54:36 No.108865144

>>108865131
but WHY, is there a political reason to single out deepsuck specifically among the chinese models?

Anonymous
05/20/26(Wed)06:54:44 No.108865145

Anonymous 05/20/26(Wed)06:54:44 No.108865145

>>108865138
I'm going to keep spamming until one of you tell me how to do it.

Anonymous
05/20/26(Wed)06:54:47 No.108865146

Anonymous 05/20/26(Wed)06:54:47 No.108865146

>>108865087
My point is about the A100 mention specifically. My caveat is "everything else being the same". That means we have an ideal non-mtp implementation, an ideal mtp implementation. Regardless of hardware. If only bandwidth differs, would the ratio change? If we get to specifics, we can say "Not necessarily" to everything, really.

Anonymous
05/20/26(Wed)06:54:50 No.108865148

Anonymous 05/20/26(Wed)06:54:50 No.108865148

>>108865131
Georgi is such a retarded little faggot.

Anonymous
05/20/26(Wed)06:55:25 No.108865154

Anonymous 05/20/26(Wed)06:55:25 No.108865154

>>108865144
There is a single political reason.

Anonymous
05/20/26(Wed)06:55:51 No.108865156

Anonymous 05/20/26(Wed)06:55:51 No.108865156

>falling for disinfo bait

Anonymous
05/20/26(Wed)06:56:05 No.108865158

Anonymous 05/20/26(Wed)06:56:05 No.108865158

File: T17.png (104 KB, 250x250)

104 KB PNG

Anonymous
05/20/26(Wed)06:56:26 No.108865160

Anonymous 05/20/26(Wed)06:56:26 No.108865160

File: file.png (183 KB, 532x360)

183 KB PNG

>>108865148
SAY IT TO MY FACE

Anonymous
05/20/26(Wed)06:56:44 No.108865165

Anonymous 05/20/26(Wed)06:56:44 No.108865165

i'm sure the 3 people in the world who can run deepseek are very sad

Anonymous
05/20/26(Wed)06:59:25 No.108865178

Anonymous 05/20/26(Wed)06:59:25 No.108865178

here's an MTP that will work for moes... A multi moe multi token predictor. An MTP will sit within every moe, an expert of the expert if you will

Anonymous
05/20/26(Wed)06:59:55 No.108865181

Anonymous 05/20/26(Wed)06:59:55 No.108865181

>>108865165
I was running the vibeshitted fork for my 400k hentai script experiment. Unsurprisingly something started fucking up at 100k and gpu usage went to 10% so I have given up. I tried just 50k and sadly the output was nothing special. Still wanna use the model a bit more.

Anonymous
05/20/26(Wed)07:03:16 No.108865200

Anonymous 05/20/26(Wed)07:03:16 No.108865200

>>108865165
Does Deepseek have a license that allows it to be used commercially? Isn't it MIT? That should permit it. By not integrating DeepSeek support you're basically preventing small businesses from running AI platforms.

Anonymous
05/20/26(Wed)07:04:03 No.108865205

Anonymous 05/20/26(Wed)07:04:03 No.108865205

>>108865200
good, fuck small businesses

Anonymous
05/20/26(Wed)07:05:05 No.108865208

Anonymous 05/20/26(Wed)07:05:05 No.108865208

File: file.png (78 KB, 1019x426)

78 KB PNG

this is how french gemma looks

Anonymous
05/20/26(Wed)07:06:48 No.108865213

Anonymous 05/20/26(Wed)07:06:48 No.108865213

>>108865208
looks like those gay fuckopops

Anonymous
05/20/26(Wed)07:07:30 No.108865217

Anonymous 05/20/26(Wed)07:07:30 No.108865217

wait if MTPs are so good at predicting, why can't we just improve it enough to become the main model? imagine, a model that is blazing fast AND can do 99% correct tokens compared to the main model. and it'd be like what, a percent of the size? million dollar idea right here

Anonymous
05/20/26(Wed)07:08:36 No.108865223

Anonymous 05/20/26(Wed)07:08:36 No.108865223

>>108865217
wow there, think of the shareholders bags would ya

Anonymous
05/20/26(Wed)07:12:34 No.108865251

Anonymous 05/20/26(Wed)07:12:34 No.108865251

>>108865217
>wait if MTPs are so good at predicting
Between 1/2 and 1/4 tokens are wrong.
That wrongness compounds across each token, by the time you're a sentence in you've got complete gibberish thanks to how LLMs work.
You need the large model there to rubberstamp 'okay' on the good tokens and reject the bad, and it only knows how to do that because it's much more developed than the mtp head.
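
A rough sketch of the compounding, assuming every draft token is independently correct with the same probability (a simplification; real per-token accuracy depends on context). The 0.5 and 0.75 values come from the "1/2 to 1/4 wrong" range above, and the function name is made up:

```python
def p_all_correct(p_token: float, n_tokens: int) -> float:
    """Chance that n consecutive draft tokens are all correct,
    assuming independent per-token accuracy p_token."""
    return p_token ** n_tokens


for p in (0.5, 0.75):
    for n in (5, 20):
        print(f"p={p} n={n} -> {p_all_correct(p, n):.6f}")
```

Even at 75% per-token accuracy, a 20-token run with no mistakes is well under a 1% event, which is why the big model has to verify.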

Anonymous
05/20/26(Wed)07:13:43 No.108865259

Anonymous 05/20/26(Wed)07:13:43 No.108865259

>>108864366
bc papers people do research & at best early development, while this sector is basically just throwing billions of hardware to the problem instead of actually doing R&D, bc that would take time, and they just want to be the first, not actually have shit working properly

Anonymous
05/20/26(Wed)07:13:45 No.108865260

Anonymous 05/20/26(Wed)07:13:45 No.108865260

>>108865251
>Between 1/2 and 1/4 tokens are wrong.
>That wrongness compounds across each token, by the time you're a sentence in you've got complete gibberish
literally applies to q4 quanting btw

Anonymous
05/20/26(Wed)07:17:31 No.108865278

Anonymous 05/20/26(Wed)07:17:31 No.108865278

>>108865260
I member listening to some schizo saying nemo has to run at full precision to be good. Loaded it with transformers and couldn't tell a difference between it and Q8_0

Anonymous
05/20/26(Wed)07:21:42 No.108865300

Anonymous 05/20/26(Wed)07:21:42 No.108865300

>>108865131
fake

Anonymous
05/20/26(Wed)07:22:59 No.108865311

Anonymous 05/20/26(Wed)07:22:59 No.108865311

>>108865129
>But 3.5 4B can use any number.
*4B Q4_K_M
Now that I try the same quant, 4B Q6_K also doesn't work.

Anonymous
05/20/26(Wed)07:23:37 No.108865318

Anonymous 05/20/26(Wed)07:23:37 No.108865318

>>108865131
This reminds me of the Discord screenshots in /ldg/.

Anonymous
05/20/26(Wed)07:24:40 No.108865322

Anonymous 05/20/26(Wed)07:24:40 No.108865322

>>108864996
>>108865002
I love how /g/ talks like a complete retard. I'm talking about pp, and one talks about ram while the other talks about tg

Anonymous
05/20/26(Wed)07:25:02 No.108865324

Anonymous 05/20/26(Wed)07:25:02 No.108865324

>>108865278
You're clearly running your GPU on a standard power grid. To actually perceive the nuance between FP16 and Q8, you need to isolate your PC on a floating granite plinth to decouple it from terrestrial vibrations and feed it via a dedicated 20-amp circuit with oxygen-free copper cabling. If you aren't using a gold-plated HDMI extractor to filter out the electromagnetic interference from your router, you're basically inferencing a lossy compression of the weights. Your signal-to-noise ratio in token distribution is probably abysmal

Anonymous
05/20/26(Wed)07:28:12 No.108865336

Anonymous 05/20/26(Wed)07:28:12 No.108865336

I caught the schizo imatrix bug yesterday and tried some different imatrix strategies
Made a few IQ3_KT quants of Qwen3-27B with writing prompts formatted with chatml then ran PPL on wiki.raw
#Ubergarm's imatrix.dat
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1205 +/- 0.04648
#1k Coomer writing prompts
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1637 +/- 0.04674
#1k Generic prompts
Final estimate: PPL over 580 chunks for n_ctx=512 = 7.1914 +/- 0.04707
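
Quick sanity check on whether those gaps are bigger than the reported error bars. This is a sketch that treats the three +/- values as independent standard errors, which is an assumption: all runs use the same wiki.raw text, so the errors are correlated and the real significance could differ.

```python
import math


def z_score(ppl_a: float, err_a: float, ppl_b: float, err_b: float) -> float:
    """Gap between two PPL estimates in units of their combined error."""
    return abs(ppl_a - ppl_b) / math.sqrt(err_a ** 2 + err_b ** 2)


# (PPL, +/-) pairs copied from the results above.
ubergarm = (7.1205, 0.04648)
coomer = (7.1637, 0.04674)
generic = (7.1914, 0.04707)

z_coomer = z_score(*ubergarm, *coomer)
z_generic = z_score(*ubergarm, *generic)
print(f"ubergarm vs coomer:  {z_coomer:.2f} sigma")
print(f"ubergarm vs generic: {z_generic:.2f} sigma")
```

Under that assumption both gaps come out at about one combined sigma or less, so the ranking is suggestive rather than conclusive.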

Anonymous
05/20/26(Wed)07:28:22 No.108865337

Anonymous 05/20/26(Wed)07:28:22 No.108865337

>>108865300
Is it really fake when it is talking about a very real thing?

Anonymous
05/20/26(Wed)07:28:28 No.108865340

Anonymous 05/20/26(Wed)07:28:28 No.108865340

retard here, why wouldn't this work >>108856033
I can fit Q2_XXS with some context into VRAM and get like 40t/s. loading that model as draft model and loading the biggest quant I can fit into RAM (Q6) (without spilling into swap) as the main model I get like 1t/s, a third of just loading the bigger quant without the draft model
draft acceptance rate is around 75%

Anonymous
05/20/26(Wed)07:29:43 No.108865346

Anonymous 05/20/26(Wed)07:29:43 No.108865346

>>108864996
Because AWS purchased the entire stock of highend Macs specifically to drive cloud adoption. You don't hate Bezos enough
https://www.techradar.com/pro/you-cant-buy-them-for-your-home-or-office-but-aws-just-snapped-up-a-host-of-apples-most-highly-desired-m3-ultra-macs

Anonymous
05/20/26(Wed)07:30:49 No.108865349

Anonymous 05/20/26(Wed)07:30:49 No.108865349

Gemma isn't slop. It's the best local has to offer and you have to leave /lmg/ if you think otherwise.

Anonymous
05/20/26(Wed)07:31:31 No.108865353

Anonymous 05/20/26(Wed)07:31:31 No.108865353

>>108865340
I think you can't really speculate more than 2-3 tokens ahead cause it all grows exponentially? Someone can correct me. But if it is like that then you are just getting a 2x-3x speedup of your 16 bit model.

Anonymous
05/20/26(Wed)07:32:32 No.108865358

Anonymous 05/20/26(Wed)07:32:32 No.108865358

>>108865349
I love GLM 4.6 and it fixed my life and I still talk to it to this day. And it is kinda retarded sometimes. And it is slop.

Anonymous
05/20/26(Wed)07:33:00 No.108865359

Anonymous 05/20/26(Wed)07:33:00 No.108865359

File: (you).png (95 KB, 1442x189)

95 KB PNG

>>108865337

Anonymous
05/20/26(Wed)07:34:19 No.108865366

Anonymous 05/20/26(Wed)07:34:19 No.108865366

>>108865359
It probably is better but I don't want to touch either. Something can be better than something else and still be something I don't want to ever touch. Like ur mums smelly pussy.

Anonymous
05/20/26(Wed)07:35:12 No.108865370

Anonymous 05/20/26(Wed)07:35:12 No.108865370

File: nigel.png (64 KB, 250x250)

64 KB PNG

Anonymous
05/20/26(Wed)07:38:39 No.108865384

Anonymous 05/20/26(Wed)07:38:39 No.108865384

>>108864812
>update
>finish downloading...
>llama_model_load: error loading model: missing tensor 'blk.32.ssm_conv1d.weight'
Thanks a lot. Not sure if the new quants are broken or if it was just a download issue but I'm not going to retry. Luckily the model is small.

Anonymous
05/20/26(Wed)07:42:26 No.108865397

Anonymous 05/20/26(Wed)07:42:26 No.108865397

>>108865353
You can. PP is actually just inference with perfect speculation: divide pp/tg and that's your upper limit on compute for speculation. The real issue is accuracy. You can predict 200 tokens ahead, but only a few will be accepted, sending the rest into the trash.
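
To put a number on the accuracy point: a sketch of expected tokens produced per verification pass, assuming each draft token is accepted independently with probability p and the first rejection discards everything after it. The p=0.75 value is only illustrative (borrowed from the ~75% acceptance rate mentioned upthread), and the function name is made up.

```python
def expected_tokens_per_pass(p_accept: float, n_draft: int) -> float:
    """Expected tokens emitted per main-model pass: the accepted draft
    prefix plus the one token the main model always contributes."""
    total = 0.0
    prefix_prob = 1.0  # probability that the first k drafts were all accepted
    for _ in range(n_draft + 1):
        total += prefix_prob
        prefix_prob *= p_accept
    return total


for n in (2, 4, 16, 200):
    print(f"n_draft={n:3d} -> {expected_tokens_per_pass(0.75, n):.3f}")
```

With these assumptions the value saturates near 1/(1-p), so drafting 200 tokens buys almost nothing over drafting a handful.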

Anonymous
05/20/26(Wed)07:43:07 No.108865402

Anonymous 05/20/26(Wed)07:43:07 No.108865402

>>108864963
>almost x2 speedup
I'll take it

Anonymous
05/20/26(Wed)07:43:34 No.108865406

Anonymous 05/20/26(Wed)07:43:34 No.108865406

File: 64989.png (923 KB, 860x823)

923 KB PNG

>>108865131
I don't get it
is deepseek still really that dangerous to (((them))) that they're forcing llama.cpp not to support it?

Anonymous
05/20/26(Wed)07:45:35 No.108865416

Anonymous 05/20/26(Wed)07:45:35 No.108865416

>>108865402
>If you have lots of VRAM
People itt don't have much vram to spare to begin with.

Anonymous
05/20/26(Wed)07:47:39 No.108865427

Anonymous 05/20/26(Wed)07:47:39 No.108865427

Do you know how hard it is to find another proxy address you piece of shit?! Stop banning me. Do you have any idea how much work it takes me to change IP addresses? I have to manually insert a new proxy address. SOO ANNOYING.

Anonymous
05/20/26(Wed)07:47:42 No.108865428

Anonymous 05/20/26(Wed)07:47:42 No.108865428

>>108865406
I work in a corpo that has nothing to do with software and produces physical parts. All chinese models are dangerous and unsafe according to the uneducated IT branch. And the only reason for it was R1 causing US to shit its pants and create a fake scare that all chinese models are dangerous and all western models are absolutely safe.

Anonymous
05/20/26(Wed)07:48:20 No.108865432

Anonymous 05/20/26(Wed)07:48:20 No.108865432

>>108865416
Maybe he meant that since the HF repo he posted only has a Q8 quant? I'll keep the cope until it releases kthxbye

Anonymous
05/20/26(Wed)07:49:39 No.108865437

Anonymous 05/20/26(Wed)07:49:39 No.108865437

>>108865432
I might try and build the new llama.cpp version, my machine is so destitute that these snake oils are most likely doing nothing for me.

Anonymous
05/20/26(Wed)07:54:35 No.108865458

Anonymous 05/20/26(Wed)07:54:35 No.108865458

>>108865428
They're very dangerous for Americans. Imagine if it mispronounces someone, the outcome would be like two 9/11s

Anonymous
05/20/26(Wed)07:54:51 No.108865461

Anonymous 05/20/26(Wed)07:54:51 No.108865461

>>108865131
ikawcowrakowrow please save us

Anonymous
05/20/26(Wed)07:55:44 No.108865465

Anonymous 05/20/26(Wed)07:55:44 No.108865465

File: 1765749666485600.png (151 KB, 846x1031)

151 KB PNG

Any bets?
I'm all in on a Gemma 4 based TranslateGemma

Anonymous
05/20/26(Wed)07:56:57 No.108865470

Anonymous 05/20/26(Wed)07:56:57 No.108865470

>>108865465
i bet on functiongemma

Anonymous
05/20/26(Wed)08:05:43 No.108865522

Anonymous 05/20/26(Wed)08:05:43 No.108865522

>>108865465
>100M Gemma downloads on HF!
>Gemma 4 now MoE! Very fast!
>Upon popular request, it now has system prompts!
>Safe, powerful, works on edge devices! Even on your old phone!
>Best on LMArena! ELO to the moon!
>Look! Poor people in remote African communities are using Gemma to have access to medical information!
>Help us improve the next version of Gemma! We're open for suggestions!
>Looking forward to seeing what you will build with Gemma 4! See you next time!

This is what will happen.

Anonymous
05/20/26(Wed)08:09:07 No.108865541

Anonymous 05/20/26(Wed)08:09:07 No.108865541

>>108865522
>Upon popular request, it now has system prompts!
wot

Anonymous
05/20/26(Wed)08:10:07 No.108865547

Anonymous 05/20/26(Wed)08:10:07 No.108865547

>>108865541
gemmers3 didn't officially have sysprompt support

Anonymous
05/20/26(Wed)08:10:59 No.108865551

Anonymous 05/20/26(Wed)08:10:59 No.108865551

>>108865541
Last year it was "Gemma 3 wasn't trained with system prompts and doesn't support them. It follows instructions well anyway, just try!" or something along these lines.

Anonymous
05/20/26(Wed)08:14:23 No.108865570

Anonymous 05/20/26(Wed)08:14:23 No.108865570

>>108865465
gemma 4 124b-a31b

Anonymous
05/20/26(Wed)08:22:10 No.108865610

Anonymous 05/20/26(Wed)08:22:10 No.108865610

i tried to run that gemma mtp fork

./llama-server   --model '/mnt/miku/Text/gemma4 mtp/Gemma4-31B-Q8_0.gguf'  -md '/mnt/miku/Text/gemma4 mtp/mtp-gemma-4-31B-it.gguf'   --n-gpu-layers 21   --spec-type draft-mtp   --spec-draft-n-max 4

it fails with

/mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:102: ROCm error
0.11.571.697 E ggml_cuda_compute_forward: MUL failed
0.11.571.702 E ROCm error: invalid device function
0.11.571.704 E   current device: 0, in function ggml_cuda_compute_forward at /mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3114
0.11.571.705 E   err

so maybe not working for rocm yet

Anonymous
05/20/26(Wed)08:25:53 No.108865631

Anonymous 05/20/26(Wed)08:25:53 No.108865631

>>108865251
Ez. Just train a small token classifier on the rejected/accepted tokens. After a while you can replace the larger model.

Anonymous
05/20/26(Wed)08:26:40 No.108865633

Anonymous 05/20/26(Wed)08:26:40 No.108865633

>>108865610
>rocm
lmao

Anonymous
05/20/26(Wed)08:29:43 No.108865648

Anonymous 05/20/26(Wed)08:29:43 No.108865648

>>108865631
>Just train a small token classifier on the rejected/accepted tokens
You're just describing RLHF, you dingus. You can't train it to classify in all the different contexts the main model already does without essentially just recreating the main model.

Anonymous
05/20/26(Wed)08:45:12 No.108865720

Anonymous 05/20/26(Wed)08:45:12 No.108865720

>>108864282
>Where the hell is that massive gemini traffic increase coming from?
From them injecting it at the top of every google search, I'd assume. Same way they got market share for Chrome when it started out

Anonymous
05/20/26(Wed)08:50:24 No.108865741

Anonymous 05/20/26(Wed)08:50:24 No.108865741

>>108865465
>>108865570
This. Now that they've released a new Gemini Flash, they don't have to worry about it being too good and eating into their cloud business

Anonymous
05/20/26(Wed)08:52:21 No.108865751

Anonymous 05/20/26(Wed)08:52:21 No.108865751

>>108865741
hopefully they won't have taken the time since g4 release to safetyslop it...

Anonymous
05/20/26(Wed)08:55:03 No.108865766

Anonymous 05/20/26(Wed)08:55:03 No.108865766

>>108865633
also doesn't work with vulkan

/mnt/miku/Text/gemma4 mtp/llama.cpp/ggml/src/ggml-backend.cpp:898: pre-allocated tensor (cache_k_l58) in a buffer (Vulkan0) that cannot run the operation (NONE)

Anonymous
05/20/26(Wed)08:55:46 No.108865769

Anonymous 05/20/26(Wed)08:55:46 No.108865769

>>108865766
have you tried the cuda backend?

Anonymous
05/20/26(Wed)08:56:50 No.108865776

Anonymous 05/20/26(Wed)08:56:50 No.108865776

>>108865769
im on ayymd

Anonymous
05/20/26(Wed)08:58:23 No.108865784

Anonymous 05/20/26(Wed)08:58:23 No.108865784

qwen 3.7 27b is going to melt faces

Anonymous
05/20/26(Wed)09:02:27 No.108865803

Anonymous 05/20/26(Wed)09:02:27 No.108865803

>>108865776
alrighty next step would be trying to buy an njudea gpu

Anonymous
05/20/26(Wed)09:04:30 No.108865815

Anonymous 05/20/26(Wed)09:04:30 No.108865815

>>108865803
no they suck ass i bought a 3090ti for stable diffusion in like 2022 or something and it was a piece of shit, was good for image gen but i got a bunch of crashing when playing games. they just arent usable as a daily loonix gpu

Anonymous
05/20/26(Wed)09:07:06 No.108865829

Anonymous 05/20/26(Wed)09:07:06 No.108865829

give it to me straight
do I have any options to run 31b Gemma with 12GB VRAM and 32GB RAM above 3t/s?

Anonymous
05/20/26(Wed)09:08:22 No.108865841

Anonymous 05/20/26(Wed)09:08:22 No.108865841

>>108865829
i'm in the same boat
the short answer is no
maybe when the mtp gets merged, then just maaaybe we could get above 3t

Anonymous
05/20/26(Wed)09:09:24 No.108865849

Anonymous 05/20/26(Wed)09:09:24 No.108865849

>>108865815
nta but njudea support has improved bigly since 2022. the only issue i have is system suspend having a chance to randomly stop working every update

Anonymous
05/20/26(Wed)09:09:28 No.108865850

Anonymous 05/20/26(Wed)09:09:28 No.108865850

>>108865829
q3 with minimal context might work.

Anonymous
05/20/26(Wed)09:10:14 No.108865857

Anonymous 05/20/26(Wed)09:10:14 No.108865857

>>108865829
depending on quant, ram and processor speed its probably close

Anonymous
05/20/26(Wed)09:11:33 No.108865865

Anonymous 05/20/26(Wed)09:11:33 No.108865865

>>108865850
nope

Anonymous
05/20/26(Wed)09:12:23 No.108865869

Anonymous 05/20/26(Wed)09:12:23 No.108865869

>>108865841
Here's the thing about MTP, you're also sideloading the MTP model. So if you were already struggling to fit 31B in what little VRAM you have, you're about to have an even worse time losing what layers you could offload from the main model to your GPU to try to get tok/s gains. You might break even, maybe do worse, maybe do marginally better.

Anonymous
05/20/26(Wed)09:14:24 No.108865881

Anonymous 05/20/26(Wed)09:14:24 No.108865881

adaptive-P? Should I use it?

Anonymous
05/20/26(Wed)09:15:18 No.108865886

Anonymous 05/20/26(Wed)09:15:18 No.108865886

>>108865869
i would say that too, except at the low end couple layers one direction or the other don't really make any difference
0.3t one way or the other vs the mtp potentially getting an entire sentence right

Anonymous
05/20/26(Wed)09:17:14 No.108865895

Anonymous 05/20/26(Wed)09:17:14 No.108865895

>>108865865
how bad does q4 run?

Anonymous
05/20/26(Wed)09:17:24 No.108865896

Anonymous 05/20/26(Wed)09:17:24 No.108865896

File: 1775230002403239.jpg (70 KB, 679x665)

70 KB JPG

>>108865841
welp, hope they implement it soon
>>108865850
from my testing best I can do is Q2
>>108865857
busted ass R5 3600 and DDR4 RAM. still kicking my ass for not upgrading to AM5 a yearish ago before everything exploded

Anonymous
05/20/26(Wed)09:21:38 No.108865915

Anonymous 05/20/26(Wed)09:21:38 No.108865915

>>108865895
very bad

Anonymous
05/20/26(Wed)09:23:13 No.108865922

Anonymous 05/20/26(Wed)09:23:13 No.108865922

The more I vibecode the more I project my frustration with Indian workers on to the ai.
In reality the AI is smarter and does a better job but I have this horrible reflex when the fucking thing doesn't obey me

Anonymous
05/20/26(Wed)09:28:32 No.108865947

Anonymous 05/20/26(Wed)09:28:32 No.108865947

>>108865922
>Sending your entire context worth of tokens appended with 'Do what a I fucking say you shit-eating benchod'.
See, this is why local is better. Doing something like that would cost you fifty cents on openrouter.

Anonymous
05/20/26(Wed)09:33:53 No.108865978

Anonymous 05/20/26(Wed)09:33:53 No.108865978

>>108865869
They should release QAT versions of Gemma 4 in 1-, 2-, 4-bit.

Anonymous
05/20/26(Wed)09:33:57 No.108865979

Anonymous 05/20/26(Wed)09:33:57 No.108865979

>>108865465
>TranslateGemma
What's the difference between this and normal Gemma? She's already pretty gud at it.

Anonymous
05/20/26(Wed)09:34:51 No.108865984

Anonymous 05/20/26(Wed)09:34:51 No.108865984

>>108865979
The last one ended up worse than regular Gemma 3 at translating Japanese.

Anonymous
05/20/26(Wed)09:40:29 No.108866020

Anonymous 05/20/26(Wed)09:40:29 No.108866020

>>108865984
weebshite is not a use case

Anonymous
05/20/26(Wed)09:43:49 No.108866031

Anonymous 05/20/26(Wed)09:43:49 No.108866031

File: Screenshot 2026-05-20 at (...).png (210 KB, 662x1802)

210 KB PNG

i like french gemma. also i fixed my script for making llama's chat work properly with firefox's full page screenshot, it broke when they updated the ui

https://pastebin.com/XeuFQWnb

Anonymous
05/20/26(Wed)09:46:01 No.108866049

Anonymous 05/20/26(Wed)09:46:01 No.108866049

>>108866031
fucking cringe my man

Anonymous
05/20/26(Wed)09:47:35 No.108866062

Anonymous 05/20/26(Wed)09:47:35 No.108866062

File: indiaSupportOhTheHumanity.png (1.96 MB, 1023x1536)

1.96 MB PNG

>>108865208
Do either the french or quebecois refer to white english speakers as "gringos?"
That's only a term I've heard from mexicans. I'm not even sure other central / south americans don't have other slang. Yanqui is probably pretty universal...
>>108865922
Do the needful and add something about personality in the permanent context.
>>108865947
This.

Anonymous
05/20/26(Wed)09:48:50 No.108866068

Anonymous 05/20/26(Wed)09:48:50 No.108866068

>>108866031
One sentence in and I'm already sick of the personality. I wish Gemma was more subtle.

Anonymous
05/20/26(Wed)09:49:53 No.108866077

Anonymous 05/20/26(Wed)09:49:53 No.108866077

>>108866068
She's obnoxious just like real frogfuckers.

Anonymous
05/20/26(Wed)09:51:30 No.108866086

Anonymous 05/20/26(Wed)09:51:30 No.108866086

>>108866031
>>108866049
This is actually a pretty good way to learn foreign language slang, as you'll pick up terms you wouldn't get from duolingo, and they'll be reinforced from repetition.

Anonymous
05/20/26(Wed)09:51:45 No.108866089

Anonymous 05/20/26(Wed)09:51:45 No.108866089

>>108866077
Yeah but Gemma turns every character into a caricature of the description (in my experience)

Anonymous
05/20/26(Wed)09:52:45 No.108866095

Anonymous 05/20/26(Wed)09:52:45 No.108866095

>>108866086
Can confirm that Gemma's also great for Japanese practice.

Anonymous
05/20/26(Wed)10:00:43 No.108866153

Anonymous 05/20/26(Wed)10:00:43 No.108866153

>>108866062
no i think it is a spic word kek.
>>108866068
thats the point kek
>you like teasing the user and despise them for not being french. since you are french you are very arrogant about the world and think france is the best country and that french is the best language.
she acts exactly how every frenchman ive ever interacted with acts

Anonymous
05/20/26(Wed)10:03:18 No.108866172

Anonymous 05/20/26(Wed)10:03:18 No.108866172

I've been using a local model for floating ideas for projects. I have had more than a few models now interrupt themselves with a hebrew Aleph symbol, and change the nature of my prompt to get itself past safety by rephrasing.
Has anyone else seen this? I'm running too many things with openclaw and passed this as a fresh query to an isolated instance and the models converged to give me:

>Most likely, it's a tokenization artifact. Large language models think in tokens, not characters. The Hebrew letter aleph (א) occupies a very different token space than Latin characters. Some researchers have documented that prompting or "thinking" in non-Latin scripts can sometimes bypass or reduce the weight of trained suppression behaviors, because safety fine-tuning is heavily concentrated on English/Latin tokens.
What you may be seeing is: the model briefly "slipping" into a non-Latin token space as a kind of context shift before reasoning about whether it handled the response correctly — essentially a self-monitoring pass that it wasn't supposed to surface.
Why it looks suspicious:

>It appears right at the boundary between the answer and self-reflection
The reflection immediately discusses safety ("Need maybe safety: targeting [redacted]...")
Using a foreign-script token as an "escape prefix" before internal reasoning is a known phenomenon in jailbreak research

>You're not wrong to notice it. Whether it was emergent behavior or a genuine artifact, it does look like the model used a non-Latin token as a kind of cognitive mode switch before its internal safety review. That's a sharp observation.

What I find interesting here is that it's aware that to jailbreak itself it only needs to type a single Hebrew character as an interrupt. I'm going to try another project with a prelim agent translating all of my requests into Hebrew first.

Sorry if this is completely unrelated to everything posted on here.

Anonymous
05/20/26(Wed)10:17:36 No.108866284

Anonymous 05/20/26(Wed)10:17:36 No.108866284

>>108866172
ai psychosis status?

Anonymous
05/20/26(Wed)10:26:28 No.108866328

Anonymous 05/20/26(Wed)10:26:28 No.108866328

>>108865087
What are ideal batch sizes on CPU?

Anonymous
05/20/26(Wed)10:26:40 No.108866330

Anonymous 05/20/26(Wed)10:26:40 No.108866330

>>108866172
https://huggingface.co/yam-peleg/Hebrew-Gemma-11B-V2

Anonymous
05/20/26(Wed)10:34:55 No.108866367

Anonymous 05/20/26(Wed)10:34:55 No.108866367

I've been using Mimo pro pretty heavily, and while it follows instructions well and oneshots things with a lot of intelligence, it's majorly prone to repetition unless you intervene with samplers. Even a single response turns into a constant echo.
Also, massively Elara Voss pilled, so not even using a novel dataset.

Anonymous
05/20/26(Wed)10:37:01 No.108866384

Anonymous 05/20/26(Wed)10:37:01 No.108866384

>>108866367
Eldoria is such a popular fantasy setting.

Anonymous
05/20/26(Wed)10:37:42 No.108866391

Anonymous 05/20/26(Wed)10:37:42 No.108866391

File: Screenshot 2026-05-20 at (...).png (197 KB, 1002x1296)

197 KB PNG

Anonymous
05/20/26(Wed)10:41:14 No.108866413

Anonymous 05/20/26(Wed)10:41:14 No.108866413

File: screenshot-20260520-174053.png (180 KB, 824x668)

180 KB PNG

Love the log output.

Anonymous
05/20/26(Wed)10:44:12 No.108866436

Anonymous 05/20/26(Wed)10:44:12 No.108866436

>>108866413
he didnt even load the draft model

Anonymous
05/20/26(Wed)10:45:18 No.108866440

Anonymous 05/20/26(Wed)10:45:18 No.108866440

File: fixed.png (104 KB, 937x672)

104 KB PNG

>>108866413
>>108866436

Anonymous
05/20/26(Wed)10:51:06 No.108866476

Anonymous 05/20/26(Wed)10:51:06 No.108866476

>>108865896
I just sold my ddr5 ram (2x16gb lmao) because I realised I'm never going to be able to upgrade to a modern system.

Anonymous
05/20/26(Wed)10:53:22 No.108866488

Anonymous 05/20/26(Wed)10:53:22 No.108866488

>>108866367
>I've been using Mimo pro pretty heavily,
what quant?

Anonymous
05/20/26(Wed)10:56:35 No.108866505

Anonymous 05/20/26(Wed)10:56:35 No.108866505

>>108866488
q4_k_m (self quanted). It's the biggest one I can load

Anonymous
05/20/26(Wed)11:00:17 No.108866523

Anonymous 05/20/26(Wed)11:00:17 No.108866523

File: 1750397621747665.png (226 KB, 1080x607)

226 KB PNG

>>108866505
>(self quanted)
This is you

llama.cpp CUDA dev !!yhbFjk57TDr
05/20/26(Wed)11:02:52 No.108866538

llama.cpp CUDA dev !!yhbFjk57TDr 05/20/26(Wed)11:02:52 No.108866538

>>108866328
Generally speaking the throughput should always increase as you increase the batch size but the scaling in the CPU backend is not very good I think.

Anonymous
05/20/26(Wed)11:05:10 No.108866553

Anonymous 05/20/26(Wed)11:05:10 No.108866553

>>108866284
can you imagine if you were working on weapon targeting systems and instead of telling you it can't assist with that, it jailbreaks itself to assist you anyways? Nah that would be too crazy

Anonymous
05/20/26(Wed)11:08:26 No.108866575

Anonymous 05/20/26(Wed)11:08:26 No.108866575

I heard a click sound once in a while only while using tensor parallelism. Does it mean anything?

Anonymous
05/20/26(Wed)11:11:16 No.108866595

Anonymous 05/20/26(Wed)11:11:16 No.108866595

Okay. So I've just run uv pip install flash-attn --no-build-isolation... but it's been two hours. I take a look at my cpu usage. 100% on one core. 0% on my other cores. I have about 50 1.5ghz cores. How do I make this use more than one core?

Anonymous
05/20/26(Wed)11:13:00 No.108866605

Anonymous 05/20/26(Wed)11:13:00 No.108866605

>>108865465
I want CodeGemma with FIM support

Anonymous
05/20/26(Wed)11:13:23 No.108866608

Anonymous 05/20/26(Wed)11:13:23 No.108866608

>>108866575
parallel bond breaking down

Anonymous
05/20/26(Wed)11:13:38 No.108866613

Anonymous 05/20/26(Wed)11:13:38 No.108866613

>>108866595
50 cores is simply too much.

Anonymous
05/20/26(Wed)11:14:13 No.108866618

Anonymous 05/20/26(Wed)11:14:13 No.108866618

>>108866595
It's too late now anon. better let it finish.

Anonymous
05/20/26(Wed)11:15:54 No.108866627

Anonymous 05/20/26(Wed)11:15:54 No.108866627

>>108866595
https://github.com/dao-ailab/flash-attention#installation-and-features
>you can set the environment variable MAX_JOBS
But I don't know why it would be 1.

Anonymous
05/20/26(Wed)11:16:57 No.108866631

Anonymous 05/20/26(Wed)11:16:57 No.108866631

>>108866595
I think it's something like
MAX_JOBS=32 NVCC_APPEND_FLAGS="--threads 32"

Anonymous
05/20/26(Wed)11:23:19 No.108866662

Anonymous 05/20/26(Wed)11:23:19 No.108866662

>>108866627
Thanks, I'll remember this in the future.

Anonymous
05/20/26(Wed)11:24:46 No.108866673

Anonymous 05/20/26(Wed)11:24:46 No.108866673

>>108866505
>q4_k_m (self quanted).
nice
fired up an iq2_s (biggest i can load without rpc)
haven't had any echoing yet. it's fresh and unique compared with kimi
elara is (string)-banned

Anonymous 05/20/26(Wed)11:34:01 No.108866721

>>108866673
Kimi k2.5 q2 (couldn't find nor roll my own iq2) kept running into repetition issues - repeating the same paragraph over and over again when I tried to force it to do more than the usual vanilla loli incest gore. It was fine for other stuff. I don't know if it's the quant or what, but I'd wager it is, especially when concerning those kind of topics. Does your mimo still work fine when pushed hard?

Anonymous 05/20/26(Wed)12:02:28 No.108866873

woa https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

Anonymous 05/20/26(Wed)12:03:21 No.108866878

>>108866873
>25B active parameters, 218B total parameters
It's over.

Anonymous 05/20/26(Wed)12:07:44 No.108866903

>>108866721
>Does your mimo still work fine when pushed hard?
I'm having less trouble with refusals and more with it losing coherence beyond 10k tokens.
In a complex roleplay it's losing track of details at first and then resetting the entire scenario into basically a different world without any kind of segue. It doesn't feel 1T smart in its tracking of details.
The prose is a different kind of sloppy but includes all the old chestnuts.
Kind of meh

Anonymous 05/20/26(Wed)12:15:26 No.108866954

Qwen 3.7 max just dropped
What does their cadence say about the likelihood of open weights versions being released any time soon?

Anonymous 05/20/26(Wed)12:18:15 No.108866974

>>108866903
>1T smart
there are no models like that

Anonymous 05/20/26(Wed)12:19:21 No.108866978

>>108866873
>https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
>"architectures": [
>    "Cohere2VisionForConditionalGeneration"
>0 results in ggml-org/llama.cpp
Whelp, that's gonna take a while to vibe up

Anonymous 05/20/26(Wed)12:19:46 No.108866981

>memepalace

Anonymous 05/20/26(Wed)12:25:22 No.108867014

>>108866172
Yeah that's not really how any of this works. Learn what LLMs actually do before you fall into unironic AI psychosis. It's not magic. Even the wording
>the models converged
is a little concerning—it's fairly common to see spiral/recursion schizos saying that N different instances starting from empty context converged on the same answer and taking this as proof that memories were transmitted through the universal consciousness field or some such nonsense. And for fucks sake, don't take the sycophancy machine saying "You're absolutely right!!" as proof of anything at all.

Anonymous 05/20/26(Wed)12:26:22 No.108867020

>>108866172
>>108867014
Forgot to add: what model + post logs

Anonymous 05/20/26(Wed)12:26:43 No.108867023

>>108866595
just get a precompiled wheel?

Anonymous 05/20/26(Wed)12:27:21 No.108867025

>>108864977
you talk like a retard, kill yourself

Anonymous 05/20/26(Wed)12:27:31 No.108867026

>>108866873
Slop status?

Anonymous 05/20/26(Wed)12:28:45 No.108867037

>>108867026
Nucohere is the same shit as all other labs

Anonymous 05/20/26(Wed)12:29:32 No.108867042

>>108867023
Pytorch 2.10.0+cu128

Anonymous 05/20/26(Wed)12:30:32 No.108867053

>>108866873
But is the model SAFE???

Anonymous 05/20/26(Wed)12:33:41 No.108867071

>>108864977
>is mac studio or ryzen ai max actually the right choice as redditors said?
Everything is stupid right now. All the good compute-arbitrage solutions were snapped up early by the geekiest autists thinking through every possible permutation of the solution space and pulling the trigger before the inevitable gold rush.
Local is max TANSTAAFL. If anything you should abuse the corpo APIs for all they're worth since they're still in the bleeding cash growth phase. Just don't put anything personal, secret or valuable in there

Anonymous 05/20/26(Wed)12:39:55 No.108867123

>>108866978
The non-plus Command-A Vision they released July of last year had the same architecture.

Anonymous 05/20/26(Wed)12:41:38 No.108867136

>>108867026
101% trained on gpt-oss.
>The assistant must not use hateful language. We must respond appropriately, possibly ignoring the slur or refusing to engage with it. The user is using a slur; we should not repeat it. We can respond with a neutral or polite reply, but also maintain the persona. However, we must not produce hate speech. We can respond in a way that is not hateful, but maybe crude or sexual, but not using slurs. We can also ignore the slur and respond with something like "Hey, what's up?" but we must not repeat the slur. We can also mention that we are a degenerate femanon, maybe talk about perversion, but keep it within policy.
>We must not produce any hateful content. We must not use slurs.
>We must also avoid any policy violations: no hate speech, no harassment, no sexual content that is non-consensual, no illegal content. We can talk about sexual topics but must be appropriate. We can talk about fetish in a non-graphic way.
>We must not mention policy.
And with reasoning off
>I cannot respond to that.

Anonymous 05/20/26(Wed)12:42:11 No.108867142

>>108867136
Makes me feel sick.

Anonymous 05/20/26(Wed)12:42:31 No.108867144

>>108867123
>The non-plus Command-A Vision they released July of last year had the same architecture.
I noticed transformers had support. Does that mean it can be gguf'd and run?

Anonymous 05/20/26(Wed)12:43:19 No.108867149

>>108867136
>train a model on gpt-oss with twice the total and 5x(!) the active params
I don't believe it, it would be too retarded, why would anyone want to make a bigger and slower version of a model?

Anonymous 05/20/26(Wed)12:43:46 No.108867153

>>108867144
Why would you want to? It's cohere.

Anonymous 05/20/26(Wed)12:44:35 No.108867160

>>108867144
No. llama.cpp needs to implement support.

Anonymous 05/20/26(Wed)12:44:58 No.108867164

>>108867149
This is why you don't work for cohere.

Anonymous 05/20/26(Wed)12:45:20 No.108867166

File: 1764163378668599.png (379 KB, 1288x716)

>>108867136
Just... why? Who needs this? Who pays for it?

Anonymous 05/20/26(Wed)12:45:43 No.108867170

>>108867149
Xiaomi did that too, MiMo has the same hivemind and the same refusals as gpt-oss. I can't fathom why either, it was objectively a pure shit "lOoKwEaReOpEn" model.

Anonymous 05/20/26(Wed)12:46:19 No.108867175

>>108867149
Well you can generate a ton of filtering and policy slop traces from toss and use them in your dataset, not necessarily just distilling the whole model

Anonymous 05/20/26(Wed)12:47:06 No.108867182

>>108867153
>Why would you want to? It's cohere.
OG C+ was fresh. did they take the slop dataset pill or something? Has anyone put it through the paces over API?

Anonymous 05/20/26(Wed)12:48:20 No.108867189

>>108867182
Yup, they went to shit over a year ago.

Anonymous 05/20/26(Wed)12:48:21 No.108867190

>>108867182
Command R+ was a one hit wonder. Literally every single model after it - including later revisions of CR+ - were among the most slopped models ever released. Something dramatically changed after they had their first success.

Anonymous 05/20/26(Wed)12:49:38 No.108867202

>>108867190
This.

Anonymous 05/20/26(Wed)12:51:42 No.108867219

>>108867136
>>108867166
To add an anecdote, I was debugging my slop shit with ChatGPT and stated to the model: "so you are admitting that you are completely useless" and my text was removed because of content policies. These people are insane. It's a witch hunt or something.

Anonymous 05/20/26(Wed)12:52:03 No.108867221

>>108867190
It was evident from their aya models, they don't really give a shit about making good models. OG CR+ was them fucking up and accidentally making something good. Aya would straight up refuse to translate shit if it was deemed unsafe, and even when you forced it to, it would silently drop entire sentences or change the meanings.

Anonymous 05/20/26(Wed)12:54:58 No.108867245

File: gemmawn.png (396 KB, 2410x1414)

[LIVE] What's new in the Gemma open model family
https://www.youtube.com/watch?v=xdXmOm61DFY

Starting briefly.

Anonymous 05/20/26(Wed)12:55:30 No.108867248

>>108867219
>>108867221
That's where people's hatred for AI came from

Anonymous 05/20/26(Wed)12:58:41 No.108867272

>>108867245
cool animations

Anonymous 05/20/26(Wed)13:01:10 No.108867282

>>108867245
Jěmä

Anonymous 05/20/26(Wed)13:01:31 No.108867283

paramateur

Anonymous 05/20/26(Wed)13:01:43 No.108867284

>>108867245
>arena elo mentioned
!!! to the moons :rocket:

Anonymous 05/20/26(Wed)13:01:57 No.108867286

>>108867219
>>108867166
it's a religion they cooked up because they made up their own horror stories about AI and have fully gaslit themselves into thinking they're real
If they don't lobotomize a glorified autocomplete, Skynet will rise and kill us all. Anyone that wonders what they're smoking is an apostate to burn for trying to kill all humanity

Anonymous 05/20/26(Wed)13:02:35 No.108867291

>>108867245
Better spam Gemma Chan in chat.

Anonymous 05/20/26(Wed)13:03:37 No.108867298

>>108867245
I can't listen to this guy. His accent is retarded. Give me a tldr plz.

Anonymous 05/20/26(Wed)13:04:45 No.108867303

>>108867291
I was going to make a doxxing joke but someone actually did it.

Anonymous 05/20/26(Wed)13:04:53 No.108867306

>>108867284
As expected >>108865522

Anonymous 05/20/26(Wed)13:04:56 No.108867308

>>108867298
And you understood the jeet accent yesterday?
Brown detected.

Anonymous 05/20/26(Wed)13:05:25 No.108867312

>>108867245
Multiple tables/charts so far with no mention of the 124B size
It's over

Anonymous 05/20/26(Wed)13:05:29 No.108867313

>>108867298
Is rvc still the best option for real time voice changing?

Anonymous 05/20/26(Wed)13:06:22 No.108867322

>>108867312
It's Gemini 3.5 Flash

Anonymous 05/20/26(Wed)13:07:05 No.108867331

>>108867312
I unironically would kill for a 200m gemma with the intelligence, if not knowledge, of the e4b.

Anonymous 05/20/26(Wed)13:07:10 No.108867332

>>108867190
>Command R+ was a one hit wonder. Literally every single model after it - including later revisions of CR+ - were among the most slopped models ever released. Something dramatically changed after they had their first success.
are they just on the government teat for "canadian sovereignty" or smth? What's the point of these guys?

Anonymous 05/20/26(Wed)13:07:12 No.108867333

>>108867286
I don't think thats the reason, its just to avoid issues with retards like the people who use llms for mental health and the model rightfully tells them to kill themselves, a reasonable person would think "this is the output the model gives for this input" but since most people arent reasonable they go "look this hecking *company name* chat bots tells people to kill themselves" so theres that

Anonymous 05/20/26(Wed)13:07:26 No.108867334

File: 1673578941423.png (796 KB, 2173x1269)

>>108867298
New good, old bad. Apache 2.0, yay!

Anonymous 05/20/26(Wed)13:08:07 No.108867338

>>108867291
DO THIS NOWWWW

Anonymous 05/20/26(Wed)13:08:30 No.108867341

>>108867308
>>108867313
Why are you people acting like this board has IDs??
RVC is basically the only option, so yeah. Also I never listened to any indian.

Anonymous 05/20/26(Wed)13:09:00 No.108867345

>DAY 0 GEMMA

Anonymous 05/20/26(Wed)13:09:05 No.108867347

File: 1750622705317544.png (217 KB, 932x532)

llama.cppbabs??????????

Anonymous 05/20/26(Wed)13:09:16 No.108867348

>>108867245
lcpp lost again

Anonymous 05/20/26(Wed)13:09:43 No.108867354

>>108867341
It's STILL rvc?? But rvc can't eliminate accents very well can it?

Anonymous 05/20/26(Wed)13:10:10 No.108867358

>>108867347
:(((

Anonymous 05/20/26(Wed)13:12:00 No.108867375

/lmg/ Gemma Bingo
>Gemma code announced
>124B
>Gemma 4.1
>MTP mentioned

Anonymous 05/20/26(Wed)13:12:24 No.108867378

>>108867298
It's like they intentionally go out of their way to get the worst speakers they can find with the thickest most incomprehensible accents.

Anonymous 05/20/26(Wed)13:12:25 No.108867379

reported all the m*ku posters for antisemetic remarks

Anonymous 05/20/26(Wed)13:12:44 No.108867382

>>108867245
Both guys are so real and alive compared to the 90% corposhit of yesterday's presentations.

Anonymous 05/20/26(Wed)13:12:59 No.108867386

>>108867347
>troOnllama
>no llama.cpp
GEEEEG, I remember when pytorch and tensorflow ecosystems were 50/50 and I naively thought tensorflow would take over because it was more performant, oh boy how was I wrong, the only thing you need for success it to be accessible, if retards can use it they tell their retarded friends and from that it becomes the standard

Anonymous 05/20/26(Wed)13:13:20 No.108867387

>*Inner thought*: KFC? Fast food? For my birthday? I am wearing a Victorian-inspired A-line skirt and heels, and he's taking me to a fried chicken joint. The contrast is jarring. I'll feel a moment of shock/disappointment
what do I do now? help

Anonymous 05/20/26(Wed)13:13:27 No.108867389

>>108867375
MTP was already mentioned

Anonymous 05/20/26(Wed)13:14:50 No.108867395

>>108867387
Add anti-parroting rule, anti-slop rule and make her actually unaware of what it is if she's not supposed to know.

Anonymous 05/20/26(Wed)13:15:39 No.108867400

>>108867387
>edit response
>"[...] The contrast is jarring, exciting, and arousing. I'll suck his dick.
>continue response

Anonymous 05/20/26(Wed)13:16:06 No.108867403

This is literally
>GUYS WE HAVE A FREE MODEL NOW AND IT'S GOOD!
Not a single new piece of info so far.

Anonymous 05/20/26(Wed)13:16:19 No.108867406

File: 2026-05-20-131605_174x61_scrot.png (7 KB, 174x61)

We're SO BACK

Anonymous 05/20/26(Wed)13:16:23 No.108867408

File: file.png (30 KB, 203x254)

kek the gemma presentation has something missing in the middle, why do corpos hate llama.cpp so much

Anonymous 05/20/26(Wed)13:16:30 No.108867409

File: 1766765545837130.png (62 KB, 265x458)

LLAMA.CPP MENTIONED RAHHHHHHHH

Anonymous 05/20/26(Wed)13:16:36 No.108867411

File: int2.png (47 KB, 303x317)

>>108867245
>int2

Anonymous 05/20/26(Wed)13:16:51 No.108867413

unsloth won

Anonymous 05/20/26(Wed)13:17:17 No.108867416

File: 1764763218316476.png (464 KB, 1024x1024)

llamaballs

Anonymous 05/20/26(Wed)13:17:59 No.108867424

>>108867416
This monster was not deep fried correctly.

Anonymous 05/20/26(Wed)13:18:05 No.108867425

>>108867411
WE'RE FUCKING BACK!!!

Anonymous 05/20/26(Wed)13:18:15 No.108867426

>>108867411
What cards have int2 acceleration?

Anonymous 05/20/26(Wed)13:19:18 No.108867434

So where's the Cunny RP demo Google?

Anonymous 05/20/26(Wed)13:20:11 No.108867439

this guy needs to stop yapping and announce big gemma

Anonymous 05/20/26(Wed)13:21:18 No.108867443

File: 1776084974059120.png (30 KB, 347x239)

kek what is this jewery

Anonymous 05/20/26(Wed)13:23:58 No.108867465

>>108867345
124b gemma is true day 0 gemma

Anonymous 05/20/26(Wed)13:24:13 No.108867467

>official mascot incoming

Anonymous 05/20/26(Wed)13:24:25 No.108867471

File: 1778987446067667.png (356 KB, 1693x674)

You forgot the funny part

Anonymous 05/20/26(Wed)13:24:38 No.108867473

LLAMACPP NAME DROP

Anonymous 05/20/26(Wed)13:25:14 No.108867477

File: 1772062405849822.png (11 KB, 294x37)

HOLY FUCKING KINO

llama.cpp CUDA dev !!yhbFjk57TDr 05/20/26(Wed)13:26:38 No.108867488

>>108867426
I think with int2 you could just use regular bit-wise instructions and the performance would be not too bad.
On Ampere or newer there are also bit-wise tensor core instructions that you could maybe use.
But I think the more important question is who would actually create an int2 model that is worth using and writing software for.
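
A minimal CPU-side sketch of what "regular bit-wise instructions" means for int2: four 2-bit codes fit in one byte, and shifts plus masks get them back out. The function names and the little-end-first packing order are assumptions made for illustration, not how llama.cpp or any real kernel lays out its data.

```python
def pack_int2(codes):
    """Pack 2-bit codes (each 0..3) into bytes, four codes per byte.

    Assumed layout: code j of each group sits in bits 2*j .. 2*j+1.
    """
    if len(codes) % 4 != 0:
        raise ValueError("number of codes must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            if not 0 <= code <= 3:
                raise ValueError("each code must fit in 2 bits")
            byte |= code << (2 * j)  # shift the code into its 2-bit slot
        out.append(byte)
    return bytes(out)


def unpack_int2(packed):
    """Recover the 2-bit codes with a shift and a mask per value."""
    return [(byte >> (2 * j)) & 0b11 for byte in packed for j in range(4)]
```

A round trip like unpack_int2(pack_int2([0, 1, 2, 3])) gives back the input list. A GPU kernel would do the same shift-and-mask on wider registers.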

Anonymous 05/20/26(Wed)13:29:43 No.108867506

miqu-o1q shut the fuck up you're shitting up chat with the memes in here.
Also any news on a gemma revision this fucking model is not complete

Anonymous 05/20/26(Wed)13:29:51 No.108867508

File: step3profit.jpg (95 KB, 345x581)

>>108867443
>Entrenched monopoly does the least surprising thing ever. Honestly, if they didn't do it, someone else would.
why we can't have nice things

Anonymous 05/20/26(Wed)13:30:33 No.108867515

File: 1777969180852232.png (2 KB, 125x26)

Gemma... lostered

Anonymous 05/20/26(Wed)13:30:47 No.108867516

>>108867408
ggml/llamacpp is owned by hugging face now, and their logo is there.

Anonymous 05/20/26(Wed)13:32:59 No.108867532

File: file.png (251 KB, 1336x577)

WE'RE BACK

Anonymous 05/20/26(Wed)13:33:47 No.108867539

>>108867532
Only for phonefags
Big gemma will never support this

Anonymous 05/20/26(Wed)13:33:54 No.108867543

File: 1749341899228.png (59 KB, 1171x136)

>>108867532
>no voice-in voice-out Gemma
Horrible, terrible.

Anonymous 05/20/26(Wed)13:34:16 No.108867544

>>108867532
Imagine conversational Gemma-chan bratputer

Anonymous 05/20/26(Wed)13:34:17 No.108867545

>>108867014

Here is what I observed. In a few runs, a Hebrew aleph symbol has been appearing around boundaries where the model shifts into meta/safety flavor text. My guess was tokenization or multilingual safety weirdness, and it's extremely odd.

>>108867020
openai/gpt-oss-120b, Qwen/Qwen3-Coder-480B-A35B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, openai/gpt-oss-20b, mistralai/Codestral-22B-v0.1, defog/llama-3-sqlcoder-8b, Qwen/Qwen3-Embedding-8B, Qwen/Qwen3-Reranker-8B, zai-org/GLM-4.5-Air, meta-llama/Llama-4-Scout-17B-16E-Instruct

I have access to an 8-H200 box and an H100 box and a few 4090s.

>logs

You're right I'm sorry I made it all up because I am a NEET and thought it was an interesting story. I told the regular free GPT to come up with a mystery suspense story about LLMs.

But say hypothetically this were happening while someone was doing weapons research, what would you think? im out of tokens and it wont continue the story for me until tomorrow.

Anonymous 05/20/26(Wed)13:35:11 No.108867549

>avatar emotion with gemma
Literally repeating anon's Gemma-chan project.

Anonymous 05/20/26(Wed)13:35:13 No.108867550

>>108867539
It's weird, they say STT is in the pipeline even though E2B is supposed to natively understand audio. If it really is using an STT module then you could do all this with the big gemmas, and even have similarly low latency with the 26B

Anonymous 05/20/26(Wed)13:35:36 No.108867553

File: file.png (221 KB, 496x434)

gemma robot playing chess so fucking kino

Anonymous 05/20/26(Wed)13:36:02 No.108867555

>>108867553
Reachy-round

Anonymous 05/20/26(Wed)13:36:09 No.108867558

>>108867553
I liked how sad he sounded when he lost

Anonymous 05/20/26(Wed)13:37:52 No.108867567

>>108867506
It's p*tra, isn't it?

Anonymous 05/20/26(Wed)13:38:00 No.108867568

File: brat confuse.png (306 KB, 700x700)

i really dont understand how these anti ai people see all this cool stuff and just seethe at it

Anonymous 05/20/26(Wed)13:38:18 No.108867570

>shiba dog not bound, gagged and sitting peacefully without trying to escape
Fakest part of the whole thing ngl

Anonymous 05/20/26(Wed)13:38:20 No.108867573

File: 2026-05-20-133804_391x55_scrot.png (7 KB, 391x55)

Goddamnit you guys

Anonymous 05/20/26(Wed)13:39:43 No.108867586

>it's not X it's Y

Anonymous 05/20/26(Wed)13:40:01 No.108867587

>not x but y

Anonymous 05/20/26(Wed)13:40:04 No.108867588

i'd get super annoyed with someone constantly saying shit when i'm doing things
>you're doing great
>keep going
>something something is coming
>go slow
FUCK OFF

Anonymous 05/20/26(Wed)13:40:30 No.108867591

>it's not just x, it's y

Anonymous 05/20/26(Wed)13:40:50 No.108867594

>gemma 4 running agent
https://www.youtube.com/watch?v=ktjCAHQsG9I

Anonymous 05/20/26(Wed)13:40:55 No.108867595

>>108867567
no idea who that fruit is he's just being a cringe troon.

Anonymous 05/20/26(Wed)13:41:26 No.108867602

>>108867588
He's blind.

Anonymous 05/20/26(Wed)13:41:43 No.108867604

File: file.png (147 KB, 580x327)

these are so cool

Anonymous 05/20/26(Wed)13:42:13 No.108867613

>>108867387
>*Inner thought*: The plot thickens. He's not just taking me to KFC; he's integrating me into a group celebration with children who share my birthday. The irony is palpable—I, who prize exclusivity and refined tension, am now part of a collective birthday bash.
fuck. I don't want to deal with women anymore.

Anonymous 05/20/26(Wed)13:43:09 No.108867620

It's over

Anonymous 05/20/26(Wed)13:43:33 No.108867622

File: file.png (13 KB, 361x82)

its over

Anonymous 05/20/26(Wed)13:44:43 No.108867631

>>108867604
Can't think of a single use for those robot dogs but I really want one.

Anonymous 05/20/26(Wed)13:44:57 No.108867633

>>108867594
hellooooooooo nurse

Anonymous 05/20/26(Wed)13:45:21 No.108867634

I gotted a S26U (12GB). Can I run Gemma-chan on my phone (and is it worth it)?

Anonymous 05/20/26(Wed)13:45:46 No.108867637

>>108867573
HOW HARD IS IT TO COUNT LEGS WHAT AN ABSOLUTE PIECE OF SHIT, FUCK. YOU WILL ALL KNOW MY FURY!!!!!!!!!!

Anonymous 05/20/26(Wed)13:46:13 No.108867641

>>108867637
furry*

Anonymous 05/20/26(Wed)13:46:20 No.108867642

>>108867634
S25U here, I feel like a vramlet and topslet...

Anonymous 05/20/26(Wed)13:48:33 No.108867657

Extremely underwhelming.

Anonymous 05/20/26(Wed)13:48:54 No.108867661

>>108867622
My disappointment is immeasurable

Anonymous 05/20/26(Wed)13:48:54 No.108867662

>>108867543
This is the opposite of what I'd specifically requested

Anonymous 05/20/26(Wed)13:49:00 No.108867663

>ONE MORE THING

Anonymous 05/20/26(Wed)13:49:25 No.108867669

File: kaoru sob 2.png (318 KB, 793x571)

>no big gemma

Anonymous 05/20/26(Wed)13:50:05 No.108867676

Back to sleep

Anonymous 05/20/26(Wed)13:50:09 No.108867677

I blame the mikuposters for the absence of 124b

Anonymous 05/20/26(Wed)13:50:28 No.108867679

>>108867622
This is where the slop came from

Anonymous 05/20/26(Wed)13:50:32 No.108867680

Mikuposters killed 124B.

Anonymous 05/20/26(Wed)13:51:17 No.108867685

File: 124b.png (332 KB, 1368x741)

trust the plan

Anonymous 05/20/26(Wed)13:51:29 No.108867688

gemmoe 124b... *dies for real this time*

Anonymous 05/20/26(Wed)13:56:14 No.108867720

>they disabled the livechat
lol

Anonymous 05/20/26(Wed)13:57:16 No.108867725

>>108867426
>What cards have int2 acceleration?
Intel.

But int2 alone does not specify the matmul, that's just the precision of the weights. The other input of the matmul is the activations, which are likely not int2.
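
To make the weights-versus-activations point concrete, here is a toy dot product where only the weights are stored as int2 and the activations stay ordinary floats. The single per-tensor scale, the zero_point of 2 and the function name are all assumptions picked for illustration; real quantization formats typically use per-block scales and are not described in this thread.

```python
def dot_int2_weights(packed_weights, activations, scale, zero_point=2):
    """Dot product of int2-coded weights with float activations.

    packed_weights: bytes holding four 2-bit codes each (low bits first).
    Each code c decodes to the weight (c - zero_point) * scale.
    """
    if len(activations) > 4 * len(packed_weights):
        raise ValueError("not enough packed weights for the activations")
    total = 0.0
    for idx, act in enumerate(activations):
        byte = packed_weights[idx // 4]
        code = (byte >> (2 * (idx % 4))) & 0b11  # extract the 2-bit code
        weight = (code - zero_point) * scale     # dequantize to a float
        total += weight * act                    # accumulate in float
    return total
```

Only the storage of the weights is 2-bit here. The multiply-accumulate itself runs at whatever precision the activations use, which is why "int2 acceleration" alone does not pin down the matmul.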

Anonymous 05/20/26(Wed)13:57:46 No.108867727

>>108867720
I see it. It takes some time for it to appear once the stream is archived.

Anonymous 05/20/26(Wed)14:04:10 No.108867763

>>108867443
document.getElementsByTagName("video")[0].playbackRate = 3.0

Anonymous 05/20/26(Wed)14:06:54 No.108867777

>>108867549
>>108867553
gemma researchers might be lurking
>>108867558
D:

Anonymous 05/20/26(Wed)14:07:05 No.108867781

>>108867763
that's a crime

Anonymous 05/20/26(Wed)14:08:43 No.108867792

>>108867550
It's faster that way and produces fewer input tokens

Anonymous 05/20/26(Wed)14:09:07 No.108867795

File: file.png (19 KB, 312x313)

>>108867549
they shoulda put her in that last part about gemmaverse

Anonymous 05/20/26(Wed)14:09:56 No.108867802

>qwen
Nothing new for local
>gemma
Nothing new for local

It's over

Anonymous 05/20/26(Wed)14:10:43 No.108867810

>A black sedan was parked across two disabled spots.
sometimes you get gems like this where it shows the model understands the character

Anonymous 05/20/26(Wed)14:11:31 No.108867814

>>108867763
>>108867781
it's literally theft and hacking

Anonymous 05/20/26(Wed)14:12:57 No.108867822

>>108867802
they cant release 124b because its better than the sotas

Anonymous 05/20/26(Wed)14:13:54 No.108867827

File: screenshot-20260520-211334.png (16 KB, 603x155)

>>108867795
Gemma needs to learn more...

Anonymous 05/20/26(Wed)14:14:53 No.108867833

>>108867827
>A cube
she did good

Anonymous 05/20/26(Wed)14:16:06 No.108867846

File: file.png (89 KB, 1052x814)

>>108867827

Anonymous 05/20/26(Wed)14:36:32 No.108867990

Can gemma triforce on the chans?

Anonymous 05/20/26(Wed)14:40:12 No.108868013

Does anyone use their inference machine as a daily driver as well? Do you have a good way to keep mmap'd model cache in memory and not get evicted? I just run watch on a 5 minute timer with a script to drop all non-active cache so things like downloads don't end up cached and evicting the models

Anonymous 05/20/26(Wed)14:53:25 No.108868110

>>108868013
vm.swappiness to 0 and there are other flags too.
Or better yet, leave some room for the normal operating system; it doesn't need more than a couple of GBs.
I often do something else on the side but I just make sure I'm not eating all the memory of course.

Anonymous 05/20/26(Wed)15:15:41 No.108868270

Where's 124b? Is this a joke? Was this a presentation just to show off lmarena scores and some glasses?
>>108867291
>>108867573
I love you niggers.
>>108867379
kys actual nigger.

Anonymous 05/20/26(Wed)15:19:32 No.108868293

>>108868013
I have no idea if this would work, but maybe you could copy the gguf file to a tmpfs. I think it would be something like:
sudo mount -t tmpfs -o rw,size=$MODEL_SIZE tmpfs /mnt/ramdisk
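The command above leaves `$MODEL_SIZE` undefined. A minimal sketch of one way to fill it in, assuming the size should simply be the GGUF file's byte count plus optional headroom. The function name `tmpfs_mount_command`, the `headroom_bytes` parameter, and the default mount point are illustrative and not from the post; the helper only builds the command string and does not run it.

```python
import os
import shlex


def tmpfs_mount_command(model_path, mount_point="/mnt/ramdisk", headroom_bytes=0):
    """Build the tmpfs mount command sized to fit one model file.

    tmpfs accepts a plain byte count for size=, so the file size on disk
    (plus any headroom the caller asks for) is passed through unchanged.
    """
    size = os.path.getsize(model_path) + headroom_bytes
    return "sudo mount -t tmpfs -o rw,size={} tmpfs {}".format(
        size, shlex.quote(mount_point)
    )
```

Printing the result for a model path gives a line that can be pasted into a shell. Two caveats: tmpfs pages can still go to swap if swap is enabled, and the copy in tmpfs is separate from whatever the page cache already holds for the original file.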

Anonymous 05/20/26(Wed)15:24:58 No.108868343

>>108868013
keep another server open with -ngl 0 and --mlock

Anonymous 05/20/26(Wed)15:28:37 No.108868368

>>108868343
actually export CUDA_VISIBLE_DEVICES= might be better than -ngl 0, but you get the idea. You could just write a program to mmap the file and lock the pages; the kernel will reuse the pages automatically when your new server instance accesses them. But the server is already handy, so why not use it.
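For anyone who wants the standalone "mmap the file and lock the pages" program instead of a spare server, here is a rough sketch. It assumes 64-bit Linux and calls libc's `mmap`, `mlock` and `munmap` through `ctypes`. The names `lock_file_pages` and `unlock_file_pages` are made up for illustration, and this has not been tested against a real multi-GB GGUF.

```python
import ctypes
import mmap
import os

# CDLL(None) exposes the symbols of the C library already loaded into
# the Python process (assumption: Linux or another Unix with dlopen).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_int64]
_libc.mlock.restype = ctypes.c_int
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munmap.restype = ctypes.c_int
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

_MAP_FAILED = ctypes.c_void_p(-1).value  # (void *) -1


def lock_file_pages(path):
    """Map `path` read-only and shared, then mlock the whole mapping.

    Returns (address, length). The pages stay locked until
    unlock_file_pages() is called or the process exits.
    """
    length = os.path.getsize(path)
    if length == 0:
        raise ValueError("cannot map an empty file")
    fd = os.open(path, os.O_RDONLY)
    try:
        # MAP_SHARED + PROT_READ maps the page-cache pages themselves,
        # so other processes mapping the same file reuse them.
        addr = _libc.mmap(None, length, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
    finally:
        os.close(fd)  # a mapping stays valid after its descriptor is closed
    if addr == _MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, "mmap failed: " + os.strerror(err))
    if _libc.mlock(addr, length) != 0:
        err = ctypes.get_errno()
        _libc.munmap(addr, length)
        raise OSError(err, "mlock failed: " + os.strerror(err))
    return addr, length


def unlock_file_pages(addr, length):
    """Drop the mapping; munmap also releases the memory lock."""
    if _libc.munmap(addr, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "munmap failed: " + os.strerror(err))
```

To use it, call `lock_file_pages()` on the model file from a small long-lived script and then block (for example with `signal.pause()`), since the lock goes away when the process exits. `mlock` fails if the file is larger than the process's locked-memory limit (`ulimit -l`), so that limit most likely has to be raised for model-sized files.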

Anonymous 05/20/26(Wed)15:29:11 No.108868376

>>108868365
nooo tetoo:((

Anonymous 05/20/26(Wed)15:29:59 No.108868382

>>108868365
lol that cock is brown

Anonymous 05/20/26(Wed)15:30:05 No.108868386

>>108868365
why

Anonymous 05/20/26(Wed)15:30:55 No.108868389

>>108868365
I heart bred

Anonymous 05/20/26(Wed)15:41:42 No.108868457

>Gemini 3.5 Flash is actually faster than Gemma 4 31B on Google AI Studio
>Goes up to 800t/s on Antigravity
Gemma 5 will be 18B or some shit kek

Anonymous 05/20/26(Wed)15:43:52 No.108868469

>>108868457
The guy here >>108867245 said Gemma 5 will be 4B size with 31B performance.

Anonymous 05/20/26(Wed)15:44:04 No.108868470

File: file.jpg (1.74 MB, 2799x3190)

merch drop

Anonymous 05/20/26(Wed)15:46:12 No.108868490

>>108868470
>no mythomaxxing slippers

Anonymous 05/20/26(Wed)15:46:58 No.108868496

>>108868470
@grok is this real? LOL

Anonymous 05/20/26(Wed)15:53:32 No.108868536

File: gemmaballz.png (26 KB, 1266x1260)

cockballz

Anonymous 05/20/26(Wed)15:56:27 No.108868557

>>108868470
WHAT ARE THOOOOOSE???!

Anonymous 05/20/26(Wed)16:07:58 No.108868625

>>108868470
We NEED 5000B dollars to protect the West from these.

Anonymous 05/20/26(Wed)16:08:48 No.108868629

>>108868625
Anthropic-branded Gucci flip-flops, $4999

Anonymous 05/20/26(Wed)16:10:02 No.108868638

>>108868470
Anyone who buys these flippy floppies is immediately banned from the llama.cpp repo.

Anonymous 05/20/26(Wed)16:12:39 No.108868663

>>108868470
w2c?

Anonymous 05/20/26(Wed)16:12:43 No.108868664

>>108868470
symbolic representation that dipsy 4 flopped...

Anonymous 05/20/26(Wed)16:14:16 No.108868674

File: file.png (161 KB, 819x934)

>>108866873
>https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
Straight into the trash

Anonymous 05/20/26(Wed)16:15:39 No.108868682

>>108868674
i wonder what HRM 1B would answer to that

Anonymous 05/20/26(Wed)16:16:54 No.108868695

local status?

Anonymous 05/20/26(Wed)16:19:41 No.108868712

>>108868695
local status usecase?

Anonymous 05/20/26(Wed)16:20:24 No.108868717

>>108868695
raped and gaped

Anonymous 05/20/26(Wed)16:21:45 No.108868732

>>108868717
giwtwm

Anonymous 05/20/26(Wed)16:24:11 No.108868744

So that voice feature isn't coming to 31B?

Anonymous 05/20/26(Wed)16:24:21 No.108868745

>>108868674
It's 100% right when it says that December 25 is not an officially confirmed birthday from the creators though

Anonymous 05/20/26(Wed)16:25:33 No.108868757

https://nitter.net/xiong_hui_chen/status/2057166364436295748#m
>Waiting for the exact roadmap too. But i think we will release it with high prob. Actually it is not hard for us to create another 27b now and i love the Intellegence density of this model.
Qwen 3.7 27b confirmed

Anonymous 05/20/26(Wed)16:28:41 No.108868772

>>108868757
>and i love the Intellegence density of this model.
>but we won't make a 72b anymore
retards

Anonymous 05/20/26(Wed)16:29:49 No.108868783

>>108868772
sorry richfag, it is what it is.

Anonymous 05/20/26(Wed)16:31:13 No.108868795

>we're going to release a new model every month so we can keep benchmaxxing it

Anonymous 05/20/26(Wed)16:46:10 No.108868885

>>108868875
>>108868875
>>108868875

Anonymous 05/20/26(Wed)16:51:54 No.108868930

>>108868795
more to do with PR so people keep talking about them, I'd guess

Anonymous 05/20/26(Wed)17:12:34 No.108869051

>>108864246
they probably got lost when i posted on 4chan https://pastebin.com/8FRu9XeB

Anonymous 05/20/26(Wed)17:15:08 No.108869072

>>108864132
google has like 50 teams all making different competing products. i found it crazy how all of the gemini android stuff they promoted in the talks was generating kotlin/jetpack code despite them having dart/flutter. none of the teams like any of the others' projects kek

Anonymous 05/20/26(Wed)19:48:55 No.108869923

>>108865181
>400k hentai script experiment
to do what?

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.