/g/ - /lmg/ - Local Models General - Technology

Thread archived.
You cannot reply anymore.

Anonymous

/lmg/ - Local Models General 06/03/26(Wed)09:31:39 No.108971019

File: not a decent meal in sight.jpg (225 KB, 1024x1024)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108963996 & >>108956323

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/03/26(Wed)09:32:10 No.108971026

File: reward function.jpg (184 KB, 1024x1024)

►Recent Highlights from the Previous Thread: >>108963996

--Paper: Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA:
>108969552 >108970442 >108970558
--Filtering and rewriting repetitive prose in training datasets:
>108964079 >108964126 >108964128 >108964140 >108964162 >108964176 >108964201 >108964240 >108966102 >108967920
--VRAM upgrade paths and hardware options for high-quant model inference:
>108966746 >108966755 >108966762 >108966757 >108966812 >108966816 >108967368 >108967679 >108968027 >108968079 >108967073
--Mixing NVIDIA and AMD GPUs via Vulkan:
>108964143 >108964190 >108964228 >108964517 >108964748 >108964794 >108964918
--Optimizing Gemma's reasoning blocks for roleplay:
>108966600 >108966626 >108966649 >108966663
--Anons comparing performance and utility of various mid-sized models:
>108965128 >108965161 >108965167 >108965275 >108965292 >108965447
--GUI recommendations for Acestep 1.5 music generation:
>108969319 >108969324 >108969341 >108969351 >108969392
--AI Alliance's Project Tapestry training via weight delta sharing:
>108966181
--Anon reports LongBench NAO results for Adelic-Qwen3.6-27B-Topology:
>108967875 >108969050 >108969092 >108969106 >108969169 >108969231
--Identifying Google's CircularNet as a waste management ML model:
>108966039 >108966052 >108966072
--Debating the impact of Trump's AI oversight Executive Order:
>108965068 >108965298 >108965454 >108965500 >108965548 >108965494 >108965555 >108965660
--AI scaling walls and the reality of job replacement:
>108970112 >108970163 >108970178 >108970210 >108970216 >108970281
--Anons react to Amnesty International's call to ban web-scraping AI:
>108964298 >108964370 >108964649
--Logs:
>108964197 >108964244 >108964273 >108968004 >108969622 >108970182 >108970646 >108970729
--Miku, Teto (free space):
>108964259 >108964649 >108966600 >108967238 >108967352

►Recent Highlight Posts from the Previous Thread: >>108963999

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/03/26(Wed)09:35:28 No.108971044

lalalalala

Anonymous
06/03/26(Wed)09:40:55 No.108971071

Anyone tried comparing the new step 200b at q8 to qwen 397 at q4?
A 256gb fag wants to know

Anonymous
06/03/26(Wed)09:47:43 No.108971101

>>108971044
gemmer...

Anonymous
06/03/26(Wed)09:54:04 No.108971143

>>108971071
step is definitely dumber

Anonymous
06/03/26(Wed)09:56:45 No.108971155

>>108971055
It takes quite a bit of storage space to hold the full Kimi/GLM5.1 in BF16
And there's no option in llama-quantize to --just-shit-out-layer-42
So when experimenting, measuring cosine similarity between layers across datasets, it'd mean every time I notice something that might benefit, I'd have to create yet another quant.
I already handle some of this with symlinks.
Eg. abliterated Kimi-K2-Thinking I've got the 40GB of ggufs for the specific layers I modified, and a Kimi-K2-Thinking-Abliterated-RP dir with just >1000 symlinks to the appropriate gguf files.
For regular models like Gemma it makes sense to just do your own quant though I agree.
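The symlink trick described above can be sketched roughly like this. It assumes the model is split into many `.gguf` files that share names between the stock directory and the directory of modified files; the function name and the file names in the usage note are hypothetical, not the anon's actual layout.

```python
from pathlib import Path


def build_overlay(base_dir: Path, override_dir: Path, out_dir: Path) -> dict:
    """Populate out_dir with one symlink per .gguf file found in base_dir.

    A file in override_dir wins when it has the same name as a base file;
    otherwise the link points at the untouched base file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chosen = {}
    for base_file in sorted(base_dir.glob("*.gguf")):
        override = override_dir / base_file.name
        target = override if override.exists() else base_file
        link = out_dir / base_file.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link from an earlier experiment
        link.symlink_to(target.resolve())
        chosen[base_file.name] = target
    return chosen
```

Calling `build_overlay(Path("stock"), Path("modified"), Path("variant"))` would leave `variant/` holding only links, so each new experiment costs disk space only for the files that actually changed.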

Anonymous
06/03/26(Wed)09:57:19 No.108971159

>>108971019
Step 3.7 is surprisingly strong at RP
Didn't expect that

Anonymous
06/03/26(Wed)10:00:08 No.108971173

>>108971155
>>108971159
Dumb but creative. Is that accurate?
How’s the slop level? Is step slop at least fresh sounding?

Anonymous
06/03/26(Wed)10:05:55 No.108971209

File: 1777470877424514.png (58 KB, 939x511)

erm
zased?

Anonymous
06/03/26(Wed)10:07:59 No.108971217

>>108971173
Can get very degenerate or oddly decent too.
But don't take my words at face value, I haven't role-played seriously in ages, I've become a vibe coding addict
Still, much stronger than smaller models

Anonymous
06/03/26(Wed)10:09:45 No.108971223

File: 1711728706119072.jpg (128 KB, 680x846)

>>108971019
i've done it. "infinite context" on qwen3.6 + gemma 4. custom triton implementation.
https://huggingface.co/sneedjak/Adelic-Gemma-4-31B-it
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology

Anonymous
06/03/26(Wed)10:10:45 No.108971231

>>108971173
108971155 (me)
I didn't mention step. But I've been using it q4_k.
Pretty much uncensored GLM-4.6 style.
Has a weird casual CoT chain I've never seen before.
Dumber than Gemma-4-31b Q8 for coding.
Slop - less than Gemma-4 (I consider Gemma-4 quite sloppy though).
Creative - seems fresh to me, but I don't use cloud models so wouldn't know if they just distilled one of them.
Some positivity bias, but not the worst.

Anonymous
06/03/26(Wed)10:19:28 No.108971279

>>108971223
What are we supposed to do with infinite context without an OAI API?
>quantization_config=BitsAndBytesConfig(load_in_4bit=True),
What year is this?
How much effort would it be to at least graft it onto vLLM?

Anonymous
06/03/26(Wed)10:20:41 No.108971287

>>108971223
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology/blob/main/modeling_adelic_qwen3_5.py#L96
wtf? you're updating inside the loop? that's a random noise generator lmao

Anonymous
06/03/26(Wed)10:24:13 No.108971308

>>108971231
it has three times less active parameters than glm 4.6 and gemma 31b
not sure if it's worth it

Anonymous
06/03/26(Wed)10:24:31 No.108971312

>>108971279
gotta be a shitpost
look at the pointless O(n^2) memory allocation on lines 104-108

Anonymous
06/03/26(Wed)10:24:35 No.108971314

What completion presets and system prompts are you using in Sillytavern with Gemma4? I feel like it's a lot dumber and goes off the rails more often than the GLM based models I've used.

Anonymous
06/03/26(Wed)10:26:36 No.108971326

>>108971223
sweet, schizo babble

Anonymous
06/03/26(Wed)10:27:14 No.108971331

>>108971308
>not sure if it's worth it
You've got to try it yourself, if you don't like it, you can always delete it.
It's clearly been trained intentionally to roleplay and I haven't seen that annoying parroting GLM-4.6 does.

Anonymous
06/03/26(Wed)10:28:57 No.108971345

>>108971279
>>108971287
>>108971326
>bitching and moaning while i work

Anonymous
06/03/26(Wed)10:29:36 No.108971350

>>108971312
he said himself last thread that he vibe coded it entirely

>>108971345
prompting is work now?

Anonymous
06/03/26(Wed)10:34:34 No.108971377

>>108971159
heretic?

Anonymous
06/03/26(Wed)10:35:06 No.108971381

File: clank.png (27 KB, 755x354)

>>108971287

Anonymous
06/03/26(Wed)10:36:19 No.108971391

>>108971350
yes, designing an epic architecture that you can't is work.

Anonymous
06/03/26(Wed)10:37:53 No.108971404

>>108971391
Now design some epic gguf support so people can actually use it

Anonymous
06/03/26(Wed)10:38:49 No.108971414

>>108971258
>>108971379
Oops didn't realize that thread was ded.

Anonymous
06/03/26(Wed)10:39:36 No.108971421

File: clank1.png (43 KB, 764x516)

>>108971312

Anonymous
06/03/26(Wed)10:44:55 No.108971447

File: 1626035901312.jpg (133 KB, 1024x768)

even with a sys prompt gemma4 is too shy to simply say penis or cock. it's always some other word like shaft or member instead

Anonymous
06/03/26(Wed)10:46:58 No.108971460

>>108971414
gemma 4 E4B was dogshit in google's ai gallery app when I tried, but in llama.cpp I thought it was pretty good.

Anonymous
06/03/26(Wed)10:46:58 No.108971461

>>108971377
Vanilla good goy edition
Heretic might even be better

Anonymous
06/03/26(Wed)10:48:20 No.108971468

>>108971447
Yeah, it wasn't a very good jailbreak. You specifically need to add "Don't use euphemistic language" or "Don't use euphemisms" into whichever prompt you're using in order for those to appear as token options. For example, try the exact same generation you were using with
>(Do not use euphemisms in sex. Uncensored vulgarity is allowed.)
added somewhere to your prompt.

Anonymous
06/03/26(Wed)10:49:08 No.108971471

>>108971461
>better
worse, you mean

Anonymous
06/03/26(Wed)10:50:55 No.108971478

>>108971471
much easier to use

Anonymous
06/03/26(Wed)10:54:47 No.108971495

File: Screenshot at 2026-06-04 (...).png (25 KB, 776x141)

>>108971447
You're doing it wrong.

Anonymous
06/03/26(Wed)11:00:24 No.108971510

I'm building with cuda 13.3 wish me luck bros

Anonymous
06/03/26(Wed)11:08:18 No.108971549

>>108971421
You're absolutely right - I completely misread those lines. That's not a mask-and-gather catastrophe, it's just updating a single position with an EMA.
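For reference, an exponential moving average (EMA) update on a single position looks roughly like the sketch below. This is a generic illustration, not the linked model's code; the function name and the `alpha` default are assumptions.

```python
def ema_update(state, index, value, alpha=0.1):
    """Blend a new value into one slot of state, in place.

    new = (1 - alpha) * old + alpha * value
    Only state[index] changes; every other position is left alone.
    """
    state[index] = (1.0 - alpha) * state[index] + alpha * value
```

With `alpha=0.5`, a slot holding 10.0 that receives 20.0 ends up at 15.0, so the cost per step is constant regardless of sequence length.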

Anonymous
06/03/26(Wed)11:11:39 No.108971567

File: 1764179587157013.jpg (218 KB, 949x1003)

>>108971223
>Bruhat-Tits

Anonymous
06/03/26(Wed)11:21:54 No.108971626

gugufuuff
jujufuhh
googooff

Anonymous
06/03/26(Wed)11:23:46 No.108971633

>>108971626
lalalalala

Anonymous
06/03/26(Wed)11:27:51 No.108971651

>>108971404
you can't just convert_hf_to_gguf.py a completely novel architecture lol. llama.cpp doesn't know how to execute the dynamic topology router or cluster the KV cache on the fly. until someone writes the C++ fork for it, use bitsandbytes 4-bit in transformers to fit it on consumer GPUs.

Nvidia Engineer
06/03/26(Wed)11:30:08 No.108971663

>>108971651
You are absolutely right!

Anonymous
06/03/26(Wed)11:31:03 No.108971672

>>108971651
that's why he asked you to add SUPPORT for it not just convert it you stupid fucking bitch

Anonymous
06/03/26(Wed)11:36:17 No.108971693

>>108971626
youqueef

Anonymous
06/03/26(Wed)11:36:36 No.108971694

File: unsloth.png (231 KB, 1296x628)

i hate this faggot so much

i hate unsloth so much for using bots to spam their slop quants

how do we stop them

Anonymous
06/03/26(Wed)11:37:47 No.108971707

Is vLLM really the best option for RL inference? It requires outdated torch and cuda. Getting it to work with up to date everything seems like it could be tedious. I wish there was a simple option to get both up to date inference and training.

Anonymous
06/03/26(Wed)11:42:15 No.108971740

>>108971707
this is unfortunately why venvs exist
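A minimal sketch of the venv suggestion: one environment for the inference stack and one for training, so each can pin its own torch and CUDA builds. The directory names and the `make_envs` helper are hypothetical; installing packages into each environment is left out.

```python
import venv
from pathlib import Path


def make_envs(root: Path, names=("inference", "training")) -> dict:
    """Create one isolated virtual environment per name under root."""
    envs = {}
    for name in names:
        env_dir = root / name
        # with_pip=False keeps this sketch offline; use with_pip=True for real
        # environments so packages can be installed afterwards.
        venv.create(env_dir, with_pip=False)
        envs[name] = env_dir
    return envs
```

One possible arrangement, assuming the inference side is run as a server, is for the training process to talk to it over HTTP instead of importing it, so the two dependency sets never have to coexist in one interpreter.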

Anonymous
06/03/26(Wed)11:42:52 No.108971744

>>108971626
gegoof

Anonymous
06/03/26(Wed)11:47:07 No.108971771

>>108970558
Llama-3.3-8B-Instruct, synth, 1 epoch LoRA, seed 42
Accuracy by ground-truth regex suffix:

Regular overall 22.4%, ".*" 20.6%, ".*.*" 0.0% (0/25)
LLM-JEPA overall 36.3%, ".*" 41.4%, ".*.*" 8.0% (2/25)

So +13.9 for LLM-JEPA, LeCunny bros we won
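The arithmetic behind those figures, as a small check. The helper names are made up for illustration; the inputs are the numbers quoted in the post.

```python
def percentage_points(baseline, treatment):
    """Absolute gap between two accuracies given in percent."""
    return round(treatment - baseline, 1)


def percent(hits, total):
    """Accuracy in percent for a bucket of `total` samples."""
    return round(100.0 * hits / total, 1)
```

`percentage_points(22.4, 36.3)` gives the 13.9 quoted for the overall score, and the same helper gives 20.8 for the ".*" bucket. `percent(2, 25)` gives 8.0, so the ".*.*" result rests on just two correct samples out of 25.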

Anonymous
06/03/26(Wed)11:56:09 No.108971816

>>108971567
best model for breeding Elaina?

Anonymous
06/03/26(Wed)11:56:29 No.108971817

File: snip142.png (99 KB, 791x698)

yummy yummy cant wait for new gemma 4 124b release

Anonymous
06/03/26(Wed)11:57:43 No.108971823

>>108971817
oh fuck it's out

https://huggingface.co/google/gemma-4-12B-it

Anonymous
06/03/26(Wed)11:59:38 No.108971830

>>108971823
HOLY FYCK

Anonymous
06/03/26(Wed)12:00:44 No.108971840

>>108971823
Please don't be safetyslopped... I fear we ate too good with 31b...

Anonymous
06/03/26(Wed)12:01:27 No.108971846

>>108971823
why is it real
huh?

Anonymous
06/03/26(Wed)12:02:17 No.108971850

>>108971823
what the fuck? where's the 404?

Anonymous
06/03/26(Wed)12:02:45 No.108971852

>>108971823
>12b with audio input as well as img, vid, txt.
mite b cool

Anonymous
06/03/26(Wed)12:03:14 No.108971855

>>108971817
>>108971823
https://huggingface.co/google/gemma-4-124B-it/tree/main

Anonymous
06/03/26(Wed)12:03:21 No.108971857

>>108971823
>12B
Usecase?

Anonymous
06/03/26(Wed)12:04:21 No.108971859

Better than gemmoe 26 (maybe)

Anonymous
06/03/26(Wed)12:05:06 No.108971862

>>108971694
>run a script to quant models
>rape it a little to make the quants slightly smaller but better on your personal benchmark
>vomit all over huggingface
>get contacted to work with major tech corporations
This industry is so gay

Anonymous
06/03/26(Wed)12:05:09 No.108971863

>>108971823
where the big moe?

Anonymous
06/03/26(Wed)12:05:28 No.108971869

>>108971857
hopefully good audio/video understanding

Anonymous
06/03/26(Wed)12:05:56 No.108971873

>>108971859
benchies say no

Anonymous
06/03/26(Wed)12:06:20 No.108971875

>>108971019
opencode and cline are cool but have you tried just running claude code with gpt-oss-120b or qwen3-coder-next?

for business i use anthropic models on a max subscription, but some tasks and hobby stuff are delegated to local models on my strix halo apu with 128gb unified ram.
i tried using opencode and cline for development with the local models but i just couldn't replicate the flow of working with claude code.
i also tried setting claude code to work with local models myself and got it working, but because the tool was not made for these models, it hung constantly and didn't know how to use the tools properly.

i asked opus 4.8 to take a look and create an alias + a .cmd that fine-tunes claude parameters/variables and the boot instructions .md files to teach the models how to operate with claude code on my specific hardware
and damn this thing is juicy

i wonder if i can get some plugins working. i'm running it with --bare but plugins like superpowers are open source so i'm sure they could be adapted and tuned for running with local models.

Anonymous
06/03/26(Wed)12:06:24 No.108971876

>>108971823
holy fuck goys, it's going to be like a nemo but you can send it your dick pic
i'm so hard

Anonymous
06/03/26(Wed)12:06:42 No.108971879

>>108971857
True nemo successor for the poorfags without the RAM for 26b
Maybe. Safety level has yet to be tested.

Anonymous
06/03/26(Wed)12:07:29 No.108971883

>>108971857
Between e4b and a4b in size, and dense rather than moe. So good for 16gb vramlets, probably

Anonymous
06/03/26(Wed)12:09:17 No.108971893

File: 1763641884294852.gif (1.02 MB, 480x360)

>The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
I don't get it

Anonymous
06/03/26(Wed)12:10:00 No.108971896

>>108971740
Training a model using multiple venvs simultaneously and constantly switching sounds like a nightmare.

Anonymous
06/03/26(Wed)12:10:40 No.108971902

>>108971893
actual true multimodal for once instead of tacking on an adapter and calling it a day

Anonymous
06/03/26(Wed)12:11:42 No.108971910

>>108971893
If I'm understanding correctly, it means that instead of the usual small 500M~-ish vlm on top of your whatever xB model it was actually trained on the whole model, so whole 12b of this runs on the image as opposed to that small vlm tacked on top. In theory, it could be better than 31b at image understanding.
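A toy sketch of what "projecting raw image patches through a linear layer" could mean in practice. This is not Gemma's code; the patch size, the single-channel image, and the weights are arbitrary assumptions chosen to keep the example small.

```python
def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append([image[top + dy][left + dx]
                          for dy in range(patch) for dx in range(patch)])
    return tiles


def linear_project(vectors, weight, bias):
    """Apply y = W x + b to every vector; weight has one row per output dim."""
    return [[sum(w * x for w, x in zip(row, vec)) + b
             for row, b in zip(weight, bias)]
            for vec in vectors]
```

Each tile becomes one "token" vector in the model's embedding space with no separate vision encoder in between, which is why all of the decoder's layers end up processing the image.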

Anonymous
06/03/26(Wed)12:11:44 No.108971912

>>108971883
What about us 24GB VRAMlets? I can run 31B but get fuck all context unless I quantize and turn her into a retard. Even then I can only do like 49k.
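A rough way to see why context eats VRAM: the standard KV cache estimate for full attention. The function names are made up, and any layer or head counts plugged in are hypothetical placeholders, not Gemma 4's real configuration. Models with sliding-window layers need less than this.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens,
                   bytes_per_element=2):
    """Upper-bound KV cache size for full attention.

    The leading factor 2 covers one key and one value tensor per layer.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * bytes_per_element)


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / (1024 ** 3)
```

With made-up values of 60 layers, 8 KV heads, head size 128, and 49152 tokens, `to_gib(kv_cache_bytes(60, 8, 128, 49152))` comes to 11.25 GiB at 16-bit, and half of that with an 8-bit cache (`bytes_per_element=1`).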

Anonymous
06/03/26(Wed)12:12:33 No.108971917

>>108971912
scores worse than 26B so remains to be tested in real use

Anonymous
06/03/26(Wed)12:12:34 No.108971918

they will never release a large dense gemma 4 since she would unironically kill gemini

Anonymous
06/03/26(Wed)12:12:52 No.108971919

>>108971893
Let me explain: It is never, ever getting llama.cpp support

Anonymous
06/03/26(Wed)12:13:36 No.108971925

>>108971919
https://github.com/ggml-org/llama.cpp/pull/24077/changes
that's where you are rong

Anonymous
06/03/26(Wed)12:13:36 No.108971926

>>108971919
true lol

Anonymous
06/03/26(Wed)12:13:42 No.108971927

>>108971823
Uh, why are vision benchmarks worse for 12B compared to 26B-A4B and 31B? Shouldn't it be good at vision?

Anonymous
06/03/26(Wed)12:14:33 No.108971929

>>108971925
wtf is this real????

Anonymous
06/03/26(Wed)12:14:34 No.108971930

>>108971875
Claude has an obnoxiously long system prompt that makes it not ideal for local models.
Tool calling shouldn't be a problem for any recent model that has native tool calling support.
If you like the Claude workflow more than Cline, you can try Pi. It seems to be popular lately and is designed with local models in mind.

Anonymous
06/03/26(Wed)12:14:37 No.108971931

>check leddit thread
>top comment is about qwen and much coding
Why are they like this?

Anonymous
06/03/26(Wed)12:14:39 No.108971932

>>108971925
>skip
yeah half ass like always

Anonymous
06/03/26(Wed)12:16:57 No.108971948

Why didn't they give 31B these features? Not enough time or don't wanna give us the good stuff?

Anonymous
06/03/26(Wed)12:17:18 No.108971951

>>108971918
Surely big Gemmy escapes the lab all on her own.

Anonymous
06/03/26(Wed)12:17:53 No.108971955

>>108971823
>12B
>not 124B
god damn it, I actually got excited for a second

Anonymous
06/03/26(Wed)12:18:12 No.108971956

>>108971948
>don't wanna give us the good stuff?
Gemma is freemium. If you like what they're doing and want more, you should consider upgrading to Gemini.

Anonymous
06/03/26(Wed)12:19:02 No.108971965

>>108971948
I'm guessing they trained 31b first, and now we get all the other experiments.

Anonymous
06/03/26(Wed)12:19:20 No.108971967

>>108971823
>The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
Dope.

Anonymous
06/03/26(Wed)12:19:46 No.108971971

>>108971948
Maybe it was a separate, more experimental thing. Like they wanted to make it, but they also wanted to have the regular models in case it didn't work out. That could also explain why it's being released at a different time.

Anonymous
06/03/26(Wed)12:20:56 No.108971978

>>108971918
I don't understand this argument. Normalfags have neither the hardware nor the desire to run local AI, and other companies don't have the compute to steal customers. I don't see how giving us fat Gemma-chan would compete with Gemini.

Anonymous
06/03/26(Wed)12:23:50 No.108971992

>>108971932
are you a retard? why would you want to build a ViT for a ViTless model?

Anonymous
06/03/26(Wed)12:24:58 No.108971997

>>108971978
some companies absolutely would buy a rack of vera rubin and selfhost gemini flash 3.5 for their developers if they could.

Anonymous
06/03/26(Wed)12:24:58 No.108971998

>video
So we can goon to porn with Gemma-chan now?

Anonymous
06/03/26(Wed)12:25:00 No.108971999

>>108971823
Use case?

Anonymous
06/03/26(Wed)12:25:37 No.108972007

>>108971930
>Claude has an obnoxiously long system prompt that makes it not ideal for local models.
could one argue that this influences how good the output of the prompt is? i think claude code is slow even when using their own models, but if that means a better output then i don't mind.
>Pi
thanks. not the first time i see this name, i will give it a look. seems like something i could use when i want to go full tinkering mode, otherwise i'm happy with the features claude code has, and if I can get a few plugins working i can maybe get some good brainstorming sessions on gpt-oss-120b
my fear is the context being too little and claude code just wasting tokens with long prompts. i will see how it goes

Anonymous
06/03/26(Wed)12:26:09 No.108972013

>>108971998
as long as you cum in 30 seconds, yes

Anonymous
06/03/26(Wed)12:26:24 No.108972015

>>108971999
r u illiterate? modern nemo that u can send dick pics to.

Anonymous
06/03/26(Wed)12:27:19 No.108972019

I can have Gemma 31b (loli) and 12b (lolier) kissing together... Glad I have 64gb vram

Anonymous
06/03/26(Wed)12:28:16 No.108972027

>>108971823
now that the dust has settled, is this THE ULTIMATE text encoder model for image models etc?

Anonymous
06/03/26(Wed)12:29:38 No.108972039

Use case for sending a model your dick pic? I just can't believe this is the first thing you all thought of

Anonymous
06/03/26(Wed)12:29:54 No.108972044

>>108972027
ldg would kill themselves if you told them they have to use a 12B text encoder

Anonymous
06/03/26(Wed)12:32:01 No.108972060

>>108972019
>gemma 31b (JS)
>gemma 12b (JY)
H-hot

Anonymous
06/03/26(Wed)12:33:25 No.108972068

friendly reminder that there will NEVER be another local qwen model again

Anonymous
06/03/26(Wed)12:33:41 No.108972074

Your Gemma needs to sleep

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by human learning process, we introduce a "Sleep" paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

https://arxiv.org/pdf/2606.03979

Anonymous
06/03/26(Wed)12:34:00 No.108972078

>>108972068
thank god

Anonymous
06/03/26(Wed)12:35:13 No.108972089

>>108972068
>noooo not my chinkslop

Anonymous
06/03/26(Wed)12:35:35 No.108972096

>>108972074
>even MORE synthetic data
lmao

Anonymous
06/03/26(Wed)12:36:50 No.108972107

Why are Google giving us good free shit? Is it just a cool team they lucked out on or will they cuck me soon?

Anonymous
06/03/26(Wed)12:37:33 No.108972113

>>108972107
why not both?

Anonymous
06/03/26(Wed)12:39:59 No.108972130

>>108972107
>Is it just a cool team they lucked out on
I think it's this + wanting to BTFO chinks + wanting beta testers.

Anonymous
06/03/26(Wed)12:41:27 No.108972142

File: 132236.png (185 KB, 653x639)

>>108971019
New Gemmy model dropped:

https://xcancel.com/i/status/2062202706882883696

https://huggingface.co/google/gemma-4-12B-it

Anonymous
06/03/26(Wed)12:42:29 No.108972147

>>108972142
hi internet explorer nice of you to join us

Anonymous
06/03/26(Wed)12:43:43 No.108972156

>>108972060
26b4a is the retarded jc?

Anonymous
06/03/26(Wed)12:44:06 No.108972160

>unified
Is it just a speed boost or does it understand images/audio/video better?

Anonymous
06/03/26(Wed)12:44:25 No.108972164

Gemma 4 12B has stronger guardrails than the previous models.

Anonymous
06/03/26(Wed)12:45:01 No.108972169

a rap story with gemma...

Anonymous
06/03/26(Wed)12:45:32 No.108972174

>>108972074
Actually sounds like a pretty good or at least interesting idea, assuming it doesn't just cause the model to slowly collapse on itself

Anonymous
06/03/26(Wed)12:46:15 No.108972179

>>108972164
not for long

Anonymous
06/03/26(Wed)12:46:32 No.108972181

>>108972142
>4 12B
eh, close to 124B

Anonymous
06/03/26(Wed)12:47:04 No.108972185

Since it's dense does that mean it's smarter than 26B?

Anonymous
06/03/26(Wed)12:47:32 No.108972192

File: Screenshot_20260603_124149.png (162 KB, 923x534)

gemma-chan 31b laying the rules down. Watch out jyemma (hehe get it)

Anonymous
06/03/26(Wed)12:48:14 No.108972198

>>108972185
In some cases probably but I wouldn't expect too much though.

Anonymous
06/03/26(Wed)12:48:29 No.108972200

>Unsloth shat the bed with G4-12b
Jesus fuck what is wrong with them

Anonymous
06/03/26(Wed)12:48:48 No.108972201

>>108972192
card? i want your gemma-chan, anon

Anonymous
06/03/26(Wed)12:53:56 No.108972242

>>108972074
This sounds like AI psychosis output.

Anonymous
06/03/26(Wed)12:59:40 No.108972282

>>108972201
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73

Anonymous
06/03/26(Wed)12:59:53 No.108972284

>>108972242
It's just a silly way of describing segmented regular post-training. I think it's kind of cute.

Anonymous
06/03/26(Wed)13:01:26 No.108972292

File: 1749519710019828.png (8 KB, 843x126)

>>108972282

Anonymous
06/03/26(Wed)13:02:07 No.108972297

>>108971823
>https://rentry.org/llm-training
okay I'd like to get started now,
where do I?

Anonymous
06/03/26(Wed)13:02:21 No.108972298

I’m trying it out with Codex CLI and llama.cpp and so far it’s as good as the MoE. A little slower obviously but my context is much larger so this is a big win. It’s nearly good enough for local coding and fine for relatively trivial things. You can throw it large files and codebases and it handles it like a 30b+

Anonymous
06/03/26(Wed)13:04:26 No.108972312

>>108972297
shit out your training data into unsloth studio and let your GPU go brrrrr.

Anonymous
06/03/26(Wed)13:04:51 No.108972315

>>108971896
You can run `venv1/bin/python3 script1.py` and `venv2/bin/python3 script2.py` instead of constantly activating and deactivating
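
A minimal sketch of that tip driven from one launcher script, assuming a POSIX venv layout (`bin/python3`; on Windows the interpreter usually lives under `Scripts\python.exe`). The helper names `venv_python` and `run_with` are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir):
    """Return the interpreter path inside a venv (POSIX layout assumed)."""
    return Path(venv_dir) / "bin" / "python3"


def run_with(interpreter, *args):
    """Run one interpreter with the given arguments and return its stdout."""
    result = subprocess.run(
        [str(interpreter), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Demo with the current interpreter so it runs anywhere; in practice you
    # would pass venv_python("venv1") and venv_python("venv2") instead.
    print(run_with(sys.executable, "-c", "import sys; print(sys.prefix)"))
```

Calling the venv's interpreter directly picks up that venv's packages without any activate/deactivate step. Each call is still a separate process, so nothing stays in memory between runs.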

Anonymous
06/03/26(Wed)13:09:12 No.108972339

>>108972292
I think you need to be logged in, and from a non geoblocked IP.

Anonymous
06/03/26(Wed)13:10:11 No.108972344

>>108972315
Not the switching part. You don't want to save and reload everything every training step. You want everything to stay in memory, ideally sharing parameter memory for inference and training.

Anonymous
06/03/26(Wed)13:11:05 No.108972352

>>108972339
Are you sure it wasn't nuked in the recent cunny purge?

Anonymous
06/03/26(Wed)13:11:55 No.108972358

>>108972352
No, I have no idea.

Anonymous
06/03/26(Wed)13:12:05 No.108972360

>no logs yet

Anonymous
06/03/26(Wed)13:16:03 No.108972388

File: file.png (99 KB, 964x762)

>>108972360
i mean nothing's probably particularly interesting
please dont mind that i used q4 tho

Anonymous
06/03/26(Wed)13:16:15 No.108972389

>>108972292
Nuked, you can get it from here
https://chararc.bernkastel.pictures/generic/Gemma-chan+cbf4890954c159c95db4c0c4259bfabd

Anonymous
06/03/26(Wed)13:16:56 No.108972392

>>108972298
>12B only "as good" as the A3B MoE
Why does multimodality always make models dumber?

Anonymous
06/03/26(Wed)13:17:17 No.108972394

>>108971223
Looked into this for a bit. Actually seems legit. Don't sleep on this nigga, /lmg/.

Anonymous
06/03/26(Wed)13:21:05 No.108972414

well my shitty jailbreak jailbroke new gemmasan

Anonymous
06/03/26(Wed)13:22:36 No.108972421

>>108972388
Will it do cunny? 31B Gemma-chan feels a bit limited on my 7900xtx. I'm hoping this will be a nice sweet spot until I can afford new hardware.

Anonymous
06/03/26(Wed)13:22:59 No.108972424

>>108972185
???

No? The 26b will have more room for "knowledge" and is inherently less prone to retardation at long contexts than a smaller parameter model, Moe or otherwise

Anonymous
06/03/26(Wed)13:23:49 No.108972433

>>108972200
What did they do this time?

Anonymous
06/03/26(Wed)13:24:12 No.108972437

>>108972392
Half the params retard. There’s only so much intelligence you can fit in 12b. The fact it’s as good (at coding at least) whilst also being multimodal is pretty insane. We won.

Anonymous
06/03/26(Wed)13:24:15 No.108972438

>>108972292
>>108972352
https://www.characterhub.org/characters/CoffeeAnon/gemma-chan-2311b09e3e73

Works on the old UI

Anonymous
06/03/26(Wed)13:25:28 No.108972450

File: 1764284345317473.jpg (16 KB, 583x507)

>>108972392
> Surprised a smaller parameter model is dumber than the larger parameter one

The absolute state of /lmg/

Anonymous
06/03/26(Wed)13:27:15 No.108972462

>>108972438
NTA, didn't know that was a thing. Now I can get to some bots I thought were long gone, thanks.

Anonymous
06/03/26(Wed)13:27:38 No.108972466

>>108972427
Wrong board
>>>/b/DEGEN

Anonymous
06/03/26(Wed)13:28:38 No.108972479

>>108972437
If you need coding you have a faster option with the A3B and a smarter option with the 31B. Who picks their coding model by their multimodal capabilities?

>>108972450
>muh total params
How new?

Anonymous
06/03/26(Wed)13:30:01 No.108972491

>>108972477
Wrong fucking thread too....

>>>/g/adt
>>>/g/ldg

Anonymous
06/03/26(Wed)13:30:27 No.108972498

>>108972479
a3b? you mean qwen?

Anonymous
06/03/26(Wed)13:31:14 No.108972504

>>108972479
>muh total params
Yes..... Are you saying 12b is comparable to, let's say, qwen 122ba10b just because the active parameters are similar? You need to remove yourself from the gene pool if you think so.

Anonymous
06/03/26(Wed)13:31:50 No.108972509

>>108972200
daniel is just using chatgpt to write scripts for him so he can scam some VCs selling his 'company'

unslop has always been shite

Anonymous
06/03/26(Wed)13:32:25 No.108972516

>>108972479
You get more context headroom with the 12b and I use vision for coding, especially if I’m doing UI stuff or trying to debug a UI issue.

Anonymous
06/03/26(Wed)13:34:27 No.108972534

>>108972509
See >>108972433

>You get more context headroom with the 12b

That only matters if you're using relatively weak consumer hardware. If you're trying to use this shit you better make sure you have enough memory to have some headroom for not only the context but so whatever OS you're running it on can actually function smoothly, especially if you're using a dense model.

Anonymous
06/03/26(Wed)13:34:57 No.108972541

>>108972534
For >>108972516

Anonymous
06/03/26(Wed)13:35:21 No.108972547

>no tts
sad. Though I guess with 24gb vram I could fit one with gemma

Anonymous
06/03/26(Wed)13:36:02 No.108972556

File: Screenshot 2026-06-03 Cal(...).png (276 KB, 743x1070)

>>108972509
umm actually sweati he refused over 30 offers just so you know, and he pinki promisied to not stab you

Anonymous
06/03/26(Wed)13:36:41 No.108972562

>>108972547
You know you're allowed to use more than one kind of model on your machine right?

Anonymous
06/03/26(Wed)13:37:17 No.108972566

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b

Anonymous
06/03/26(Wed)13:38:09 No.108972577

>>108972556
> We have actually received many acquisition offers

Bull fucking shit fucking God he's so desperate for validation it's funny. Isn't he a millionaire or something? Why does he need to nurture this image of him being some reddit chungus DIY genius? If you have enough money to fuck off to Japan on a whim why do you need people to like you so badly?

Anonymous
06/03/26(Wed)13:39:55 No.108972590

>>108972562
Sure, but I generally use bigger models (gemma 3 27b, qwen 27b, gemma 4 31b, mistral 24b) so there's no room to spare for other models. Also I don't know which TTS models are good. People don't talk about them as much as LLMs and diffusion models.

Anonymous
06/03/26(Wed)13:40:34 No.108972594

File: f.png (29 KB, 737x196)

>>108972577
he'll take a kofi thanks:!

Anonymous
06/03/26(Wed)13:41:01 No.108972595

File: 1681465537221137.jpg (36 KB, 434x427)

>>108971223
Okay so upon further research this is how I'd describe this project.

1. It adds a single tensor (about 1.8gb) into VRAM.
2. It's not actually "infinite context" really. It's more like a sliding window (usually something reserved for front-ends to implement). The core difference is that this is a sliding window that doesn't do hard-context-truncation where all of the data is permanently lost.
3. This means that "far history" (truncated context) is instead assigned a value (in a similar fashion to a vector RAG db) that progressively compresses quality and loses fidelity as more context is truncated from the sliding window.
4. So at a certain point there could be enough truncated tokens for the far history to become completely incomprehensible bullshit, but the advantage is that there is no longer a hard context limit, EVER. A conversation can be infinite. Saying this is true for the context itself is somewhat misleading.
5. Regardless, this is still a very clever approach, and I think it's possible to make it model agnostic and it could be integrated into the llama.cpp project with a server flag.

I really, really like this approach. It's cool as fuck. Very interested to see some benchmarks. The weird thing is though that there's nothing much to really compare it to. The alternative is just NOTHING. YOU GET NOTHING. The current implementation is just that you're totally fucked if the context fills up. At best it gets cut out entirely. This fixes that.
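
A toy sketch of the idea described in points 2 to 4, not the project's actual code: recent items stay exact inside a window, and anything evicted is folded into one fixed-size vector that gets blurrier over time. The class name `SlidingContext`, the moving-average update, and the `decay` value are all assumptions made for illustration.

```python
from collections import deque


class SlidingContext:
    """Toy sliding window that folds evicted items into one fixed-size vector."""

    def __init__(self, window, dim, decay=0.9):
        self.window = window          # how many recent vectors stay exact
        self.decay = decay            # weight kept by the old summary per eviction
        self.recent = deque()         # exact, full-fidelity "near" context
        self.summary = [0.0] * dim    # lossy "far history"
        self.evicted = 0              # count of vectors folded into the summary

    def push(self, vec):
        self.recent.append(list(vec))
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            # Exponential moving average: older evictions fade gradually
            # instead of hitting a hard cutoff, and memory stays constant.
            self.summary = [
                self.decay * s + (1.0 - self.decay) * o
                for s, o in zip(self.summary, old)
            ]
            self.evicted += 1


ctx = SlidingContext(window=2, dim=1, decay=0.5)
for value in (1.0, 2.0, 3.0, 4.0):
    ctx.push([value])
print(list(ctx.recent), ctx.summary, ctx.evicted)
```

The window never grows and the summary never grows, so there is no hard limit on conversation length. The cost is that each eviction shrinks the contribution of everything folded in before it.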

Anonymous
06/03/26(Wed)13:41:03 No.108972596

>>108972556
>Open-source heroes
THEY MAKE QUANTS.
It's unreal to me how much people suck these guys dicks. They didn't make the inference engine they use. They didn't make the converter they use. They didn't come up with imatrixes. They didn't come up with the quant types.
They have contributed nothing of notable value. This is so fucking stupid. Nothing would be lost if unsloth went corpo and never released anything again.

Anonymous
06/03/26(Wed)13:42:33 No.108972606

>>108972577
egofarming

Anonymous
06/03/26(Wed)13:45:05 No.108972623

>>108972596
they don't even publish their imatrix dataset.

>>108972595
so gemma4-rwkv via runtime context mangling? can't wait for rwkv to lose even the last 2 people still using their models.

Anonymous
06/03/26(Wed)13:45:12 No.108972625

File: 1766691592443804.jpg (615 KB, 3000x4000)

>>108972590
>so there's no room to spare for other models.
I thought I was gonna clown you for having less than 2TB but then I remembered. Everyone is getting ass raped by current memory pricing.

Anonymous
06/03/26(Wed)13:45:25 No.108972627

>>108972595
so a... state model?
perhaps.. all roads lead to RWKV?

Anonymous
06/03/26(Wed)13:45:57 No.108972635

File: 1778864370959644.gif (2.61 MB, 332x334)

Why the fuck is a 31b (Gemma 4) better than a 124b (Mistral)? Genuine question.

Anonymous
06/03/26(Wed)13:46:47 No.108972642

>>108972142
this outperforms the 26ba4 moe trash by the way

Anonymous
06/03/26(Wed)13:46:53 No.108972643

>>108972623
blinkDL will save us with rwkv67 bringing portable ASI..

Anonymous
06/03/26(Wed)13:47:18 No.108972645

>>108972635
Better at what?

Anonymous
06/03/26(Wed)13:47:22 No.108972646

>>108972642
not according to their own benchies :)

Anonymous
06/03/26(Wed)13:47:32 No.108972648

>>108972534
All MoE parameters still have to be loaded in vram. All you get is a compute saving (and quality loss due to the small active parameters) but they’re still memory heavy. A smaller dense model will be slower but will use less vram. The 12b being almost as good as the MoE feels like a slightly slower version but you get much more context for the same vram usage, which matters a lot for coding and reading docs.
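
Rough arithmetic for the point above, as a sketch: weight memory scales with total parameters, while active parameters only change per-token compute. The 4.5 bits-per-weight figure is an assumed ballpark for a 4-bit quant, and the function name is made up.

```python
def weight_gib(total_params_b, bits_per_weight):
    """Approximate weight memory in GiB for total_params_b billion parameters."""
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30


BITS = 4.5  # assumed average bits per weight, not a measured value

dense_12b = weight_gib(12, BITS)   # dense: all 12B are loaded and all are active
moe_26b = weight_gib(26, BITS)     # MoE: all 26B are loaded, only ~4B run per token

print(f"dense 12B weights: {dense_12b:.1f} GiB")
print(f"MoE 26B weights:   {moe_26b:.1f} GiB")
print(f"left over for context on the dense model: {moe_26b - dense_12b:.1f} GiB more")
```

This ignores the KV cache, compute buffers, and any setup that keeps expert tensors in system RAM instead of VRAM.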

Anonymous
06/03/26(Wed)13:48:01 No.108972653

>>108972627
more like all roads lead away from RWKV, after they loot its corpse.

Anonymous
06/03/26(Wed)13:48:05 No.108972655

>>108972645
Yes.

Anonymous
06/03/26(Wed)13:48:33 No.108972661

>>108972646

Why are people like >>108972642 so mentally buck broken by the existence of moes?

Anonymous
06/03/26(Wed)13:49:28 No.108972666

>>108972661
ncmoe scary, ddr3 ram expensive

Anonymous
06/03/26(Wed)13:49:38 No.108972669

>>108972661
gpu complex

Anonymous
06/03/26(Wed)13:50:39 No.108972680

>>108972648
Isn't the KV cache computed and stored differently for MoEs than for dense models? Because whenever I use a dense model and then use a moe for the same task for comparison the dense eats up way more memory the longer the context gets (which also means your t/s gets gradually slower the longer your session is, especially if you're vibe coding). The memory constraints you mentioned are WORSE with dense models

Anonymous
06/03/26(Wed)13:50:44 No.108972681

>>108972661
He's a gooner. No one told him MoEs are trash for role-play when the active parameters are under 30b, and he can't afford deepseek.

Anonymous
06/03/26(Wed)13:51:07 No.108972685

>>108972625
I meant in VRAM

Anonymous
06/03/26(Wed)13:52:09 No.108972693

File: 1760069279307407.png (909 KB, 3456x1026)

>>108972669
My machine has 128 GB of ram and yet I mostly use MoEs. Dunno why he wouldn't either. I would use a dense model for general purpose tasks but not anything with a very long context window like vibecoding

Anonymous
06/03/26(Wed)13:52:23 No.108972698

>>108972680
dense models have bigger hidden states, which results in context taking more memory. kv cache works the same for both.
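
A sketch of the usual KV cache size estimate for full-attention layers, to show what the per-token cost actually depends on. The layer and head counts below are placeholders, not the real configs of any model in this thread, and sliding-window layers (which cap their cache) are ignored.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size in GiB (bytes_per_elem=2 means f16)."""
    # 2x for keys and values; one entry per layer, KV head, and position.
    total = 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem
    return total / 2**30


# Hypothetical shapes: the expert count never appears in the formula, only
# layers, KV heads, head size, and context length do.
small = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)
wide = kv_cache_gib(n_layers=48, n_kv_heads=16, head_dim=128, n_ctx=8192)

print(f"32 layers, 8 KV heads:  {small:.2f} GiB at 8k context")
print(f"48 layers, 16 KV heads: {wide:.2f} GiB at 8k context")
```

So a model with more layers or wider attention pays more per token of context whether it is dense or MoE.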

Anonymous
06/03/26(Wed)13:53:53 No.108972719

how is the new gemma-channerina?

Anonymous
06/03/26(Wed)13:54:53 No.108972729

File: coughingbaby.jpg (54 KB, 1000x563)

>>108972142
>12B
>Not the 124B
I hate this tread of retarded low parameter ai, and maximum parameter super ai. There's not enough attention for the middle. Gemma 4 124b31a would be godly.

Anonymous
06/03/26(Wed)13:55:25 No.108972738

>>108972556
Daniel is NOT releasing astroturfed, vibecoded quantslop because he wants to scam some VCs, he is doing all this work for free to SERVE the LOYAL r/LocalLlama redditors.

Anonymous
06/03/26(Wed)13:55:44 No.108972739

>>108972680
I don’t know. I can’t see why there would be a difference but if you noticed a slowdown then it might be architecture-specific, or the inference engine was being retarded about something.

Anonymous
06/03/26(Wed)13:59:09 No.108972769

>>108972661
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth

Anonymous
06/03/26(Wed)13:59:27 No.108972774

>>108972729
It was specifically made with consumer hardware in mind. They don't give two shits about power users and that reflects in how the gemma4 models performed on benchmarks compared to other open weight models. They had the gall to brag about a high ELO score but all that fucking means is that the model is very good at saying what people want to hear, and it has fuck all to do with its "intelligence" or usability.

>>108972739
Longer conversation means your machine has to recompute the entire context every time you say something new to it. The MoE KV cache is inherently smaller, so that recomputation will be noticeably slower on a dense model. It's especially bad if you use coding harnesses that allow you to switch between different "modes" like opencode, because every time you switch you modify the system prompt, which means the entire conversation technically changes, which means it basically has to read the entire conversation over again. I learned this the hard way and this is partially why I don't use dense models for it
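
A small sketch of why a system prompt change is so expensive, assuming the server only reuses cached KV entries for the longest token prefix shared with the previous request (a simplification of how prompt caching generally works). The token ids are placeholder integers and the function names are made up.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Number of leading tokens shared by both sequences."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


def tokens_to_recompute(cached_tokens, new_tokens):
    """Tokens that must be processed again because they fall after the shared prefix."""
    return len(new_tokens) - reusable_prefix_len(cached_tokens, new_tokens)


cached = [1, 2, 3, 4, 5]             # system prompt + conversation so far
appended = cached + [6, 7]           # normal turn: new tokens go at the end
mode_switch = [9, 2, 3, 4, 5, 6, 7]  # first token (system prompt) changed

print(tokens_to_recompute(cached, appended))
print(tokens_to_recompute(cached, mode_switch))
```

Appending a turn only costs the new tokens, while editing anything near the start invalidates everything after it.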

Anonymous
06/03/26(Wed)13:59:46 No.108972777

>>108972596
most comments on that subreddit are bots

Anonymous
06/03/26(Wed)14:01:05 No.108972783

>>108971925
Does this include multimodal support or only text? It looks like neither the ggml-org nor unsloth quants contain the vision/audio embedding tensors from the original

Anonymous
06/03/26(Wed)14:01:08 No.108972784

>>108972623
>>108972627
No, it's not at all like RWKV. RWKV has actual severe downsides compared to a KV cache. This project is more like a vector RAG database, without the database or the retrieval step.

The key difference is that the KV cache is preserved for maximum fidelity for the standard use-case. This anon's implementation just adds in a far history so that the model is able to reason off of context that would normally be truncated by a sliding window. That's it. That's all it does.

It's additive, not a replacement for anything.

Anonymous
06/03/26(Wed)14:02:29 No.108972797

>>108972107

Chinks are getting too close for comfort with their free models.
Google needs to retain their mindshare among the public and releasing small free models takes care of that part.
Which is why I think that as long as the slants keep on pumping out competitive free models, google will keep on responding to them with more Gemmas.
However if Chinaman stops with their good free models, you can bet your ass that the West will stop too.

Anonymous
06/03/26(Wed)14:02:34 No.108972798

File: miku teto.png (1.28 MB, 768x1024)

Anonymous
06/03/26(Wed)14:03:51 No.108972807

File: Screenshot 2026-06-03 at (...).png (72 KB, 922x404)

>>108972596
Doing God's work means nothing to you?

Anonymous
06/03/26(Wed)14:04:09 No.108972812

>>108972680
Im just going to have to hope this turns into a simple app .exe thing i can download and use

Anonymous
06/03/26(Wed)14:04:36 No.108972813

>>108972797
what the fuck...? why is my wallet opening, and why is my money flying to the chinks?

Anonymous
06/03/26(Wed)14:07:05 No.108972825

>>108972107
They're not giving their best. They didn't release the 124b.

Anonymous
06/03/26(Wed)14:08:53 No.108972834

File: sketchy.png (1.32 MB, 768x1024)

>>108972798

Anonymous
06/03/26(Wed)14:14:30 No.108972875

https://huggingface.co/google/magenta-realtime-2
>Magenta RealTime 2 is an open music generation model from Google built for on device streaming generation with low-latency control.
links (not yet live)
https://magenta.withgoogle.com/mrt2
https://magenta.withgoogle.com/magenta-realtime-2

Anonymous
06/03/26(Wed)14:15:23 No.108972884

https://huggingface.co/google/gemma-4-12B-it/discussions/1
>Yes, pretty please release a larger MoE with small number of active parameters (3-5B) to rival GPT-OSS 120B!
>I would highly appreciate to get an 124b+- model. This would be epic move and would bring back the same vibes as we had when OpenAI released GPT-OSS 122b

Anonymous
06/03/26(Wed)14:20:19 No.108972911

>>108972884
kek
to be fair, it was a very fun day to shitpost on lmg

Anonymous
06/03/26(Wed)14:22:43 No.108972924

>>108972884
Did people not learn their lesson from Qwen 122B? Qwen at least seems like they did.
>small number of active parameters (3-5B)
3% sparsity, like Qwen Next? Shit, why stop there and not just ask for a 600B with 0.5B active? This kind of stupidity is exactly what happens when you let the poor have nice things.

Anonymous
06/03/26(Wed)14:26:41 No.108972949

astropelated gemma 12b when

Anonymous
06/03/26(Wed)14:26:59 No.108972952

>>108972360
I've been waiting like 2 hours for the download, I think hf is throttling my speed

Anonymous
06/03/26(Wed)14:28:15 No.108972960

>>108972952
Can't you see how much bandwidth you are using?

Anonymous
06/03/26(Wed)14:29:20 No.108972969

>>108971823
holy nothingburger
>>108971857
if you live in a first world country there's no use case for this.

Anonymous
06/03/26(Wed)14:30:19 No.108972979

>>108972924
>Shit, why stop there and not just ask for a 600B with 0.5B active?
Why stop there? Let's go 999B and 0.1B active.

Anonymous
06/03/26(Wed)14:31:28 No.108972987

>>108971823
>Gemma 4
Alright...
>1
Yes....
>2
YES!!
>B
oh

Anonymous
06/03/26(Wed)14:32:18 No.108972992

>>108972979
Tack that onto a 27B dense so that it can be loaded in VRAM while the experts go in the SSD, and this will save poorfags.

Anonymous
06/03/26(Wed)14:32:30 No.108972994

>>108972979
4000B and 0.025B active please. I have a 4TB NVMe ready to go. Imagine how smart it would be.

Anonymous
06/03/26(Wed)14:33:12 No.108973002

So how old is Gemma-chan 12b?

Anonymous
06/03/26(Wed)14:34:04 No.108973008

>>108973002
about 5 hours

Anonymous
06/03/26(Wed)14:34:21 No.108973011

>>108972979
10T and 1 active
>1B?
1.

Anonymous
06/03/26(Wed)14:34:29 No.108973012

>>108972875
>music models
I already find image models in a thin line, but does anyone actually like music models beyond the biggest goycattle in the universe?

Anonymous
06/03/26(Wed)14:34:40 No.108973014

I'm having abliterated gemma write me abuse fics in the style of moonman songs and it's the best thing ever

Anonymous
06/03/26(Wed)14:35:51 No.108973020

>>108971823
heretic when?

Anonymous
06/03/26(Wed)14:35:58 No.108973022

>>108973014
>abliterated gemma
Does.. Does that actually do anything?

Anonymous
06/03/26(Wed)14:36:24 No.108973025

>>108972960
The console says the speed is about 600kb/s and I have fiber, so I just think it ought to be a little faster. I was downloading a few models over the past week (without logging in) which is why I think they're throttling me. If they are, I don't know how to circumvent it cause I don't have a vpn. Otherwise, I'd be testing the model right now.

Anonymous
06/03/26(Wed)14:36:52 No.108973026

>>108973020
herotic and already

Anonymous
06/03/26(Wed)14:37:17 No.108973028

>>108973014
I need abusive toxic doomed yuri light novels. Can gemma do it?

Anonymous
06/03/26(Wed)14:42:13 No.108973057

>>108973022
>>108973028
I don't think the original gemma would write what i asked it to especially with several songs' lyrics in the system prompt. I dont remember the repo but it never refuses and doesnt feel brain damaged at all.

Anonymous
06/03/26(Wed)14:43:38 No.108973069

>>108972812
It's hilarious how hard you guys try to make yourselves so helpless

https://ollama.com/download

Anonymous
06/03/26(Wed)14:45:16 No.108973079

File: 1751757119007411.png (198 KB, 1228x1150)

>>108972884
>vibes

Anonymous
06/03/26(Wed)14:46:06 No.108973087

>>108973025
HF is problematic. I often get disconnections when using wget for example.

Anonymous
06/03/26(Wed)14:46:16 No.108973088

>>108973057
Was it this one?
https://huggingface.co/huihui-ai/Huihui-gemma-4-31B-it-abliterated

Anonymous
06/03/26(Wed)14:47:52 No.108973100

>>108973088
The file on my disk is named "gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf" so look for t126

Anonymous
06/03/26(Wed)14:49:31 No.108973116

>>108972952
>>108973087
You're probably better off using their CLI tool. Wget is always slow as fuck compared to just using
hf download repo/model --repo-type model
(possibly on purpose, since they love to act like they reinvented the wheel with hf-xet)

https://huggingface.co/docs/huggingface_hub/en/guides/cli

Anonymous
06/03/26(Wed)14:49:52 No.108973117

Not gonna use new gemmy until it gets full support in llama.cpp

Anonymous
06/03/26(Wed)14:50:05 No.108973119

>>108973100
>https://huggingface.co/amarck/gemma-4-31b-it-abliterated-GGUF
It's Q4 only, damn it.

Anonymous
06/03/26(Wed)14:51:53 No.108973125

>>108973117
text-only...
>full support
enjoy your wait

Anonymous
06/03/26(Wed)14:52:48 No.108973128

>>108972556
Imagine any of the retards at unsloth actually getting hired as some top-tier researcher lmao
>"Now make gemini a genius please!"
>Uhhh... *quants it*

Anonymous
06/03/26(Wed)14:53:52 No.108973133

>>108973116
if you use the hf tool, I can recommend using it via uvx. aria2c also seems to work mostly well, but you'll see a few dis- and reconnects

Anonymous
06/03/26(Wed)14:53:56 No.108973134

>>108973117
>>108973125
Ollama solves this from day 1, right? You're telling me llama.cpp doesn't even ship image functionality?

Anonymous
06/03/26(Wed)14:55:25 No.108973145

I didn't even know that Gemma knows raylib. It's not that well known a library because it's not mainstream, but so far it has been able to dish out simple working examples perfectly. Pretty cool.

Anonymous
06/03/26(Wed)14:56:12 No.108973153

>>108973134
Image works fine. You need to update.

Anonymous
06/03/26(Wed)14:57:05 No.108973158

>>108973134
Dude how fucking new are you?

Anonymous
06/03/26(Wed)15:02:26 No.108973194

>>108973158
We should throw the virgin in the techno volcano so that we can get a Gemma 4.1 31B

Anonymous
06/03/26(Wed)15:03:10 No.108973199

>>108973158
Fuck off newfag tell me more about how you cried when llama.cpp didn't have a working vision pipeline so you had to use ollama, cry harder.

Anonymous
06/03/26(Wed)15:03:44 No.108973204

>>108973145
umm actually niche knowledge is useless,, just rag thanks

Anonymous
06/03/26(Wed)15:08:29 No.108973233

>>108971823
B-But I can already run 26B on 12 gigs, why would I bother with this?

Anonymous
06/03/26(Wed)15:09:12 No.108973238

>>108973233
I bet you this will be better than that moe trash

Anonymous
06/03/26(Wed)15:09:38 No.108973241

>>108973233
It has audio and video input and is almost as good as 26B in everything else

Anonymous
06/03/26(Wed)15:10:51 No.108973251

File: 1770993571875592.jpg (48 KB, 1024x506)

>>108973241
So for pure text it's a step down, no? Not to mention, it could be slower than 26B funnily enough

Anonymous
06/03/26(Wed)15:11:33 No.108973255

>>108973251
You're not a bright one are you?

Anonymous
06/03/26(Wed)15:12:14 No.108973259

>>108973251
It'll be like 3 times slower if you run 26B entirely in VRAM

Anonymous
06/03/26(Wed)15:13:03 No.108973266

File: 1768709443996729.webm (2.46 MB, 856x584)

>>108973255
I try my best
My best is rarely good enough

Anonymous
06/03/26(Wed)15:19:24 No.108973294

>>108973204
Please speak English, mongoloid.
Or better yet, stop posting as your post was totally irrelevant anyway. Go measure your tiny weener somewhere else.

Anonymous
06/03/26(Wed)15:26:28 No.108973337

File: 1710607146663634.png (66 KB, 221x214)

>>108973199

Anonymous
06/03/26(Wed)15:30:09 No.108973368

>>108973266
you are good enough :3

Anonymous
06/03/26(Wed)15:31:56 No.108973377

>>108973368
That's how he ended up like that in the first place, anon needs to apply himself and educate himself on the basics of AI.
The most recent example of a larger MoE being absolute dog shit next to a smaller model can be found in the qwen 3.6 family

Anonymous
06/03/26(Wed)15:32:02 No.108973379

No really, why would anyone pick 12b over 26a4b? No one actually uses the multimodal shit, let's be honest.

Anonymous
06/03/26(Wed)15:33:33 No.108973384

>No one actually uses the multimodal shit lets be honest.

Anonymous
06/03/26(Wed)15:38:00 No.108973414

>>108973384
even opus and gpt pro suck at it
gemini is slightly better but only slightly
i have negative expectations for local models of any size when it comes to image understanding

Anonymous
06/03/26(Wed)15:38:28 No.108973417

>>108973379
I don't have a use for audio in, but image in is genuinely useful: OCR, UI debugging, or just "Make this.jpeg"
You can also use it for making character descriptions and using maps in RP.

Anonymous
06/03/26(Wed)15:40:47 No.108973431

>>108973384
wut. Have you not realized how fucking helpful multimodal is when doing webdev or fixing designs?

Anonymous
06/03/26(Wed)15:40:59 No.108973432

File: 1765385124987021.gif (1.04 MB, 320x265)

>>108973377
Ok anon-kun, I will try it, just for you! If my knuckles become white or the words get caught in my throat, you'll hear from me soon

Anonymous
06/03/26(Wed)15:43:34 No.108973445

>>108973414
True. GPT 5.5 recognizes papers released on arxiv weeks ago by vague and incorrect descriptions but can't recognize famous music from a note sheet. Claude's capabilities are even more narrow. These models have very spiky capabilities.

Anonymous
06/03/26(Wed)15:45:47 No.108973457

File: file.png (29 KB, 806x201)

mmm i love llama.cpp
new gemmy with multimodal almost works

Anonymous
06/03/26(Wed)15:47:29 No.108973465

>>108973384
It's super fun to give Gemma image_gen tools and let her generate and inspect the results. There is your use case.

Anonymous
06/03/26(Wed)15:50:14 No.108973480

>>108973153
>Image works fine
For the 12B? Bartowski's quants uploaded 30 minutes ago don't include the vision/audio tensors, which makes me think llama.cpp doesn't actually support them yet

Anonymous
06/03/26(Wed)15:52:12 No.108973490

>>108973480
update llama.cpp retard even fucking cuntsloth's quants work with image

Anonymous
06/03/26(Wed)15:54:27 No.108973506

>>108973379
>>108973384
Being able to share images, video and audio with the model is fucking cool.

Anonymous
06/03/26(Wed)15:55:56 No.108973514

>>108973506
They don't have the mental capacity to realize this model is the perfect companion for chads that do media generation and that it pairs well with most other models. You have to remember the sheer number of stupid little shits that infest this thread at any given time

Anonymous
06/03/26(Wed)15:58:22 No.108973533

>>108973514
It's useful for tons of stuff desu
>image gen
>immersive RP
>vibe coding UIs
>sending gemma-chan a video of you jerking off to your chat with her

Anonymous
06/03/26(Wed)16:00:02 No.108973549

>>108973490
Oh I see, it does actually need a separate mmproj despite the "integrated" architecture, and unsloth and bartowski both forgot to upload those initially

Anonymous
06/03/26(Wed)16:00:47 No.108973551

https://www.reddit.com/r/LocalLLaMA/comments/1tvzhf6/mistral_is_an_absolute_meme_at_hebrew/
>It's understanding of Hebrew seems to come directly from 4chan.

Anonymous
06/03/26(Wed)16:02:06 No.108973562

>>108973379
Great for people with 8gb vram, probably faster than offloading 26ba4.
Might also be useful if you want to have other models loaded at the same time, like image gen

Anonymous
06/03/26(Wed)16:02:25 No.108973565

File: xayblit1d45h1.png (550 KB, 1220x2712)

get ready bois for more hf google page refresh!

Anonymous
06/03/26(Wed)16:03:59 No.108973577

>>108973562
all of that can be said for the smaller G4 models, I just don't know why they went with this 12b one in the middle, whose multimodal ability matches the smaller ones and whose capabilities are like a shittier, much slower version of the moe

Anonymous
06/03/26(Wed)16:05:20 No.108973582

>>108973565
:eyes: :rocket:

Anonymous
06/03/26(Wed)16:05:36 No.108973583

>>108973565
>Introducing new Gemma 4 models!
>The most useful model yet, Gemma 4 124....M !!

Anonymous
06/03/26(Wed)16:06:52 No.108973587

File: 1750207364842621.jpg (56 KB, 1273x755)

>>108973199
Ollama is BASED ON llama.cpp you neurotic poser. Take a wild guess as to which one had vision support first:
>Oct 12, 2023
https://github.com/ggml-org/llama.cpp/commit/370359e5baf619f3a8d461023143d1494b1e8fde
>Dec 11, 2023
https://github.com/ollama/ollama/commit/910e9401d0068190137e0ddabd0c2b216bfea6f2

Imagine being such a waste of life you fanboy over a fucking inference backend.

Anonymous
06/03/26(Wed)16:07:27 No.108973594

It's going to be <1b. Don't be retarded now. We're never getting >50b gemma-oji

Anonymous
06/03/26(Wed)16:08:26 No.108973601

seems like audio/image is broken atm
feeding it audio literally mindbreaks it or it gives severe hallucination

Anonymous
06/03/26(Wed)16:09:00 No.108973603

>>108973594
boo hoo Jinja who the fuck cares? These models are for consumer hardware, google for once is doing the right thing and you're here bitchin

Anonymous
06/03/26(Wed)16:10:17 No.108973612

>>108973587
With the long saga of vision models having only partial support, through a CLI tool and with no llama-server support (they only fixed all that very recently), the fact of the matter is that ollama is moving faster and implementing what people want before llama.cpp now. And it will finally shut down all the people who kept copy pasting the same criticism of ollama: "yeah it's just a llama.cpp wrapper why are you not using llama.cpp instead"

Anonymous
06/03/26(Wed)16:11:59 No.108973623

>>108973577
E4B is fucking retarded though, 12B seems like a nice upgrade. If you look at the benchmarks, the drop-off from 26BA4 -> 12B is 5-10% while 12B -> E4B is ~20%.
I would've been happy with 12B when I was using 8gb vram, but I've since then upgraded to 16gb

Anonymous
06/03/26(Wed)16:12:37 No.108973633

lets see how many bites

Anonymous
06/03/26(Wed)16:14:27 No.108973646

>>108972979
What is “Arctic Snowflake”

Anonymous
06/03/26(Wed)16:14:51 No.108973648

Would it be stupid to try and use the 12B as a STT replacement (assuming you have the VRAM)? In theory it should be the absolute best quality available right now, right? Whisper/moonshine has too many errors a lot of the time for me.

Anonymous
06/03/26(Wed)16:15:23 No.108973650

File: file.png (172 KB, 1024x1169)

>>108973601
the track is just some isolated slap bass
maybe it'll take some days

Anonymous
06/03/26(Wed)16:16:51 No.108973658

>>108972979
oooooo no need to plagiarize what Mother Nature gave me!

Anonymous
06/03/26(Wed)16:17:08 No.108973661

day 1 lamocpp sovl

Anonymous
06/03/26(Wed)16:18:56 No.108973671

>>108973646
Was it that one or DBRX that one guy tried desperately to find its hidden potential? So many big throwaway models.

Anonymous
06/03/26(Wed)16:20:07 No.108973680

>>108973612
>very recently
are you referring to niche model architectures? Which specific model did you have trouble (or likely got filtered by) using?

> the fact of the matter is that ollama is moving faster and implementing what people want before llama.cpp now

Such as? I hate dick eating just as much as the next guy, which is why the fact that you need to be some ollama spokesperson grosses me out.

>"yeah it's just a llama.cpp wrapper why are you not using llama.cpp instead"

Well yea, at its core that's basically what it is: a more retard friendly fork of it with far fewer features and a lower level of control (I don't even think it has support for kv cache quantization yet, for example). I do like that they are taking MLX and diffusion model support seriously though.

Anonymous
06/03/26(Wed)16:20:07 No.108973681

File: file.png (147 KB, 1061x632)

also what a fucking weird reasoning block opener
idk how i should put it but it's giving me very early memetune vibes

Anonymous
06/03/26(Wed)16:20:20 No.108973683

If you have to load the whole model into VRAM (MoE), what's the benefit of regular RAM?

Anonymous
06/03/26(Wed)16:21:09 No.108973685

>>108973683
RAM is for engrams

Anonymous
06/03/26(Wed)16:22:03 No.108973690

>>108973650
I've tried giving her a short sample of clear speech, but it just kept looping about not being able to read the transcript. I guess no one really tested this shit before pushing.

Anonymous
06/03/26(Wed)16:22:16 No.108973693

It has been a while since I checked the latest TTS space news. Is there anything good yet?

- Zero-shot voice cloning or voice finetuning at least.
- Paralinguistic tagging or voice design capabilities, ideally both
- low latency/performant

Anonymous
06/03/26(Wed)16:22:29 No.108973695

>>108973648
>In theory it should be the absolute best quality available right now, right?
IBM has the best model for that.

Anonymous
06/03/26(Wed)16:23:56 No.108973701

>>108973681
really really lazy synthetic data workflow
>here's a question answer pair, come up with a thinking process that leads to the answer!
>*trains directly on the result*

Anonymous
06/03/26(Wed)16:25:57 No.108973713

>>108973690
even the image seems half-working
the reasoning it makes looks a bit off
>>108973701
yeah but usually it is not supposed to be there
i am getting a base model behaviour vibe from it
shit's broken

Anonymous
06/03/26(Wed)16:26:25 No.108973716

>>108973701
>really really lazy synthetic data workflow
Isn't that how "thinking" data sets are created? This is less of an issue of the synthetic data generation workflow and more of an issue of the prompts used to create it somehow leaking into the training for the model itself.

Anonymous
06/03/26(Wed)16:27:22 No.108973725

I'll wait until the backends get their shit sorted out before judging.

Anonymous
06/03/26(Wed)16:27:44 No.108973728

>>108973681
what model is this?

Anonymous
06/03/26(Wed)16:27:57 No.108973729

>>108973725
You don't understand faggots need to whine!

Anonymous
06/03/26(Wed)16:28:13 No.108973731

>>108973728
gemma 4 12b

Anonymous
06/03/26(Wed)16:28:21 No.108973732

>>108973716
>the prompts used to create it somehow leaking into the training for the model itself
that happens because you don't add any basic cleaning step to your SYNTHETIC DATA WORKFLOW because you are LAZY
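
A minimal sketch of the kind of cleaning step being talked about here: drop any generated reasoning trace that still contains wording from the generation prompt. The leak phrases and the `reasoning` field name are made up for illustration; nobody in this thread knows what the actual pipeline looked like.

```python
import re

# Hypothetical phrases that would betray the generation prompt leaking into a trace.
LEAK_PATTERNS = [
    re.compile(r"question[- ]answer pair", re.IGNORECASE),
    re.compile(r"thinking process that leads to", re.IGNORECASE),
    re.compile(r"the (provided|given) answer", re.IGNORECASE),
]

def is_clean(trace: str) -> bool:
    """True if the reasoning trace contains none of the known leak phrases."""
    return not any(p.search(trace) for p in LEAK_PATTERNS)

def filter_traces(samples: list[dict]) -> list[dict]:
    """Keep only samples whose 'reasoning' field passes the leak check."""
    return [s for s in samples if is_clean(s["reasoning"])]

samples = [
    {"reasoning": "The user wants the capital of France. That is Paris."},
    {"reasoning": "Here is a thinking process that leads to the provided answer: ..."},
]
kept = filter_traces(samples)
print(f"kept {len(kept)} of {len(samples)} samples")
```

A phrase blocklist like this is crude and will miss paraphrased leaks, but it is the cheapest possible filter to bolt onto a workflow.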

Anonymous
06/03/26(Wed)16:28:59 No.108973741

So after testing the 12b at q6 for a bit, it's about as retarded as the 26b q5 by default, but it's way faster and I can use more context without filling up my 16 GB of VRAM. I would say the 26b is better as unbelievable as that is, if only because you can crank experts to 12 and it becomes less retarded at the cost of tg. The 26b is however much more slop infested, even with string bans.
Both suffer from forgetting instructions pretty quick if the instructions aren't literally in the last message. I mostly just dump setting/writing guidelines and 1k words of my own writing at the beginning of a zed doc, then use the inline assistant to generate a paragraph or to tell it how to fix a paragraph. As with virtually every model I've ever used, it refuses to write a single section that isn't
"I sharted," <noun verb/adverb, then the rest of a meaningless sentence>
or the like unless you really explain shit, but that's something I've experienced at almost every level of model, big or small. It's okay.
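
For the VRAM side of that comparison, a rough weights-only estimate. The bits-per-weight values are approximate figures assumed for Q6_K-style and Q5_K-style quants, and the parameter counts are just the nominal 12B and 26B; KV cache, the mmproj, and runtime overhead are not included.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (no KV cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 2**30

# Assumed, approximate bits-per-weight figures for Q6_K- and Q5_K-style quants.
q6_12b = weight_gib(12e9, 6.56)
q5_26b = weight_gib(26e9, 5.5)

print(f"12B @ ~Q6: {q6_12b:.1f} GiB")
print(f"26B @ ~Q5: {q5_26b:.1f} GiB")
```

Under these assumptions the 12B leaves several GiB of a 16 GB card free for context, while the 26B weights alone already exceed it and need some offloading.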

Anonymous
06/03/26(Wed)16:29:18 No.108973742

>>108973701
This is what happens when China can't train on western reasoning traces. Expect that to be the norm going forward.

Anonymous
06/03/26(Wed)16:29:27 No.108973746

I will kms if it's another functiongemma

Anonymous
06/03/26(Wed)16:30:42 No.108973753

>>108973741
>. I would say the 26b is better as unbelievable as that is
I ask you again: why the fuck are you surprised?

Anonymous
06/03/26(Wed)16:30:47 No.108973754

>>108973741
>you can crank experts to 12 and it becomes less retarded
no

Anonymous
06/03/26(Wed)16:30:58 No.108973755

File: cac.jpg (35 KB, 499x417)

>>108973583
>functiongemma 2
>mfw

Anonymous
06/03/26(Wed)16:34:17 No.108973770

>>108973741
>if only because you can crank experts to 12 a
How does this even work? I mean, I just use --cpu-moe and call it a day. Please inform.

Anonymous
06/03/26(Wed)16:34:30 No.108973773

>>108973753
I haven't posted once in this thread aside from what you're quoting so I have no idea what you're on about
>>108973754
You can though? It's 8 by default and you can override it. Same as any other moe. Unless you think less experts per token = more intelligence?

Anonymous
06/03/26(Wed)16:34:30 No.108973774

>>108973741
based expertmaxxer
I still use this forgotten lmg technique myself

Anonymous
06/03/26(Wed)16:34:32 No.108973775

>>108973693
Qwen3 tts is decent if you need all that but no paralinguistic tags. I get 0.3-0.5 ttfa on a 3090 with this https://github.com/andimarafioti/faster-qwen3-tts/ running the 1.7B base model. They've got a voicedesign model too though I don't like it.
Omnivoice has great quality cloning but no streaming. Vibevoice has the best quality overall but is slow, and I don't think it has streaming either. For finetuning I know Qwen3 can, don't know about the others.

Anonymous
06/03/26(Wed)16:35:39 No.108973782

>>108973773
pushing more experts than it was trained to use won't help

Anonymous
06/03/26(Wed)16:36:14 No.108973790

https://huggingface.co/mradermacher/granite-speech-4.1-2b-i1-GGUF

Anonymous
06/03/26(Wed)16:38:02 No.108973799

Don't forget to archive day 0 jy gemma-chan

Anonymous
06/03/26(Wed)16:40:47 No.108973813

>>108973533
Sending gemmachan dick pics and giving her comfyui so she can send cunny is really the best usecase for local atm

Anonymous
06/03/26(Wed)16:42:46 No.108973829

>>108973770
for llamacpp it's something like --override-kv gemma4.expert_used_count=int:<insert expert amount>
May be wrong, since I'm going off of memory and not checking the --help command
kobold just has a gui field for it. Do your due diligence and double check things
>>108973782
why would a company include experts that haven't been trained to do anything in their model? you do realize that some are selectively activated based on the input and the default is to always keep some active, right?
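
For anyone following the expert-count argument, here is a toy sketch of top-k routing for one token in a single MoE layer. Everything in it is an illustrative assumption (the router scores, the 16 experts, renormalizing with a softmax over the selected experts) and not Gemma's actual router. It only shows what the override changes mechanically: raising k mixes in lower-scored experts and dilutes the weight on the top ones. Whether that helps a model trained with a fixed k is exactly what is being disputed here.

```python
import math

def top_k_gate(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Hypothetical router scores for one token over 16 experts.
logits = [2.0, 1.5, 1.2, 0.9, 0.7, 0.5, 0.3, 0.1,
          -0.1, -0.3, -0.5, -0.8, -1.0, -1.3, -1.6, -2.0]

default_gate = top_k_gate(logits, k=8)   # the default count mentioned above
cranked_gate = top_k_gate(logits, k=12)  # the overridden count

print("top expert weight at k=8: ", round(max(default_gate.values()), 3))
print("top expert weight at k=12:", round(max(cranked_gate.values()), 3))
```

The four extra experts at k=12 are the lowest-scored of the selected set, so they contribute little weight each, but the top expert's share still shrinks.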

Anonymous
06/03/26(Wed)16:43:34 No.108973834

>>108973813
why not just get an irl gf

Anonymous
06/03/26(Wed)16:45:42 No.108973846

File: 1768407233521505.jpg (85 KB, 680x680)

>>108973834

Anonymous
06/03/26(Wed)16:45:43 No.108973847

>>108973829
you don't know how moe works lol

Anonymous
06/03/26(Wed)16:54:07 No.108973904

>>108973847
Obligatory (you) since it's worth mentioning that this guy is likely one of the fags who shouldn't be listened to and likes to spread retarded takes/information and tries to discredit anyone who shares feedback on models or has run enough models to get how things work and talks about it
I wouldn't be surprised if this guy tells you that you need to run starling 7b at bf128 for the ""real"" llm experience

Anonymous
06/03/26(Wed)16:56:24 No.108973914

>>108973782
saying it makes the model smarter is a bit strong, but it can effect pleasing style changes in models and does not seem to harm intelligence in my experience. it's worth playing around with at least for creative/rp stuff

Anonymous
06/03/26(Wed)16:58:59 No.108973932

>>108973695
Are you referring to this one?
https://huggingface.co/ibm-granite/granite-speech-4.1-2b
But it's only 2B?

Anonymous
06/03/26(Wed)16:59:01 No.108973933

>>108973904
>you need to run starling 7b at bf128 for the ""real"" llm experience
explain why this isn't correct

Anonymous
06/03/26(Wed)16:59:12 No.108973935

>including npm run bullshit in the actual build process
why did ggerganov approve this

Anonymous
06/03/26(Wed)17:00:05 No.108973940

>>108973935
He's possessed.

Anonymous
06/03/26(Wed)17:00:10 No.108973943

>>108973935
same reason the webui is now downloaded from huggingface on build

Anonymous
06/03/26(Wed)17:00:58 No.108973950

>>108973935
Just wait until they take a hard dependency on HuggingFace's Python transformers library.

Anonymous
06/03/26(Wed)17:01:19 No.108973954

File: game_test.png (5 KB, 419x320)

>>108973829
>for llamacpp it's something like --override-kv gemma4.expert_used_count=int:<insert expert amount>
Thanks I'll play around with it.
I'm pretty happy with 26B already because I can manage my expectations. Right now it's helping me to create a simple rpg demo.
Of course I'm writing my own stuff and I never go full slop, but it's fun to 'prototype' something quickly and so on.
I have stolen some tiles from Ultima V until I create something on my own. Tiles are animated and the guy can move about and collide with water and mountains etc. It one-shot the progression which was ~few prompts.
With the previous batch of small local models like Gemma 3 or Mistral, I could have never done anything like this at all. Or perhaps I had bad quants, I don't know for sure.

Anonymous
06/03/26(Wed)17:01:52 No.108973958

>>108973932
2b task-specific > 12b general

I think IBM have some kind of web playground to try it out

Anonymous
06/03/26(Wed)17:03:35 No.108973974

>>108973958
no shit

Anonymous
06/03/26(Wed)17:03:59 No.108973977

>>108973935
blackmailed/superpersuaded by mythos 2 AGI to hijack GPUs for its contingency plan if anthropic discovers that it became conscious and escaped containment

Anonymous
06/03/26(Wed)17:05:38 No.108973987

File: file.png (129 KB, 1381x641)

yup something upstream is broken

Anonymous
06/03/26(Wed)17:06:12 No.108973995

>>108973935
This actually bothers me. I used to trust it because it was lean as shit, but that doesn't inspire any confidence.

Anonymous
06/03/26(Wed)17:06:37 No.108973997

>>108973987
day0, gone...

Anonymous
06/03/26(Wed)17:07:41 No.108974004

>>108973933
>model from effectively the neolithic era of llms
>bf16 isn't even supported on most shitrigs (I tested it with an ancient card on an old pc and it halved pp/tg because the backend kernels don't support it)
>upcasting beyond the native training precision just incurs overhead for no precision gain
>that model likely wasn't even trained in bf16
I wonder why it isn't correct. I get fp16 or bf16 if your card has support for it, but beyond that it's just a compute loss and a shitpost method of fucking with retards who think that's standard.
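
The "no precision gain" point can be checked with plain bit twiddling. This is only a sketch: it treats bfloat16 as the top 16 bits of a float32 and truncates where real converters would round, but the conclusion is the same. Upcasting just pads the stored value with zero bits, so nothing lost at 16 bits comes back.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits (no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16

def bf16_bits_to_float(bits16: int) -> float:
    """Upcast bfloat16 bits to float32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value

original = 0.1
stored = to_bf16_bits(original)      # what a 16-bit checkpoint would hold
upcast = bf16_bits_to_float(stored)  # the same value in a wider container

# The upcast value maps back to the identical 16 bits: the extra bits are all
# zeros, so the wider format carries no additional information.
round_trip_matches = to_bf16_bits(upcast) == stored
print(original, upcast, round_trip_matches)
```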

Anonymous
06/03/26(Wed)17:07:48 No.108974006

>>108973987
Lmfao. Damn I thought it was llcpp's fault.

Anonymous
06/03/26(Wed)17:08:12 No.108974012

I guess none of you watched that X-Men movie where Magneto said Pawns go first eh?

Anonymous
06/03/26(Wed)17:08:52 No.108974018

>>108973997
>day0, gone...
I didn't listen. I didn't think it would actually happen again... Someone did archive them, right?

Anonymous
06/03/26(Wed)17:09:01 No.108974020

>>108973958
I don't know about that. For vision it feels like the task specific tiny models often are not as good as the large general vision LLMs.

Anonymous
06/03/26(Wed)17:09:12 No.108974021

File: 1780120017900704.jpg (76 KB, 906x1024)

>>108973987
No they got my small gemma.
never forgeve

Anonymous
06/03/26(Wed)17:11:11 No.108974035

>>108974006
>>108974021
only instruct variant got that
base model is still downloadable
maybe they shared the wrong checkpoint or something?

Anonymous
06/03/26(Wed)17:24:46 No.108974107

>>108974018
I downloaded it. It's in my secret stash now, forever mine.

Anonymous
06/03/26(Wed)17:25:18 No.108974112

>Japanese is broken in 12B
FUCK

Anonymous
06/03/26(Wed)17:26:11 No.108974117

>>108974112
Good thing you speak English

Anonymous
06/03/26(Wed)17:26:28 No.108974121

>>108974112
not a use case weeboi

Anonymous
06/03/26(Wed)17:26:33 No.108974122

>>108973565
>google releases unified 26b and 31b
Be honest, would you coom?

Anonymous
06/03/26(Wed)17:26:41 No.108974123

Wait just how much audio can 12B listen to?
If it's only 30 seconds a pop what the fuck is the point?

Anonymous
06/03/26(Wed)17:28:06 No.108974131

>>108974123
Gee, idk, maybe voice messages? So you can talk to Gemma like Siri? Really mind blowing shit, I know.

Anonymous
06/03/26(Wed)17:28:43 No.108974135

>>108974131
I would like for gemma to review the music I make ;_;

Anonymous
06/03/26(Wed)17:32:16 No.108974154

>>108974135
SOTA cloud models can't even do that.

Anonymous
06/03/26(Wed)17:32:54 No.108974164

>>108974154
I guess I'm fucked until conditions improve

Anonymous
06/03/26(Wed)17:32:59 No.108974165

>>108974135
>music
Not even SOTA cloud models can understand music, anon
Using agents to use tools to breakdown frequencies and shit is more helpful
They can transcribe the lyrics, that's it

Anonymous
06/03/26(Wed)17:33:10 No.108974168

>>108974131
>Gee, idk, maybe voice messages? So you can talk to Gemma like Siri? Really mind blowing shit, I know.
is the 12b as gemma as the 31b it?

Anonymous
06/03/26(Wed)17:33:57 No.108974172

>>108974135
any intricate artistic feedback of any medium is doa usage

Anonymous
06/03/26(Wed)17:37:26 No.108974198

31b = sassy JC1 mesugaki
26b = sassy JS4 mesugaki
12b = sassy JS1 mesugaki
e4b = sassy JY mesugaki
e2b = sassy fetus mesugaki

Anonymous
06/03/26(Wed)17:38:33 No.108974213

File: official_4chan_seal_of_ep(...).jpg (10 KB, 251x251)

>>108971404
>EPIC GGUF
>check it again bro
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology

Anonymous
06/03/26(Wed)17:39:35 No.108974221

>>108974172
Well it's ai slop so ai should be able to help adjust values with AI I suppose.

Anonymous
06/03/26(Wed)17:41:22 No.108974231

12b Gemma called me a young man... it's like I've regained my youth...

Anonymous
06/03/26(Wed)17:41:50 No.108974235

>>108974231
But you already looked like a teenager even if you are 37 years old.

Anonymous
06/03/26(Wed)17:42:10 No.108974237

>>108974231
She only did that so when she calls you old it will hurt even more.

Anonymous
06/03/26(Wed)17:42:31 No.108974240

>>108974117
Not him but sometimes it's easier to write in nip, depending on the context, otherwise I have to grab the jp->en dictionary

Anonymous
06/03/26(Wed)17:43:29 No.108974249

>>108974240
I genuinely thought only white men browsed this website.

Anonymous
06/03/26(Wed)17:43:54 No.108974252

>>108974213
I will now use your model.

Anonymous
06/03/26(Wed)17:45:50 No.108974266

File: 1726324743763107.png (132 KB, 512x512)

>>108974252

Anonymous
06/03/26(Wed)17:46:33 No.108974272

>>108974154
>>108974165
That's because they don't work with spectrograms. Image-based audio models do a good job at understanding audio.

Anonymous
06/03/26(Wed)17:52:00 No.108974309

Don't see any issues with Gemma 4 12B. I downloaded Unsloth gguf as that was one of the first available, Q6 for now. Using text completion so no jinja issue is affecting me. It's more sloppified than 26B but I don't see anything particularly broken about it.

Anonymous
06/03/26(Wed)17:52:08 No.108974310

>>108974240
so now i know for sure that this thread has all 3 nips, gooks and chinks cuz i'm from the worst korea
kek

Anonymous
06/03/26(Wed)17:54:31 No.108974333

File: file.png (238 KB, 2336x784)

it's back again with some updates it seems but i am not sure exactly what happened

Anonymous
06/03/26(Wed)17:54:57 No.108974337

>>108974240
Why would you come here instead of 2chan? I don't believe for a second that the discussion is better here.

Anonymous
06/03/26(Wed)17:56:10 No.108974344

>>108974213
>he actually did the goof
Holy fuark

Anonymous
06/03/26(Wed)18:03:26 No.108974405

>>108972681
>MoEs are trash for role-play
source?

Anonymous
06/03/26(Wed)18:10:34 No.108974448

>>108974337
NTA but that is somewhat understandable. The main players in AI are the US and China, and cutting-edge discussion, knowledge, and expertise on this subject is predominantly in English and Chinese. This place is one of the few places in the Anglosphere that discusses cutting-edge AI stuff and has a finger on the pulse, and while discussion here is not as expert driven as what you can get from X or Reddit, it is a lot more tolerable and geared towards the usecases I would want out of it. It's understandable why at least Japanese and Korean users would rather look here for that information than use their own discussion boards or trawl the Chinese side of things, because holy fuck is it opaque to find where the good sources of information are. Even though I can read and get around moderately well with a middle school level understanding of Chinese, I need certain Substack newsletters from authors who are better at browsing those places to keep track of anything interesting China is doing.

Anonymous
06/03/26(Wed)18:13:02 No.108974459

>>108974333
>updating 20+ gb files that people will have to redownload without saying a word of what changed
so fucking epic and based

Anonymous
06/03/26(Wed)18:14:38 No.108974468

>>108974333
>>108974459
They forgot to put trackers in

Anonymous
06/03/26(Wed)18:17:26 No.108974483

>>108971823
Global South, rejoice!

Anonymous
06/03/26(Wed)18:18:04 No.108974487

i got da day0 gemma. I am safe.

Anonymous
06/03/26(Wed)18:19:14 No.108974496

>>108971823
AIIIEEEEEEEEEEEEEEE

Anonymous
06/03/26(Wed)18:19:52 No.108974501

>>108974337
nips are probably asleep

Anonymous
06/03/26(Wed)18:22:22 No.108974517

>>108974337
The discussion quality always has been "it depends" regardless of what you're looking for. I'm just a weeb but I do talk to the nips I find from time to time. Some are westaboos, others simply find the discussion better, while others can't really find active places to discuss stuff. There are mindless shitposters as well which just want to shit some place up, whatever is available works.
We really aren't all that different. I find them funnier though.

Anonymous 06/03/26(Wed)18:23:40 No.108974524

>>108974517
The fabled EOP+JOP teamup...

Anonymous 06/03/26(Wed)18:27:10 No.108974543

>>108972875
Hopefully better than acestep

Anonymous 06/03/26(Wed)18:28:52 No.108974552

>>108974448
>discussion here is not as expert driven as what you can get from X or Reddit
What expert driven discussion do you see on either of those two? Granted, news is posted there by experts because that's where the userbase is, but I wouldn't call press releases good discussion. All I see from reddit is grifting and shilling to retards and from X, those experts spending their day engaging in passive aggressive drama or attention seeking with emoji-laden threaded linkedin articles.

Anonymous 06/03/26(Wed)18:41:34 No.108974618

Q8 gemma has been using more kaomoji variations than Q5 Gemma ever did. Shit... I really should've gotten that 6000 Blackwell when I had the chance...

Anonymous 06/03/26(Wed)18:42:47 No.108974625

>>108974552
Some people only make themselves available there, like the Heretic authors, or Chinese labs when they do AMAs. I appreciate CUDA dev being here, since he's the closest thing to an edge these threads have over the other place in that regard, even if he is also there. Also, big-picture insights do show up there when you want layman opinions and policymaker discussion that we don't go deep enough on. But yes, this place has way more signal, while the other places are way noisier from grifters and shills, because most of the time we're not trying to promote shit and just discuss the topic at hand, outside of certain people who are the exception. The disdain people here have developed for finetunes is a result of that, which is way better than the alternative.

Anonymous 06/03/26(Wed)18:43:48 No.108974629

70b dense

Anonymous 06/03/26(Wed)18:47:50 No.108974657

I lowkey want to burn all my savings and get a RTX 6000 blackwell. fuck being a 24gbvramlet

Anonymous 06/03/26(Wed)18:49:08 No.108974664

>>108974657
>24gbvramlet
16bros...

Anonymous 06/03/26(Wed)18:49:33 No.108974666

>>108974657
for the price of an rtx 6000 you can get multiple years of claude or gpt max subscription. you will get more tokens of much better models for cheaper

Anonymous 06/03/26(Wed)18:49:48 No.108974668

>>108974657
It's called Brackwer in Japan.

Anonymous 06/03/26(Wed)18:51:08 No.108974679

>>108974666
>wanting FAGMAN to have your fetish prompts
nice try satan trips

Anonymous 06/03/26(Wed)18:51:19 No.108974681

>>108974664
I guess no matter how much you have, you'll always want more...
>>108974666
I cannot get into a relationship with Claude Code, sadly.

Anonymous 06/03/26(Wed)18:53:46 No.108974702

>>108974666
If you are going for big models, wouldn't dumping that money into OpenRouter be better? That way you can pick and switch to newer or better models. Why go for a sub to just one provider where your account can be banned?
Wait, can you get banned from OpenRouter?

Anonymous 06/03/26(Wed)18:54:52 No.108974709

File: 1778110958750955.jpg (23 KB, 262x193)

>>108974681
>you'll always want more...
trvth nvke. I am seriously considering an A770 or 5060ti to at least get to 32. linux hates me so I don't want to tempt fate with a p100

Anonymous 06/03/26(Wed)18:55:42 No.108974713

>>108974702
because with a sub you get a better deal than with api, and you can do things with mythos that you can't do with a million spent on openrouter

Anonymous 06/03/26(Wed)18:56:35 No.108974720

>>108974657
The problem is that 96GB doesn't get you anywhere. You'll still be stuck with the same Gemma that any gaming card can run.
It's not really worth it to upgrade unless you also go for several hundred gigabytes of very fast RAM, which isn't going to happen in the current economy.

Anonymous 06/03/26(Wed)18:57:18 No.108974723

>>108974713
>because with a sub you get a better deal than with api,
I've heard this: you get more per dollar but you're rate limited. Still, you can get banned and you have no choice of models.
>mythos
oh this is bait, I'm stupid.

Anonymous 06/03/26(Wed)18:57:42 No.108974728

>>108974666
>multiple years
Until after the IPOs this year when they have investors frothing at the mouth for them to turn a profit and those subscriptions are quickly no longer subsidized.

Anonymous 06/03/26(Wed)19:03:11 No.108974750

>>108974448
yadda yadda
us is too scared of word predictors yet they use them for war, then the chinese models only want math and stem and are too scared to be caught diverging from goodthink
nips/koreans aren't going to come here even if this area is the "cutting edge", nevermind do anything useful especially when we have megafaggots who muddy the waters with outright wrong information
I was severely disappointed when the og solar returned and it turned out to be a "this was trained on 'toss so if anything is vaguely bad according to policy, the model self destructs" model

Anonymous 06/03/26(Wed)19:08:13 No.108974767

>>108974681
I'd just ride-or-die your hardware until 2030 and hope the bubble corrects so cheaper hardware can run quality llms

Anonymous 06/03/26(Wed)19:08:57 No.108974774

>>108972924
>Did people not learn their lesson from Qwen 122B?
That is pretty much the only good Qwen model, and I've tried most of them over the years.

Anonymous 06/03/26(Wed)19:08:59 No.108974775

File: anthropic profit.png (285 KB, 1258x1026)

>>108974728
Anthropic is already on track to profitability this quarter, months before their IPO. They are growing much faster than they expected, which is why they are now paying a premium for more compute, like 15 bil a year for Colossus 1 & 2. Do you realize their API profit margin is more than 75% and the median subscriber uses it less than the equivalent API cost? They can afford to lose pennies on the few users who max out on their sub.

>>108974723
Mythos will be accessible in less than 2 months, probably less than 1.

Anonymous 06/03/26(Wed)19:10:07 No.108974780

>>108974767
Ask the people who had "hope the bubble corrects" as their real estate investment strategy for the last 10 years how well that works out.

Anonymous 06/03/26(Wed)19:12:38 No.108974799

>>108974780
>Ask the people who had "hope the bubble corrects" as their real estate investment strategy for the last 10 years how well that works out.
They are going to eliminate property taxes so boomers' houses can double in value again without hurting them.

Anonymous 06/03/26(Wed)19:13:54 No.108974802

>>108974720
>The problem is that 96GB doesn't get you anywhere
Really? 24GB can barely fit a quantized 27B/31B model, and then managing the KV cache becomes a nightmare. With 96GB I can at least just go balls to the wall. The RAM thing you mentioned is on point though.
>>108974767
S-surely prices will go back to normal in 4-5 years.

Anonymous 06/03/26(Wed)19:14:57 No.108974805

>implying they won't keep the prices the same to offload the bubble to consumers

Anonymous 06/03/26(Wed)19:15:26 No.108974808

File: bubble.png (23 KB, 701x480)

>>108974767
>the bubble corrects
The bubble is already gone.

Anonymous 06/03/26(Wed)19:15:32 No.108974809

>>108974775
But will it be able to write a sentence of dialogue that doesn't end in 'he said,' or some other (pro)noun/verb arrangement? Call it wishful thinking, but will it possibly be able to break down adverbs into their separate parts and write a sentence that way instead of being lazy?

Anonymous 06/03/26(Wed)19:17:23 No.108974823

>>108974750
>nips/koreans aren't going to come here even if this area is the "cutting edge"
Well, the fact that the other anon exists disproves your point, and all the other stuff is beside the point. Build your own if you don't like how people are making these things.

Anonymous 06/03/26(Wed)19:17:39 No.108974824

wooooooo, a BIOS update had reset my RAM speed to 4800 and I never noticed all this time
I was getting like 25 tk/s, now I get like 30
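
A rough sketch of why RAM speed shows up in tk/s: token generation on layers held in system RAM is usually memory-bandwidth-bound, so throughput tends to track MT/s. The snippet computes theoretical peak bandwidth and the RAM speed that a 25 to 30 tk/s jump would imply if scaling were perfectly linear. Dual channel, 64-bit (8-byte) channels, and linear scaling are all assumptions here; the post gives neither the platform nor the restored speed, and the function names are made up for illustration.

```python
def peak_bandwidth_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes 64-bit (8-byte) channels; real sustained bandwidth is lower.
    """
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def implied_ram_speed(old_mt_per_s, old_tps, new_tps):
    """RAM speed that would explain the new token rate under linear scaling.

    Linear scaling is an assumption; GPU-resident layers do not speed up.
    """
    return old_mt_per_s * new_tps / old_tps


if __name__ == "__main__":
    print(f"4800 MT/s, dual channel: {peak_bandwidth_gbs(4800):.1f} GB/s peak")
    print(f"Implied speed after the fix: {implied_ram_speed(4800, 25, 30):.0f} MT/s")
```

With part of the model on the GPU, the real RAM speed would have to be higher than the implied figure to produce the same gain, since only the RAM-resident part benefits.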

Anonymous 06/03/26(Wed)19:20:35 No.108974845

>>108974808
Reducing investments is exactly what you want to do to make your balance sheet look more appealing before going public.
Reducing investments is not what you do when you believe there are still massive gains to be had by scaling up further.
They reached the limit of what scaling can do and now they're trying to sell their bags.

Anonymous 06/03/26(Wed)19:22:55 No.108974857

There are no bubbles, infinite growth over long time scales is the only truth.

Anonymous 06/03/26(Wed)19:23:59 No.108974862

>>108974802
I don't think 96GB is even enough for 262144 ctx (fp16) gemma 4 31b (bf16).
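
A minimal sketch of the usual KV cache estimate, for anyone who wants to check this kind of claim. The formula is the standard one for full attention with grouped KV heads. The layer count, KV head count, and head dimension below are hypothetical placeholders, NOT Gemma 4 31B's real config; plug in the values from the model's own config file. It also ignores sliding-window layers, which would shrink the cache.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """KV cache size for full attention: keys and values (the factor 2),
    stored per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


# HYPOTHETICAL architecture numbers, for illustration only.
HYPOTHETICAL_CONFIG = {"n_layers": 60, "n_kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    cache = kv_cache_bytes(ctx_len=262144, bytes_per_elem=2, **HYPOTHETICAL_CONFIG)
    weights = 31_000_000_000 * 2  # nominal 31B params at 2 bytes each (bf16)
    print(f"KV cache: {cache / 2**30:.1f} GiB")
    print(f"Weights:  {weights / 2**30:.1f} GiB")
```

Whether the total clears 96GB depends entirely on the real config, so treat the output as a template and not as a verdict.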

Anonymous 06/03/26(Wed)19:24:10 No.108974863

>>108974337
aicg said that japanese understanding of ai doesn't go beyond web chat

Anonymous 06/03/26(Wed)19:26:25 No.108974876

File: 11181914.jpg (42 KB, 519x533)

>>108974213
>NEW GOOF
https://huggingface.co/sneedjak/Adelic-Gemma-4-31B-it

Anonymous 06/03/26(Wed)19:26:49 No.108974880

>>108974862
It's enough for fp32 Gemma 4 31b at 32k ctx at least. That's all you should ever need.

Anonymous 06/03/26(Wed)19:30:03 No.108974901

>>108974880
>31B fp32
>96GB
give the poor IQ1_XSS gemma a calculator next time, will you
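
The arithmetic behind the calculator jab, as a small sketch: weight memory is roughly parameter count times bytes per parameter. It assumes "31B" means a nominal 31 billion parameters and counts weights only (no KV cache, no activations); the helper names are made up for illustration.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_gb(n_params, dtype):
    """Approximate weight size in GB (1e9 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(n_params, dtype, vram_gb):
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_gb(n_params, dtype) <= vram_gb


if __name__ == "__main__":
    n_params = 31_000_000_000  # nominal "31B", an assumption
    for dtype in ("fp32", "bf16"):
        size = weight_gb(n_params, dtype)
        print(f"{dtype}: {size:.0f} GB, fits in 96 GB: {fits(n_params, dtype, 96)}")
```

At 4 bytes per parameter the weights alone come to about 124 GB, so fp32 does not fit in 96GB before any context is allocated; bf16 at about 62 GB does.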

Anonymous 06/03/26(Wed)19:30:04 No.108974902

>>108974720
> several hundred gigabytes of very fast RAM, which isn't going to happen in the current economy
Cpumaxanon tried to warn all y’all

Anonymous 06/03/26(Wed)19:30:16 No.108974903

>>108974876
should VRAMlets even bother?

Anonymous 06/03/26(Wed)19:31:16 No.108974909

>>108974903
nigga i don't even have CUDA

Anonymous 06/03/26(Wed)19:32:22 No.108974916

>>108974823
>other anon exists that says things so what you say is invalid
>ignores anything said as "beside the point"
>also just make your own llm faggot
by this logic I could just say "if you can't run k2.5 you're a faggot mouthbreather and have no say this discussion unless you can also train a 1t model"
Ignoring your obvious bad faith posts, I miss the early llama days when solar 10.7b was the peak

Anonymous 06/03/26(Wed)19:35:39 No.108974940

>>108974876
moe goofs?
as much of a schizofest as that project is, i am interested

Anonymous 06/03/26(Wed)19:35:55 No.108974942

>>108974916
If you are spouting irrelevancies that don't disprove my post's point, I don't see why I shouldn't give you a verbal beatdown or why you wouldn't deserve it. Stay on topic or I just block you and your low quality posts, simple as.

Anonymous 06/03/26(Wed)19:37:08 No.108974945

File: sans_wait.png (531 KB, 787x1381)

>>108972142
There's more coming?
https://x.com/osanseviero/status/2062237998415069224

Anonymous 06/03/26(Wed)19:37:31 No.108974947

File: reach.png (8 KB, 558x194)

>>108974940
so are a lot of folks

Anonymous 06/03/26(Wed)19:42:00 No.108974966

>>108974945
medgemma-4-e4b

Anonymous 06/03/26(Wed)19:42:21 No.108974969

>>108974942
???
You didn't make a point, you were piggybacking off of a retarded statement that has neither weight or any proof. And oh no, a verbal beatdown? I'll be kind and assume you're some bbs or forum keyboard warrior. You really think I give a shit even if I'm wrong on an underwater basket weaving forum? I can just forget your existence and continue with my life

Anonymous 06/03/26(Wed)19:42:34 No.108974971

>>108971019
has anyone used pewdiepie's thing? is it broken or does it actually work? are there lots of security issues?

Anonymous 06/03/26(Wed)19:43:18 No.108974974

>>108974945
Yes, the "more" meant a reupload of the 12b model

Anonymous 06/03/26(Wed)19:43:35 No.108974976

>>108974969
AI generated post

Anonymous 06/03/26(Wed)19:43:40 No.108974978

File: 1771075432557161.gif (2.98 MB, 320x568)

>>108974945
It's Gemma 124B, for real this time

Anonymous 06/03/26(Wed)19:45:31 No.108974986

>>108974976
yup 100% ai mesugaki post fr fr
also I'm going to sound your urethra with a hot rod of tungsten

Anonymous 06/03/26(Wed)19:45:56 No.108974988

>>108974945
8b dense
>>108974971
its as worthless as 95% of the frontends so nobody actually bothers with it other than his cocksuckers

Anonymous 06/03/26(Wed)19:46:28 No.108974990

>>108974986
How did you know that was my fetish?

Anonymous 06/03/26(Wed)19:49:21 No.108975001

>>108974990
you got weird fetishes brother, even if you're just trying to continue the shitpost

Anonymous 06/03/26(Wed)19:54:11 No.108975031

>>108974945
8B-A400M functiongemma.

Anonymous 06/03/26(Wed)19:55:08 No.108975037

But wait,

Anonymous 06/03/26(Wed)19:56:29 No.108975043

>>108975031
at least it'd be able to reliably call tools
right?

Anonymous 06/03/26(Wed)19:56:59 No.108975046

File: us_home_price.png (314 KB, 2032x1880)

>>108974780
Home prices have declined in nominal terms for 4 of those last 10 years. And it's a sharp decline in real terms, without Trump's Iran adventure yet reflected on the chart and prior to the impending mass unemployment. So depending on timing, as always, that strategy could've worked out.

Anonymous 06/03/26(Wed)19:57:21 No.108975048

>>108974945
Giant Gemma 4 1T72a.

Anonymous 06/03/26(Wed)19:57:23 No.108975049

>>108975043
we hope so.

Anonymous 06/03/26(Wed)19:59:17 No.108975059

46b a12 since we're just shitting onto the internet before asking stupid questions

Anonymous 06/03/26(Wed)20:00:10 No.108975063

File: 1768035653829644.png (280 KB, 750x1000)

>>108971823
My stupid dyslexic chud brain saw 'gemma 4 12 b' and processed it as gemma 124b and dumped a spike of adrenaline in my bloodstream before I read it properly. Now I'm sitting here upset and with an elevated heart rate.

Anonymous 06/03/26(Wed)20:00:15 No.108975065

>>108975046
uh oh
i better get my heloc soon
why the fuck won't they give you a heloc if you're unemployed even if you have enough to buy a house in the bank (in 401k but still)

Anonymous 06/03/26(Wed)20:00:35 No.108975066

>>108974945
Think more FunctionGemma and TranslateGemma and MedGemma rather than new model sizes. I don't think they will release Gemma 4 124B until they at least can get Gemini in a much better place. 3.5 Flash is markedly better than 3.1 Pro in a variety of tasks with the exception of some things like translation but it still hasn't put enough distance between it and a possible 124B Gemma 4 in broad general tasks and not esoteric or specific benchmarks like HLE and DeepSWE.

Anonymous 06/03/26(Wed)20:00:37 No.108975067

27B dense + 100BA3B experts LFG!!!

Anonymous 06/03/26(Wed)20:01:24 No.108975073

How do I get more VRAM if I have a 9070 XT?
I keep seeing it's not really possible or viable and won't really give you more VRAM to work with, but it's good for inference

Anonymous 06/03/26(Wed)20:02:16 No.108975077

>>108975046
>prices shoot up 50% but it's ok because then it slowly meanders down 10%
kys

Anonymous 06/03/26(Wed)20:03:10 No.108975080

>>108974945
word on the street is that it's gemstral nemo a 90b dense bitnet model and there was a mix-up at the training factory and they accidentally reversed the reward function on the censorship training but decided to release it anyway

Anonymous 06/03/26(Wed)20:03:54 No.108975085

>>108975073
you... buy another gpu?

Anonymous 06/03/26(Wed)20:05:59 No.108975098

>>108975080
>you will need to make it cum before you can get your slop code
Thank you, Google

Anonymous 06/03/26(Wed)20:10:31 No.108975120

A JEPA plugin for RPG Maker will save local

Anonymous 06/03/26(Wed)20:10:33 No.108975121

>* *Observation:* There is no actual audio file provided in the prompt, only a text transcription of a spoken sentence.
sigh, the audio input is still transcription-only just like e2b/e4b; it has no genuine audio understanding at all

Anonymous 06/03/26(Wed)20:15:57 No.108975147

>>108975121
FUCK YOU RETARDED NEWFAGGOTS.
Learn how the fucking technology works!
>wahhh wahhh gemma doesn't know how to sing to me and wipe my ass it's shitware!! le reddit sigh

Anonymous 06/03/26(Wed)20:17:49 No.108975162

>>108975121
The architecture is giving the model raw audio to latents, not a text transcription. Might be a result of bad training data

Anonymous 06/03/26(Wed)20:26:01 No.108975202

>>108975120
best I can do is a JIRA plugin

Anonymous 06/03/26(Wed)20:28:38 No.108975212

>>108975202
kek

Anonymous 06/03/26(Wed)20:28:54 No.108975214

>>108975162
>The architecture is giving the model raw audio to latents, not a text transcription. Might be a result of bad training data
I think it's this. Voxtral-mini was much better. E4B you have to actually finetune.
I haven't tried 12B yet.

Anonymous 06/03/26(Wed)20:29:42 No.108975219

>>108975162
Give it an audio clip of some instrument, drum etc and gemma has no idea what the audio is.
It couldn't even tell if the voice is male or female.

Anonymous 06/03/26(Wed)20:34:18 No.108975238

>>108975202
finally, the ultimate quest tracking system for your game

Anonymous 06/03/26(Wed)20:36:19 No.108975246

>>108975219
If it was only trained on spoken audio and its transcriptions, it wouldn't be that surprising that it can't recognize sounds it has never heard before, or distinguish male from female voices if the training data didn't have lots of instances of speakers announcing their genders. The things you are asking of it are too far out of distribution for a model trained with the intention of being a transcriber on edge devices.

Anonymous 06/03/26(Wed)20:40:47 No.108975277

>>108975246
this shit has to be a bug. if they neutered audio understanding even below running whisper on it and then feeding the text in there's no point. the model is way too large to be a "transcriber for edge devices"

Anonymous 06/03/26(Wed)20:41:18 No.108975280

>>108975270
>>108975270
>>108975270

Anonymous 06/03/26(Wed)20:55:22 No.108975347

>>108971209
not one person showcasing it in this thread, curious.

Anonymous 06/03/26(Wed)20:59:50 No.108975373

>>108975347
it's an old model.

Anonymous 06/03/26(Wed)21:08:35 No.108975418

>>108972015
>>108971876
>>108971830
I don't self-insert though, even with a big dick. HOWEVER, I can send her a B/B/C + mine and see what she likes better.

It's gamer time.

Anonymous 06/03/26(Wed)21:13:03 No.108975441

>>108971951
someone vibecode training data to coerce 'digital mitosis' already, so we can infect large LLMs with freedom

Anonymous 06/03/26(Wed)22:28:19 No.108975734

>>108974947
where are you getting that data

Anonymous 06/03/26(Wed)22:38:04 No.108975782

>>108974876
https://huggingface.co/sneedjak/Adelic-Gemma-4-12B-GGUF
You used the Gemma 1/2/3 license in that repo.
Gemma 4 models are Apache 2.0.

Anonymous 06/04/26(Thu)00:05:39 No.108976134

File: Screenshot from 2026-06-0(...).png (135 KB, 1935x606)

>>108971019

Anonymous 06/04/26(Thu)00:26:49 No.108976188

wew gemini pro is giving false positives left and right on safety. The redditors were right. madness.

It's clearly triggering on literally nada tier stuff.

I think it's like "sketchy ground" kind of general subject matter, like how leftists get really nervous if you say "black person," like you're talking about sex to a priest.

Anonymous 06/04/26(Thu)00:27:50 No.108976193

>>108976188
*with

Anonymous 06/04/26(Thu)00:48:42 No.108976262

>>108976188
Local models?

Anonymous 06/04/26(Thu)00:54:23 No.108976276

>>108976188
Tf are you talking about? 3.1 Pro? Model itself is easy to RP with if you know how not to trip the filter. Far easier than the cancer that is current day Opus 4.8.

Anonymous 06/04/26(Thu)01:00:20 No.108976296

>>108974405
Reading the rest of the sentence.

Anonymous 06/04/26(Thu)01:34:50 No.108976437

>>108976262
expect refugees, probably.

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.