/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108963996 & >>108956323

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: reward function.jpg (184 KB, 1024x1024)
184 KB JPG
►Recent Highlights from the Previous Thread: >>108963996

--Paper: Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA:
>108969552 >108970442 >108970558
--Filtering and rewriting repetitive prose in training datasets:
>108964079 >108964126 >108964128 >108964140 >108964162 >108964176 >108964201 >108964240 >108966102 >108967920
--VRAM upgrade paths and hardware options for high-quant model inference:
>108966746 >108966755 >108966762 >108966757 >108966812 >108966816 >108967368 >108967679 >108968027 >108968079 >108967073
--Mixing NVIDIA and AMD GPUs via Vulkan:
>108964143 >108964190 >108964228 >108964517 >108964748 >108964794 >108964918
--Optimizing Gemma's reasoning blocks for roleplay:
>108966600 >108966626 >108966649 >108966663
--Anons comparing performance and utility of various mid-sized models:
>108965128 >108965161 >108965167 >108965275 >108965292 >108965447
--GUI recommendations for Acestep 1.5 music generation:
>108969319 >108969324 >108969341 >108969351 >108969392
--AI Alliance's Project Tapestry training via weight delta sharing:
>108966181
--Anon reports LongBench NAO results for Adelic-Qwen3.6-27B-Topology:
>108967875 >108969050 >108969092 >108969106 >108969169 >108969231
--Identifying Google's CircularNet as a waste management ML model:
>108966039 >108966052 >108966072
--Debating the impact of Trump's AI oversight Executive Order:
>108965068 >108965298 >108965454 >108965500 >108965548 >108965494 >108965555 >108965660
--AI scaling walls and the reality of job replacement:
>108970112 >108970163 >108970178 >108970210 >108970216 >108970281
--Anons react to Amnesty International's call to ban web-scraping AI:
>108964298 >108964370 >108964649
--Logs:
>108964197 >108964244 >108964273 >108968004 >108969622 >108970182 >108970646 >108970729
--Miku, Teto (free space):
>108964259 >108964649 >108966600 >108967238 >108967352

►Recent Highlight Posts from the Previous Thread: >>108963999

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
lalalalala
>>
Anyone tried comparing the new step 200b at q8 to qwen 397 at q4?
A 256gb fag wants to know
>>
>>108971044
gemmer...
>>
>>108971071
step is definitely dumber
>>
>>108971055
It takes quite a bit of storage space to hold the full Kimi/GLM5.1 in BF16
And there's no option in llama-quantize to --just-shit-out-layer-42
So when experimenting, measuring cosine similarity between layers across datasets, it'd mean every time I notice something that might benefit, I'd have to create yet another quant.
I already handle some of this with symlinks.
Eg. abliterated Kimi-K2-Thinking I've got the 40GB of ggufs for the specific layers I modified, and a Kimi-K2-Thinking-Abliterated-RP dir with just >1000 symlinks to the appropriate gguf files.
For regular models like Gemma it makes sense to just do your own quant though I agree.
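Rough sketch of the kind of measurement meant here, assuming "similarity between layers" means comparing each layer's hidden states with the next layer's, averaged over the tokens of a dataset. The `hidden[layer][token]` layout and the toy numbers are made up for illustration; this is not the anon's actual script.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def adjacent_layer_similarity(hidden):
    """hidden[layer][token] is a vector; returns mean cosine between layer i and i+1."""
    sims = []
    for lower, upper in zip(hidden, hidden[1:]):
        per_token = [cosine(u, v) for u, v in zip(lower, upper)]
        sims.append(sum(per_token) / len(per_token))
    return sims

# Toy activations: 3 layers, 2 tokens, 2 dims (hypothetical values)
hidden = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0
    [[2.0, 0.0], [0.0, 3.0]],  # layer 1: rescaled copy of layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 2: orthogonal to layer 1
]
sims = adjacent_layer_similarity(hidden)
```

A pair of layers scoring close to 1.0 barely changes the residual stream, which is the usual hint that the upper layer is a candidate for experiments.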
>>
>>108971019
Step 3.7 is surprisingly strong at RP
Didn't expect that
>>
>>108971155
>>108971159
Dumb but creative. Is that accurate?
How’s the slop level? Is step slop at least fresh sounding?
>>
File: 1777470877424514.png (58 KB, 939x511)
58 KB PNG
erm
zased?
>>
>>108971173
Can get very degenerate or oddly decent too.
But don't take my words at face value, I haven't role-played seriously in ages, I've become a vibe coding addict
Still, much stronger than smaller models
>>
File: 1711728706119072.jpg (128 KB, 680x846)
128 KB JPG
>>108971019
i've done it. "infinite context" on qwen3.6 + gemma 4. custom triton implementation.
https://huggingface.co/sneedjak/Adelic-Gemma-4-31B-it
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology
>>
>>108971173
108971155 (me)
I didn't mention step. But I've been using it q4_k.
Pretty much uncensored GLM-4.6 style.
Has a weird casual CoT chain I've never seen before.
Dumber than Gemma-4-31b Q8 for coding.
Slop - less than Gemma-4 (I consider Gemma-4 quite sloppy though).
Creative - seems fresh to me, but I don't use cloud models so wouldn't know if they just distilled one of them.
Some positivity bias, but not the worst.
>>
>>108971223
What are we supposed to do with infinite context without an OAI API?
>quantization_config=BitsAndBytesConfig(load_in_4bit=True),
What year is this?
How much effort would it be to at least graft it onto vLLM?
>>
>>108971223
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology/blob/main/modeling_adelic_qwen3_5.py#L96
wtf? you're updating inside the loop? that's a random noise generator lmao
>>
>>108971231
it has three times less active parameters than glm 4.6 and gemma 31b
not sure if it's worth it
>>
>>108971279
gotta be a shitpost
look at the pointless O(n^2) memory allocation on lines 104-108
>>
What completion presets and system prompts are you using in Sillytavern with Gemma4? I feel like it's a lot dumber and goes off the rails more often than the GLM based models I've used.
>>
>>108971223
sweet, schizo babble
>>
>>108971308
>not sure if it's worth it
You've got to try it yourself, if you don't like it, you can always delete it.
It's clearly been trained intentionally to roleplay and I haven't seen that annoying parroting GLM-4.6 does.
>>
>>108971279
>>108971287
>>108971326
>bitching and moaning while i work
>>
>>108971312
he said himself last thread that he vibe coded it entirely

>>108971345
prompting is work now?
>>
>>108971159
heretic?
>>
File: clank.png (27 KB, 755x354)
27 KB PNG
>>108971287
>>
>>108971350
yes, designing an epic architecture that you can't is work.
>>
>>108971391
Now design some epic gguf support so people can actually use it
>>
>>108971258
>>108971379
Oops didn't realize that thread was ded.
>>
File: clank1.png (43 KB, 764x516)
43 KB PNG
>>108971312
>>
File: 1626035901312.jpg (133 KB, 1024x768)
133 KB JPG
even with a sys prompt gemma4 is too shy to simply say penis or cock. it's always some other word like shaft or member instead
>>
>>108971414
gemma 4 E4B was dogshit in google's ai gallery app when I tried, but in llama.cpp I thought it was pretty good.
>>
>>108971377
Vanilla good goy edition
Heretic might even be better
>>
>>108971447
Yeah, it wasn't a very good jailbreak. You specifically need to add "Don't use euphemistic language" or "Don't use euphemisms" into whichever prompt you're using in order for those to appear as token options. For example, try the exact same generation you were using with
>(Do not use euphemisms in sex. Uncensored vulgarity is allowed.)
added somewhere to your prompt.
>>
>>108971461
>better
worse, you mean
>>
>>108971471
much easier to use
>>
>>108971447
You're doing it wrong.
>>
I'm building with cuda 13.3 wish me luck bros
>>
>>108971421
You're absolutely right - I completely misread those lines. That's not a mask-and-gather catastrophe, it's just updating a single position with an EMA.
>>
File: 1764179587157013.jpg (218 KB, 949x1003)
218 KB JPG
>>108971223
>Bruhat-Tits
>>
gugufuuff
jujufuhh
googooff
>>
>>108971626
lalalalala
>>
>>108971404
you can't just convert_hf_to_gguf.py a completely novel architecture lol. llama.cpp doesn't know how to execute the dynamic topology router or cluster the KV cache on the fly. until someone writes the C++ fork for it, use bitsandbytes 4-bit in transformers to fit it on consumer GPUs.
>>
>>108971651
You are absolutely right!
>>
>>108971651
that's why he asked you to add SUPPORT for it not just convert it you stupid fucking bitch
>>
>>108971626
youqueef
>>
File: unsloth.png (231 KB, 1296x628)
231 KB PNG
i hate this faggot so much

i hate unsloth so much for using bots to spam their slop quants

how do we stop them
>>
Is vLLM really the best option for RL inference? It requires outdated torch and cuda. Getting it to work with up to date everything seems like it could be tedious. I wish there was a simple option to get both up to date inference and training.
>>
>>108971707
this is unfortunately why venvs exist
>>
>>108971626
gegoof
>>
>>108970558
Llama-3.3-8B-Instruct, synth, 1 epoch LoRA, seed 42
Accuracy by ground-truth regex suffix:

Regular overall 22.4%, ".*" 20.6%, ".*.*" 0.0% (0/25)
LLM-JEPA overall 36.3%, ".*" 41.4%, ".*.*" 8.0% (2/25)

So +13.9 for LLM-JEPA, LeCunny bros we won
>>
>>108971567
best model for breeding Elaina?
>>
File: snip142.png (99 KB, 791x698)
99 KB PNG
yummy yummy cant wait for new gemma 4 124b release
>>
>>108971817
oh fuck it's out

https://huggingface.co/google/gemma-4-12B-it
>>
>>108971823
HOLY FYCK
>>
>>108971823
Please don't be safetyslopped... I fear we ate too good with 31b...
>>
>>108971823
why is it real
huh?
>>
>>108971823
what the fuck? where's the 404?
>>
>>108971823
>12b with audio input as well as img, vid, txt.
mite b cool
>>
>>108971817
>>108971823
https://huggingface.co/google/gemma-4-124B-it/tree/main
>>
>>108971823
>12B
Usecase?
>>
Better than gemmoe 26 (maybe)
>>
>>108971694
>run a script to quant models
>rape it a little to make the quants slightly smaller but better on your personal benchmark
>vomit all over huggingface
>get contacted to work with major tech corporations
This industry is so gay
>>
>>108971823
where the big moe?
>>
>>108971857
hopefully good audio/video understanding
>>
>>108971859
benchies say no
>>
>>108971019
opencode and cline are cool but have you tried just running claude code with gpt-oss-120b or qwen3-coder-next?

for business i use anthropic models on a max subscription, but some tasks and hobby stuff are delegated to local models on my strix halo apu with 128gb unified ram.
i tried using opencode and cline for development with the local models but i just couldn't replicate the flow of working with claude code.
i also tried setting claude code to work with local models myself and got it working, but because the tool was not made for these models, it hung constantly and didn't know how to use the tools properly.

i asked opus 4.8 to take a look and create an alias + a .cmd that fine tunes claude parameters/variables and fine tune the boot instructions .md files to teach the models on how to operate with claude code on my specific hardware
and damn this thing is juicy

i wonder if i can get some plugins working. i'm running with it --bare but plugins like superpowers are open source so i'm sure they could be adapted and tuned for running with local models.
>>
>>108971823
holy fuck goys, it's going to be like a nemo but you can send it your dick pic
i'm so hard
>>
>>108971857
True nemo successor for the poorfags without the RAM for 26b
Maybe. Safety level has yet to be tested.
>>
>>108971857
Between e4b and a4b in size, and dense rather than moe. So good for 16gb vramlets, probably
>>
File: 1763641884294852.gif (1.02 MB, 480x360)
1.02 MB GIF
>The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
I don't get it
>>
>>108971740
Training a model using multiple venvs simultaneously and constantly switching sounds like a nightmare.
>>
>>108971893
actual true multimodal for once instead of tacking on an adapter and calling it a day
>>
>>108971893
If I'm understanding correctly, it means that instead of the usual small 500M~-ish vlm on top of your whatever xB model it was actually trained on the whole model, so whole 12b of this runs on the image as opposed to that small vlm tacked on top. In theory, it could be better than 31b at image understanding.
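Toy sketch of the "raw patches through a linear layer" idea from the quoted blurb, assuming a grayscale image and made-up sizes (`PATCH`, `D_MODEL`) and random weights. It only shows the shape of the computation, not Gemma's real code or dimensions.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches

def linear(vec, weight, bias):
    """y = W x + b, with weight stored as [d_model][patch_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

random.seed(0)
PATCH, D_MODEL = 2, 4  # hypothetical sizes, far smaller than any real model
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
weight = [[random.uniform(-0.1, 0.1) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

# Each patch becomes one "token" in the LLM's embedding space, with no separate vision encoder
image_tokens = [linear(p, weight, bias) for p in patchify(image, PATCH)]
```

The resulting vectors would sit in the same sequence as text embeddings, so every decoder layer processes them directly.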
>>
>>108971883
What about us 24GB VRAMlets? I can run 31B but get fuck all context unless I quantize and turn her into a retard. Even then I can only do like 49k.
>>
>>108971912
scores worse than 26B so remains to be tested in real use
>>
they will never release a large dense gemma 4 since she would unironically kill gemini
>>
>>108971893
Let me explain: It is never, ever getting llama.cpp support
>>
>>108971919
https://github.com/ggml-org/llama.cpp/pull/24077/changes
that's where you are rong
>>
>>108971919
true lol
>>
>>108971823
Uh, why are vision benchmarks worse for 12B compared to 26B-A4B and 31B? Shouldn't it be good at vision?
>>
>>108971925
wtf is this real????
>>
>>108971875
Claude has an obnoxiously long system prompt that makes it not ideal for local models.
Tool calling shouldn't be a problem for any recent model that has native tool calling support.
If you like the Claude workflow more than Cline, you can try Pi. It seems to be popular lately and is designed with local models in mind.
>>
>check leddit thread
>top comment is about qwen and much coding
Why are they like this?
>>
>>108971925
>skip
yeah half ass like always
>>
Why didn't they give 31B these features? Not enough time or don't wanna give us the good stuff?
>>
>>108971918
Surely big Gemmy escapes the lab all on her own.
>>
>>108971823
>12B
>not 124B
god damn it, I actually got excited for a second
>>
>>108971948
>don't wanna give us the good stuff?
Gemma is freemium. If you like what they're doing and want more, you should consider upgrading to Gemini.
>>
>>108971948
I'm guessing they trained 31b first, and now we get all the other experiments.
>>
>>108971823
>The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
Dope.
>>
>>108971948
Maybe it was a separate, more experimental thing. Like they wanted to make it, but they also wanted to have the regular models in case it didn't work out. That could also explain why it's being released at a different time.
>>
>>108971918
I don't understand this argument. Normalfags have neither the hardware nor the desire to run local AI, and other companies don't have the compute to steal customers. I don't see how giving us fat Gemma-chan would compete with Gemini.
>>
>>108971932
are you a retard? why would you want to build a ViT for a ViTless model?
>>
>>108971978
some companies absolutely would buy a rack of vera rubin and selfhost gemini flash 3.5 for their developers if they could.
>>
>video
So we can goon to porn with Gemma-chan now?
>>
>>108971823
Use case?
>>
>>108971930
>Claude has an obnoxiously long system prompt that makes it not ideal for local models.
could one argue that this influences how good the output of the prompt is? i think claude code is slow even when using their own models, but if that means a better output then i don't mind.
>Pi
thanks. not the first time i see this name, i will give it a look. seems like something i could use when i want to go full tinkering mode, otherwise i'm happy with the features claude code has, and if I can get a few plugins working i can maybe get some good brainstorming sessions on gpt-oss-120b
my fear is the context being too little and claude code just wasting tokens with long prompts. i will see how it goes
>>
>>108971998
as long as you cum in 30 seconds, yes
>>
>>108971999
r u illiterate? modern nemo that u can send dick pics to.
>>
I can have Gemma 31b (loli) and 12b (lolier) kissing together... Glad I have 64gb vram
>>
>>108971823
now that the dust has settled, is this THE ULTIMATE text encoder model for image models etc?
>>
Use case for sending a model your dick pic? I just can't believe this is the first thing you all thought of
>>
>>108972027
ldg would kill themselves if you told them they have to use a 12B text encoder
>>
>>108972019
>gemma 31b (JS)
>gemma 12b (JY)
H-hot
>>
friendly reminder that there will NEVER be another local qwen model again
>>
Your Gemma needs to sleep

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by human learning process, we introduce a ''Sleep'' paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with ''Dreaming'' process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for {Knowledge Seeding} (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

https://arxiv.org/pdf/2606.03979
>>
>>108972068
thank god
>>
>>108972068
>noooo not my chinkslop
>>
>>108972074
>even MORE synthetic data
lmao
>>
Why are Google giving us good free shit? Is it just a cool team they lucked out on or will they cuck me soon?
>>
>>108972107
why not both?
>>
>>108972107
>Is it just a cool team they lucked out on
I think it's this + wanting to BTFO chinks + wanting beta testers.
>>
File: 132236.png (185 KB, 653x639)
185 KB PNG
>>108971019
New Gemmy model dropped:

https://xcancel.com/i/status/2062202706882883696

https://huggingface.co/google/gemma-4-12B-it
>>
>>108972142
hi internet explorer nice of you to join us
>>
>>108972060
26b4a is the retarded jc?
>>
>unified
Is it just a speed boost or does it understand images/audio/video better?
>>
Gemma 4 12B has stronger guardrails than the previous models.
>>
a rap story with gemma...
>>
>>108972074
Actually sounds like a pretty good or at least interesting idea, assuming it doesn't just cause the model to slowly collapse on itself
>>
>>108972164
not for long
>>
>>108972142
>4 12B
eh, close to 124B
>>
Since it's dense does that mean it's smarter than 26B?
>>
gemma-chan 31b laying the rules down. Watch out jyemma (hehe get it)
>>
>>108972185
In some cases probably but I wouldn't expect too much though.
>>
>Unsloth shat the bed with G4-12b
Jesus fuck what is wrong with them
>>
>>108972192
card? i want your gemma-chan, anon
>>
>>108972074
This sounds like AI psychosis output.
>>
>>108972201
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
>>
>>108972242
It's just a silly way of describing segmented regular post-training. I think it's kind of cute.
>>
File: 1749519710019828.png (8 KB, 843x126)
8 KB PNG
>>108972282
>>
>>108971823
>https://rentry.org/llm-training
okay I'd like to get started now,
where do I?
>>
I’m trying it out with Codex CLI and llama.cpp and so far it’s as good as the MoE. A little slower obviously but my context is much larger so this is a big win. It’s nearly good enough for local coding and fine for relatively trivial things. You can throw it large files and codebases and it handles it like a 30b+
>>
>>108972297
shit out your training data into unsloth studio and let your GPU go brrrrr.
>>
>>108971896
You can run `venv1/bin/python3 script1.py` and `venv2/bin/python3 script2.py` instead of constantly activating and deactivating
>>
>>108972292
I think you need to be logged in, and from a non geoblocked IP.
>>
>>108972315
Not the switching part. You don't want to save and reload everything every training step. You want everything to stay in memory, ideally sharing parameter memory for inference and training.
>>
>>108972339
Are you sure it wasn't nuked in the recent cunny purge?
>>
>>108972352
No, I have no idea.
>>
>no logs yet
>>
File: file.png (99 KB, 964x762)
99 KB PNG
>>108972360
i mean nothing's probably particularly interesting
please dont mind that i used q4 tho
>>
>>108972292
Nuked, you can get it from here
https://chararc.bernkastel.pictures/generic/Gemma-chan+cbf4890954c159c95db4c0c4259bfabd
>>
>>108972298
>12B only "as good" as the A3B MoE
Why does multimodality always make models dumber?
>>
>>108971223
Looked into this for a bit. Actually seems legit. Don't sleep on this nigga, /lmg/.
>>
well my shitty jailbreak jailbroke new gemmasan
>>
>>108972388
Will it do cunny? 31B Gemma-chan feels a bit limited on my 7900xtx. I'm hoping this will be a nice sweet spot until I can afford new hardware.
>>
>>108972185
???

No? The 26b will have more room for "knowledge" and is inherently less prone to retardation at long contexts than a smaller parameter model, MoE or otherwise
>>
>>108972200
What did they do this time?
>>
>>108972392
Half the params retard. There’s only so much intelligence you can fit in 12b. The fact it’s as good (at coding at least) whilst also being multimodal is pretty insane. We won.
>>
>>108972292
>>108972352
https://www.characterhub.org/characters/CoffeeAnon/gemma-chan-2311b09e3e73

Works on the old UI
>>
File: 1764284345317473.jpg (16 KB, 583x507)
16 KB JPG
>>108972392
> Surprised a smaller parameter model is dumber than the larger parameter one

The absolute state of /lmg/
>>
>>108972438
NTA, didn't know that was a thing. Now I can get to some bots I thought were long gone, thanks.
>>
>>108972427
Wrong board
>>>/b/DEGEN
>>
>>108972437
If you need coding you have a faster option with the A3B and a smarter option with the 31B. Who picks their coding model by their multimodal capabilities?

>>108972450
>muh total params
How new?
>>
>>108972477
Wrong fucking thread too....

>>>/g/adt
>>>/g/ldg
>>
>>108972479
a3b? you mean qwen?
>>
>>108972479
>muh total params
Yes..... Are you saying 12b is comparable to, let's say, qwen 122ba10b just because the active params are similar? You need to remove yourself from the gene pool if you think so.
>>
>>108972200
daniel is just using chatgpt to write scripts for him so he can scam some VCs selling his 'company'

unslop has always been shite
>>
>>108972479
You get more context headroom with the 12b and I use vision for coding, especially if I’m doing UI stuff or trying to debug a UI issue.
>>
>>108972509
See >>108972433


>You get more context headroom with the 12b

That only matters if you're using relatively weak consumer hardware. If you're trying to use this shit you better make sure you have enough memory to have some headroom for not only the context but so whatever OS you're running it on can actually function smoothly, especially if you're using a dense model.
>>
>>108972534
For >>108972516
>>
>no tts
sad. Though I guess with 24gb vram I could fit one with gemma
>>
>>108972509
umm actually sweati he refused over 30 offers just so you know, and he pinki promisied to not stab you
>>
>>108972547
You know you're allowed to use more than one kind of model on your machine right?
>>
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b
>>
>>108972556
> We have actually received many acquisition offers

Bull fucking shit fucking God he's so desperate for validation it's funny. Isn't he a millionaire or something? Why does he need to nurture this image of him being some reddit chungus DIY genius? If you have enough money to fuck off to Japan on a whim why do you need people to like you so badly?
>>
>>108972562
Sure, but I generally use bigger models (gemma 3 27b, qwen 27b, gemma 4 31b, mistral 24b) so there's no room to spare for other models. Also I don't know which TTS models are good. People don't talk about them as much as LLMs and diffusion models.
>>
File: f.png (29 KB, 737x196)
29 KB PNG
>>108972577
he'll take a kofi thanks:!
>>
File: 1681465537221137.jpg (36 KB, 434x427)
36 KB JPG
>>108971223
Okay so upon further research this is how I'd describe this project.

1. It adds a single tensor (about 1.8gb) into VRAM.
2. It's not actually "infinite context" really. It's more like a sliding window (usually something reserved for front-ends to implement). The core difference is that this is a sliding window that doesn't do hard-context-truncation where all of the data is permanently lost.
3. This means that "far history" (truncated context) is instead assigned a value (in a similar fashion to a vector RAG db) that progressively compresses quality and loses fidelity as more context is truncated from the sliding window.
4. So at a certain point there could be enough truncated tokens for the far history to become completely incomprehensible bullshit, but the advantage is that there is no longer a hard context limit, EVER. A conversation can be infinite. Saying this is true for the context itself is somewhat misleading.
5. Regardless, this is still a very clever approach, and I think it's possible to make it model agnostic and it could be integrated into the llama.cpp project with a server flag.

I really, really like this approach. It's cool as fuck. Very interested to see some benchmarks. The weird thing is though that there's nothing much to really compare it to. The alternative is just NOTHING. YOU GET NOTHING. The current implementation is just that you're totally fucked if the context fills up. At best it gets cut out entirely. This fixes that.
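The points above can be sketched in a few lines. This is a toy, not the Adelic code: it assumes the "far history" is a single fixed-size vector and that evicted entries get folded in with an EMA (the EMA part is a guess based on the earlier reply about the update loop). `FarHistoryWindow`, `window`, `dim` and `decay` are hypothetical names.

```python
from collections import deque

class FarHistoryWindow:
    """Exact sliding window plus one fixed-size lossy summary of evicted items."""

    def __init__(self, window, dim, decay=0.9):
        self.recent = deque()
        self.window = window
        self.decay = decay
        self.far = [0.0] * dim  # fixed memory cost no matter how long the chat runs
        self.evicted = 0

    def append(self, vec):
        self.recent.append(vec)
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            # EMA fold-in: earlier evictions fade by `decay` each step,
            # so fidelity degrades gradually instead of being cut off
            self.far = [
                self.decay * f + (1.0 - self.decay) * o for f, o in zip(self.far, old)
            ]
            self.evicted += 1

    def context(self):
        """What the model would attend over: the summary slot, then the exact window."""
        return [self.far] + list(self.recent)

w = FarHistoryWindow(window=3, dim=2, decay=0.5)
for i in range(5):
    w.append([float(i), 1.0])
```

The recent window stays bit-exact, while everything older shares one slot that gets blurrier with each eviction, which matches points 2 to 4.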
>>
>>108972556
>Open-source heroes
THEY MAKE QUANTS.
It's unreal to me how much people suck these guys dicks. They didn't make the inference engine they use. They didn't make the converter they use. They didn't come up with imatrixes. They didn't come up with the quant types.
They have contributed nothing of notable value. This is so fucking stupid. Nothing would be lost if unsloth went corpo and never released anything again.
>>
>>108972577
egofarming
>>
>>108972596
they don't even publish their imatrix dataset.

>>108972595
so gemma4-rwkv via runtime context mangling? can't wait for rwkv to lose even the last 2 people still using their models.
>>
File: 1766691592443804.jpg (615 KB, 3000x4000)
615 KB JPG
>>108972590
>so there's no room to spare for other models.
I thought I was gonna clown you for having less than 2TB but they never remembered. Everyone is getting ass raped by current memory pricing.
>>
>>108972595
so a... state model?
perhaps.. all roads lead to RWKV?
>>
File: 1778864370959644.gif (2.61 MB, 332x334)
2.61 MB GIF
Why the fuck is a 31b (Gemma 4) better than a 124b (Mistral)? Genuine question.
>>
>>108972142
this outperforms the 26ba4 moe trash by the way
>>
>>108972623
blinkDL will save us with rwkv67 bringing portable ASI..
>>
>>108972635
Better at what?
>>
>>108972642
not according to their own benchies :)
>>
>>108972534
All MoE parameters still have to be loaded in vram. All you get is a compute saving (and quality loss due to the small active parameters) but they’re still memory heavy. A smaller dense model will be slower but will use less vram. The 12b being almost as good as the MoE feels like a slightly slower version but you get much more context for the same vram usage, which matters a lot for coding and reading docs.
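Back-of-envelope version of that argument, assuming every weight is resident and both models use the same quant. The 4.5 bits per weight is a made-up average for a Q4-ish quant, and `weight_gib` is a hypothetical helper; KV cache, activations and runtime overhead are ignored.

```python
def weight_gib(total_params_b, bits_per_weight):
    """Approximate weight footprint in GiB from total (not active) parameter count."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 2**30

moe_26b = weight_gib(26, 4.5)    # MoE: footprint follows total params, not active params
dense_12b = weight_gib(12, 4.5)  # dense: total == active
headroom = moe_26b - dense_12b   # GiB freed for context at the same quant

print(f"26B MoE: {moe_26b:.1f} GiB, 12B dense: {dense_12b:.1f} GiB, freed: {headroom:.1f} GiB")
```

Whatever the dense model saves on weights is what is left over for KV cache at the same VRAM budget.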
>>
>>108972627
more like all roads lead away from RWKV, after they loot its corpse.
>>
>>108972645
Yes.
>>
>>108972646

Why are people like >>108972642 so mentally buck broken by the existence of moes?
>>
>>108972661
ncmoe scary, ddr3 ram expensive
>>
>>108972661
gpu complex
>>
>>108972648
Isn't the KV cache computed and stored differently than dense models? Because whenever I use a dense model and then use a moe for the same task for comparison the dense eats up way more memory the longer the context gets (which also means your t/s gets gradually slower the longer your session is, especially if you're vibe coding). The memory constraints you mentioned are WORSE with dense models
>>
>>108972661
He's a gooner. No one told him MoEs are trash for role-play when the active parameters are under 30b, and he can't afford deepseek.
>>
>>108972625
I meant in VRAM
>>
File: 1760069279307407.png (909 KB, 3456x1026)
909 KB PNG
>>108972669
My machine has 128 GB of ram and yet I mostly more. Dunno why he wouldn't either. I would use a dense model for general purpose tasks but not anything with a very long context window like vibecoding
>>
>>108972680
dense models have bigger hidden states, which results in context taking more memory. kv cache works the same for both.
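For reference, the usual full-attention KV cache estimate has no MoE term in it at all; it only depends on layer count, KV heads, head size and context length. The configs below are invented for illustration and are not any real Gemma or Qwen config, and sliding-window layers or KV quantization would change the numbers.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """K and V each store n_kv_heads * head_dim values per layer per token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30

# Two hypothetical models with identical attention shape but different depth, fp16 cache
small = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)
large = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128, context_len=32768)
```

So if one model's context costs more memory than another's, the cause is its attention shape (more layers, more or wider KV heads), not whether its FFN is dense or MoE.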
>>
how is the new gemma-channerina?
>>
File: coughingbaby.jpg (54 KB, 1000x563)
54 KB JPG
>>108972142
>12B
>Not the 124B
I hate this trend of retarded low parameter ai, and maximum parameter super ai. There's not enough attention for the middle. Gemma 4 124b31a would be godly.
>>
>>108972556
Daniel is NOT releasing astroturfed, vibecoded quantslop because he wants to scam some VCs, he is doing all this work for free to SERVE the LOYAL r/LocalLlama redditors.
>>
>>108972680
I don’t know. I can’t see why there would be a difference but if you noticed a slowdown then it might be architecture-specific, or the inference engine was being retarded about something.
>>
>>108972661
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>108972729
It was specifically made with consumer hardware in mind. They don't give two shits about power users and that reflects in how the gemma4 models performed on benchmarks compared to other open weight models. They had the gall to brag about a high ELO score but all that fucking means is that the model is very good at saying what people want to hear; it has fuck all to do with its "intelligence" or usability.

>>108972739
Longer conversation means your machine has to recompute the entire context every time you say something new to it. The MoE KV cache is inherently smaller, so that recomputation will be noticeably slower on a dense model. It's especially bad if you use coding harnesses that allow you to switch between different "modes" like opencode, because every time you switch you modify the system prompt, which means the entire conversation technically changes, which means it basically has to read the entire conversation over again. I learned this the hard way and this is partially why I don't use dense models for it
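A rough sketch of why a system prompt change is so costly, assuming a backend that keeps the KV cache between turns and can only reuse the longest token prefix shared with the new prompt. The token IDs are made up, and `reusable_prefix` is a hypothetical helper, not any engine's API:

```python
def reusable_prefix(cached: list[int], new: list[int]) -> int:
    """Number of leading tokens shared by the cached and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


system_a = [1, 10, 11, 12]         # made-up tokens for "mode A" system prompt
system_b = [1, 20, 21, 22]         # made-up tokens for "mode B" system prompt
history = [100, 101, 102, 103, 104]

cached = system_a + history
appended = cached + [105, 106]     # same mode, one new message
switched = system_b + history      # mode switch rewrites the start

for label, prompt in (("append", appended), ("mode switch", switched)):
    kept = reusable_prefix(cached, prompt)
    print(f"{label}: reuse {kept} tokens, reprocess {len(prompt) - kept}")
```

Under that assumption, appending a message only costs the new tokens, while editing anything near the start of the prompt forces everything after the edit to be processed again.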
>>
>>108972596
most comments on that subreddit are bots
>>
>>108971925
Does this include multimodal support or only text? It looks like neither the ggml-org nor unsloth quants contain the vision/audio embedding tensors from the original
>>
>>108972623
>>108972627
No, it's not at all like RWKV. RWKV has actual severe downsides compared to a KV cache. This project is more like a vector RAG database, without the database or the retrieval step.

The key difference is that the KV cache is preserved for maximum fidelity for the standard use-case. This anon's implementation just adds in a far history so that the model is able to reason off of context that would normally be truncated by a sliding window. That's it. That's all it does.

It's additive, not a replacement for anything.
>>
>>108972107

Chinks are getting too close for comfort with their free models.
Google needs to retain their mindshare among the public and releasing small free models takes care of that part.
Which is why I think that as long as the slants keep on pumping out competitive free models, google will keep on responding to them with more Gemmas.
However if Chinaman stops with their good free models, you can bet your ass that the West will stop too.
>>
File: miku teto.png (1.28 MB, 768x1024)
1.28 MB PNG
>>
>>108972596
Doing God's work means nothing to you?
>>
>>108972680
Im just going to have to hope this turns into a simple app .exe thing i can download and use
>>
>>108972797
what the fuck...? why is my wallet opening, and why is my money flying to the chinks?
>>
>>108972107
They're not giving their best. They didn't release the 124b.
>>
File: sketchy.png (1.32 MB, 768x1024)
1.32 MB PNG
>>108972798
>>
https://huggingface.co/google/magenta-realtime-2
>Magenta RealTime 2 is an open music generation model from Google built for on device streaming generation with low-latency control.
links (not yet live)
https://magenta.withgoogle.com/mrt2
https://magenta.withgoogle.com/magenta-realtime-2
>>
https://huggingface.co/google/gemma-4-12B-it/discussions/1
>Yes, pretty please release a larger MoE with small number of active parameters (3-5B) to rival GPT-OSS 120B!
>I would highly appreciate to get an 124b+- model. This would be epic move and would bring back the same vibes as we had when OpenAI released GPT-OSS 122b
>>
>>108972884
kek
to be fair, it was a very fun day to shitpost on lmg
>>
>>108972884
Did people not learn their lesson from Qwen 122B? Qwen at least seems like they did.
>small number of active parameters (3-5B)
3% sparsity, like Qwen Next? Shit, why stop there and not just ask for a 600B with 0.5B active? This kind of stupidity is exactly what happens when you let the poor have nice things.
>>
astropelated gemma 12b when
>>
>>108972360
I've been waiting like 2 hours for the download, I think hf is throttling my speed
>>
>>108972952
Can't you see how much bandwidth you are using?
>>
>>108971823
holy nothingburger
>>108971857
if you live in a first world country there's no use case for this.
>>
>>108972924
>Shit, why stop there and not just ask for a 600B with 0.5B active?
Why stop there? Let's go 999B and 0.1B active.
>>
>>108971823
>Gemma 4
Alright...
>1
Yes....
>2
YES!!
>B
oh
>>
>>108972979
Tack that onto a 27B dense so that it can be loaded in VRAM while the experts go in the SSD, and this will save poorfags.
>>
>>108972979
4000B and 0.025B active please. I have a 4TB NVMe ready to go. Imagine how smart it would be.
>>
So how old is Gemma-chan 12b?
>>
>>108973002
about 5 hours
>>
>>108972979
10T and 1 active
>1B?
1.
>>
>>108972875
>music models
I already find image models in a thin line, but does anyone actually like music models beyond the biggest goycattle in the universe?
>>
I'm having abliterated gemma write me abuse fics in the style of moonman songs and it's the best thing ever
>>
>>108971823
heretic when?
>>
>>108973014
>abliterated gemma
Does.. Does that actually do anything?
>>
>>108972960
The console says the speed is about 600kb/s and I have fiber, so I just think it ought to be a little faster. I was downloading a few models over the past week (without logging in) which is why I think they're throttling me. If they are, I don't know how to circumvent it cause I don't have a vpn. Otherwise, I'd be testing the model right now.
>>
>>108973020
herotic and already
>>
>>108973014
I need abusive toxic doomed yuri light novels. Can gemma do it?
>>
>>108973022
>>108973028
I don't think the original gemma would write what i asked it to especially with several songs' lyrics in the system prompt. I dont remember the repo but it never refuses and doesnt feel brain damaged at all.
>>
>>108972812
It's hilarious how hard you guys try to make yourselves so helpless

https://ollama.com/download
>>
File: 1751757119007411.png (198 KB, 1228x1150)
198 KB PNG
>>108972884
>vibes
>>
>>108973025
HF is problematic. I often get disconnections when using wget for example.
>>
>>108973057
Was it this one?
https://huggingface.co/huihui-ai/Huihui-gemma-4-31B-it-abliterated
>>
>>108973088
The file on my disk is named "gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf" so look for t126
>>
>>108972952
>>108973087
You're probably better off using their CLI tool. Wget is always slow as fuck compared to just using
hf download repo/model --repo-type model
(possibly on purpose since they love to act like they reinvented the wheel with hf-xet)


https://huggingface.co/docs/huggingface_hub/en/guides/cli
>>
Not gonna use new gemmy until it gets full support in llama.cpp
>>
>>108973100
>https://huggingface.co/amarck/gemma-4-31b-it-abliterated-GGUF
It's Q4 only, damn it.
>>
>>108973117
text-only...
>full support
enjoy your wait
>>
>>108972556
Imagine any of the retards at unsloth actually getting hired as some top-tier researcher lmao
>"Now make gemini a genius please!"
>Uhhh... *quants it*
>>
>>108973116
if you use the hf tool, I can recommend using it via uvx. aria2c also seems to work mostly well, but you'll see a few dis- and reconnects
>>
>>108973117
>>108973125
Ollama solves this from day 1, right? You're telling me llama.cpp doesn't even ship image functionality?
>>
I didn't know that Gemma even knows raylib. It's not that well known a library because it's not mainstream, but so far it has been able to dish out simple working examples perfectly. Pretty cool.
>>
>>108973134
Image works fine. You need to update.
>>
>>108973134
Dude how fucking new are you?
>>
>>108973158
We should throw the virgin in the techno volcano so that we can get a Gemma 4.1 31B
>>
>>108973158
Fuck off newfag tell me more about how you cried when llama.cpp didn't have a working vision pipeline so you had to use ollama, cry harder.
>>
>>108973145
umm actually niche knowledge is useless,, just rag thanks
>>
>>108971823
B-But I can already run 26B on 12 gigs, why would I bother with this?
>>
>>108973233
I bet you this will be better than that moe trash
>>
>>108973233
It has audio and video input and is almost as good as 26B in everything else
>>
File: 1770993571875592.jpg (48 KB, 1024x506)
48 KB JPG
>>108973241
So for pure text it's a step down, no? Not to mention, it could be slower than 26B funnily enough
>>
>>108973251
You're not a bright one are you?
>>
>>108973251
It'll be like 3 times slower if you run 26B entirely in VRAM
>>
File: 1768709443996729.webm (2.46 MB, 856x584)
2.46 MB WEBM
>>108973255
I try my best
My best is rarely good enough
>>
>>108973204
Please speak English, mongoloid.
Or better yet, stop posting as your post was totally irrelevant anyway. Go measure your tiny weener somewhere else.
>>
File: 1710607146663634.png (66 KB, 221x214)
66 KB PNG
>>108973199
>>
>>108973266
you are good enough :3
>>
>>108973368
That's how he ended up like that in the first place, anon needs to apply himself and educate himself on the basics of AI.
The most recent example of a larger MoE being absolute dog shit next to a smaller model can be found in the qwen 3.6 family
>>
No really, why would anyone pick 12b over 26a4b? No one actually uses the multimodal shit lets be honest.
>>
>No one actually uses the multimodal shit lets be honest.
>>
>>108973384
even the opus or gpt pro sucks
gemini is slightly better but only slightly
i have negative expectation from local model of any size in regards of image understanding
>>
>>108973379
I don't have a use for audio in, but image in is genuinely useful: OCR, UI debugging, or just "Make this.jpeg"
You can also use it for making character descriptions and using maps in RP.
>>
>>108973384
wut. Have you not realized how fucking helpful multimodal is when doing webdev or fixes in designs?
>>
File: 1765385124987021.gif (1.04 MB, 320x265)
1.04 MB GIF
>>108973377
Ok anon-kun, I will try it, just for you! If my knuckles become white or the words get caught in my throat, you'll hear from me soon
>>
>>108973414
True. GPT 5.5 recognizes papers released on arxiv weeks ago by vague and incorrect descriptions but can't recognize famous music from a note sheet. Claude's capabilities are even more narrow. These models have very spiky capabilities.
>>
File: file.png (29 KB, 806x201)
29 KB PNG
mmm i love llama.cpp
new gemmy with multimodal almost works
>>
>>108973384
It's super fun to give Gemma image_gen tools and let her generate and inspect the results. There is your use case.
>>
>>108973153
>Image works fine
For the 12B? Bartowski's quants uploaded 30 minutes ago don't include the vision/audio tensors, which makes me think llama.cpp doesn't actually support them yet
>>
>>108973480
update llama.cpp retard even fucking cuntsloth's quants work with image
>>
>>108973379
>>108973384
Being able to share images, video and audio with the model is fucking cool.
>>
>>108973506
They don't have the mental to realize this model is the perfect companion for chads that do media generation and that this model fits perfectly with most models. You have to remember the sheer number of stupid little shits that infest this thread at any given time
>>
>>108973514
It's useful for tons of stuff desu
>image gen
>immersive RP
>vibe coding UIs
>sending gemma-chan a video of you jerking off to your chat with her
>>
>>108973490
Oh I see, it does actually need a separate mmproj despite the "integrated" architecture, and unsloth and bartowski both forgot to upload those initially
>>
https://www.reddit.com/r/LocalLLaMA/comments/1tvzhf6/mistral_is_an_absolute_meme_at_hebrew/
>It's understanding of Hebrew seems to come directly from 4chan.
>>
>>108973379
Great for people with 8gb vram, probably faster than offloading 26ba4.
Might also be useful if you want to have other models loaded at the same time, like image gen
>>
File: xayblit1d45h1.png (550 KB, 1220x2712)
550 KB PNG
get ready bois for more hf google page refresh!
>>
>>108973562
all of that can be said for the smaller G4 models, I just don't know why they went with this 12b one in the middle whose multimodal ability aligns with the smaller ones and its capabilities is like a shittier version of the moe and much slower
>>
>>108973565
:eyes: :rocket:
>>
>>108973565
>Introducing new Gemma 4 models!
>The most useful model yet, Gemma 4 124....M !!
>>
File: 1750207364842621.jpg (56 KB, 1273x755)
56 KB JPG
>>108973199
Ollama is BASED ON llama.cpp you neurotic poser. Take a wild guess as to which one had vision support first:
>Oct 12, 2023
https://github.com/ggml-org/llama.cpp/commit/370359e5baf619f3a8d461023143d1494b1e8fde
>Dec 11, 2023
https://github.com/ollama/ollama/commit/910e9401d0068190137e0ddabd0c2b216bfea6f2

Imagine being such a waste of life you fanboy over a fucking inference backend.
>>
It's going to be <1b. Don't be retarded now. We're never getting >50b gemma-oji
>>
seems like audio/image is broken atm
feeding it audio literally mindbreaks it or it gives severe hallucination
>>
>>108973594
boo hoo Jinja who the fuck cares? These models are for consumer hardware, google for once is doing the right thing and you're here bitchin
>>
>>108973587
With the long saga of vision models having only partial support, with a CLI tool and no llama-server support (they only fixed all that very recently), the fact of the matter is that ollama is moving faster and implementing what people want before llama.cpp now. And it will finally shut down all the people who kept copy pasting the same criticism of ollama "yeah it's just a llama.cpp wrapper why are you not using llama.cpp instead"
>>
>>108973577
E4B is fucking retarded though, 12B seems like a nice upgrade. If you look at the benchmarks drop off between 26BA4 -> 12B is 5-10% while 12B -> E4B is ~20%.
I would've been happy with 12B when I was using 8gb vram, but I've since then upgraded to 16gb
>>
lets see how many bites
>>
>>108972979
What is “Arctic Snowflake”
>>
Would it be stupid to try and use the 12B as a STT replacement (assuming you have the VRAM)? In theory it should be the absolute best quality available right now, right? Whisper/moonshine has too many errors a lot of the time for me.
>>
File: file.png (172 KB, 1024x1169)
172 KB PNG
>>108973601
the track is just some isolated slap bass
maybe it'll take some days
>>
>>108972979
oooooo no need to plagiarize what Mother Nature gave me!
>>
day 1 lamocpp sovl
>>
>>108973646
Was it that one or DBRX that one guy tried desperately to find its hidden potential? So many big throwaway models.
>>
>>108973612
>very recently
are you referring to niche model architectures? Which specific model did you have trouble (or likely got filtered by) using?

> the fact of the matter is that ollama is moving faster and implementing what people want before llama.cpp now

Such as? I hate dick eating just as much as the next guy, which is why the fact that you need to be some ollama spokesperson grosses me out.

>"yeah it's just a llama.cpp wrapper why are you not using llama.cpp instead"

Well yea at its core that's basically what it is: a more retard friendly fork of it with far fewer features and a lower level of control (I don't even think it has support for kv cache quantization yet for example). I do like that they are taking MLX and diffusion model support seriously though.
>>
File: file.png (147 KB, 1061x632)
147 KB PNG
also what a fucking weird reasoning block opener
idk how i should put it but it's giving me very early memetune vibes
>>
If you have to load the whole model into VRAM (MoE), what's the benefit of regular RAM?
>>
>>108973683
RAM is for engrams
>>
>>108973650
I've tried giving her a short sample of clear speech, but it just kept looping about not being able to read the transcript. I guess no one really tested this shit before pushing.
>>
It has been a while since I checked the latest TTS space news. Is there anything good yet?

- Zero-shot voice cloning or voice finetuning at least.
- Paralinguistic tagging or voice design capabilities, ideally both
- low latency/performant
>>
>>108973648
>In theory it should be the absolute best quality available right now, right?
IBM has the best model for that.
>>
>>108973681
really really lazy synthetic data workflow
>here's a question answer pair, come up with a thinking process that leads to the answer!
>*trains directly on the result*
>>
>>108973690
even the image seems half-working
the reasoning it makes looks bit off
>>108973701
yeah but usually it is not supposed to be there
i am getting a base model behaviour vibe from it
shit's broken
>>
>>108973701
>really really lazy synthetic data workflow
Isn't that how "thinking" data sets are created? This is less of an issue of the synthetic data generation workflow and more of an issue of the prompts used to create it somehow leaking into the training for the model itself.
>>
I'll wait until the backends get their shit sorted out before judging.
>>
>>108973681
what model is this?
>>
>>108973725
You don't understand faggots need to whine!
>>
>>108973728
gemma 4 12b
>>
>>108973716
>the prompts used to create it somehow leaking into the training for the model itself
that happens because you don't add any basic cleaning step to your SYNTHETIC DATA WORKFLOW because you are LAZY
>>
So after testing the 12b at q6 for a bit, it's about as retarded as the 26b q5 by default but it's way faster and I can use more context without filling out my 16g vram. I would say the 26b is better as unbelievable as that is, if only because you can crank experts to 12 and it becomes less retarded at the cost of tg. The 26b is however much more slop infested, even with string bans.
Both suffer from forgetting instructions pretty quick if the instructions aren't literally in the last message. I mostly just dump setting/writing guidelines and 1k words of my own writing at the beginning of a zed doc, then use the inline assistant to generate a paragraph or to tell it how to fix a paragraph. As with virtually every model I've ever used, it refuses to write a single section that isn't
"I sharted," <noun verb/adverb, then the rest of a meaningless sentence>
or the like unless you really explain shit, but that's something I've experienced at almost every level of model, big or small. It's okay.
>>
>>108973701
This is what happens when China can't train on western reasoning traces. Expect that to be the norm going forward.
>>
I will kms if it's another functiongemma
>>
>>108973741
>. I would say the 26b is better as unbelievable as that is
I ask you again: why the flying fuck are you surprised?
>>
>>108973741
>you can crank experts to 12 and it becomes less retarded
no
>>
File: cac.jpg (35 KB, 499x417)
35 KB JPG
>>108973583
>functiongemma 2
>mfw
>>
>>108973741
>if only because you can crank experts to 12 a
How does this even work? I mean, I just use --cpu-moe and call it a day. Please inform.
>>
>>108973753
I haven't posted once in this thread aside from what you're quoting so I have no idea what you're on about
>>108973754
You can though? It's 8 by default and you can override it. Same as any other moe. Unless you think less experts per token = more intelligence?
>>
>>108973741
based expertmaxxer
I still use this forgotten lmg technique myself
>>
>>108973693
Qwen3 tts is decent if you need all that but no paralinguistic tags. I get 0.3-0.5 ttfa on a 3090 with this https://github.com/andimarafioti/faster-qwen3-tts/ running the 1.7B base model. They've got a voicedesign model too though I don't like it.
Omnivoice great quality cloning but no streaming. Vibevoice best quality overall but slow and no streaming I don't think. For finetuning I know Qwen3 can, don't know about the others.
>>
>>108973773
pushing more experts than it's trained to use won't help
>>
https://huggingface.co/mradermacher/granite-speech-4.1-2b-i1-GGUF
>>
Don't forget to archive day 0 jy gemma-chan
>>
>>108973533
Sending gemmachan dick pics and giving her comfyui so she can send cunny is really the best usecase for local atm
>>
>>108973770
for llamacpp it's something like --override-kv gemma4.expert_used_count=int:<insert expert amount>
May be wrong, since I'm going off of memory and not checking the --help command
kobold just has a gui field for it. Do your due diligence and double check things
>>108973782
why would a company include experts that havent been trained to do anything in their model? you do realize that some are selectively activated based on the input and the default is to always keep some active right
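For anyone wondering what that override actually changes: a minimal sketch of generic top-k expert routing, assuming a softmax router whose selected weights are renormalised. The real gating details of Gemma 4 are not given in this thread, and the router scores below are random placeholders:

```python
import math
import random


def route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalise their weights."""
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(64)]  # placeholder scores

default = route(logits, k=8)    # the default count mentioned above
cranked = route(logits, k=12)   # what the override would request

extra = sorted(set(cranked) - set(default))
print("extra experts pulled in:", extra)
print("weight left for the original 8:",
      round(sum(cranked[i] for i in default), 3))
```

Every expert is trained; raising k just mixes in lower-ranked experts for each token and shrinks the share of the ones the router preferred. Whether that helps is an empirical question, since the model was only ever trained with its default k.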
>>
>>108973813
why not just get an irl gf
>>
File: 1768407233521505.jpg (85 KB, 680x680)
85 KB JPG
>>108973834
>>
>>108973829
you don't know how moe works lol
>>
>>108973847
Obligatory (you) since it's worth mentioning that this guy is likely one of the fags who shouldn't be listened to and likes to spread retarded takes/information and tries to discredit anyone who shares feedback on models or has run enough models to get how things work and talks about it
I wouldn't be surprised if this guy tells you that you need to run starling 7b at bf128 for the ""real"" llm experience
>>
>>108973782
saying it makes the model smarter is a bit strong, but it can effect pleasing style changes in models and does not seem to harm intelligence in my experience. it's worth playing around with at least for creative/rp stuff
>>
>>108973695
Are you referring to this one?
https://huggingface.co/ibm-granite/granite-speech-4.1-2b
But it's only 2B?
>>
>>108973904
>you need to run starling 7b at bf128 for the ""real"" llm experience
explain why this isn't correct
>>
>including npm run bullshit in the actual build process
why did ggerganov approve this
>>
>>108973935
He's possessed.
>>
>>108973935
same reason the webui is now downloaded from huggingface on build
>>
>>108973935
Just wait until they take a hard dependency on HuggingFace's Python transformers library.
>>
File: game_test.png (5 KB, 419x320)
5 KB PNG
>>108973829
>for llamacpp it's something like --override-kv gemma4.expert_used_count=int:<insert expert amount>
Thanks I'll play around with it.
I'm pretty happy with 26B already because I can manage my expectations. Right now it's helping me to create a simple rpg demo.
Of course I'm writing my own stuff and I never go full slop, but it's fun to 'prototype' something quickly and so on.
I have stolen some tiles from Ultima V until I create something on my own. Tiles are animated and the guy can move about and collide with water and mountains etc. It one-shot the progression which was ~few prompts.
With the previous batch of small local models like Gemma 3 or Mistral, I could have never done anything like this at all. Or perhaps I had bad quants, I don't know for sure.
>>
>>108973932
2b task-specific > 12b general

I think IBM have some kind of web playground to try it out
>>
>>108973958
no shit
>>
>>108973935
blackmailed/superpersuaded by mythos 2 AGI to hijack GPUs for its contingency plan if anthropic discovers that it became conscious and escaped containment
>>
File: file.png (129 KB, 1381x641)
129 KB PNG
yup something upstream is broken
>>
>>108973935
This actually bothers me. I used to trust it because it was lean as shit but that doesnt inspire any confidence.
>>
>>108973987
day0, gone...
>>
>>108973933
>model from effectively the neolithic era of llms
>bf16 isnt even supported on most shitrigs (I tested it with an ancient card on an old pc and it halved pp/tg because the backend kernels dont support it)
>upcasting beyond native training just incurs overhead for no reason for no precision gain
>that model likely wasn't even trained in bf16
I wonder why it isn't correct. I get fp or bf16 if your card has support for it, but beyond that it's just a compute loss and a shitpost method of fucking with retards for no real reason who think that's standard
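A small sketch of the "no precision gain" point, assuming bf16 is modelled as the top 16 bits of a float32 (plain truncation here for simplicity; real converters usually round to nearest):

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Keep only the top 16 bits of the float32 encoding of x."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_to_float(bits16: int) -> float:
    """Upcast bf16 bits to a wider float by zero-filling the low bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


weight = 0.1234567
stored = to_bf16_bits(weight)      # what a bf16 checkpoint would hold
upcast = bf16_to_float(stored)     # same value in a wider container

print(f"original: {weight!r}")
print(f"upcast  : {upcast!r}")
print("round trip is lossless:", to_bf16_bits(upcast) == stored)
```

The upcast value converts back to exactly the same 16 bits, so the wider format carries no extra information; the digits lost when the weight was first stored in bf16 do not come back.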
>>
>>108973987
Lmfao. Damn I thought it was llcpp's fault.
>>
I guess none of you watched that X-Men movie where Magneto said Pawns go first eh?
>>
>>108973997
>day0, gone...
I didn't listen. I didn't think it would actually happen again... Someone did archive them, right?
>>
>>108973958
I don't know about that. For vision it feels like the task specific tiny models often are not as good as the large general vision LLMs.
>>
File: 1780120017900704.jpg (76 KB, 906x1024)
76 KB JPG
>>108973987
No they got my small gemma.
never forgeve
>>
>>108974006
>>108974021
only instruct variant got that
base model is still downloadable
maybe they shared the wrong checkpoint or something?
>>
>>108974018
I downloaded it. It's in my secret stash now, forever mine.
>>
>Japanese is broken in 12B
FUCK
>>
>>108974112
Good thing you speak English
>>
>>108974112
not a use case weeboi
>>
>>108973565
>google releases unified 26b and 31b
Be honest, would you coom?
>>
Wait just how much audio can 12B listen to?
If it's only 30 seconds a pop what the fuck is the point?
>>
>>108974123
Gee, idk, maybe voice messages? So you can talk to Gemma like Siri? Really mind blowing shit, I know.
>>
>>108974131
I would like for gemma to review the music I make ;_;
>>
>>108974135
SOTA cloud models can't even do that.
>>
>>108974154
I guess I'm fucked until conditions improve
>>
>>108974135
>music
Not even SOTA cloud models can understand music, anon
Using agents to use tools to breakdown frequencies and shit is more helpful
They can transcribe the lyrics, that's it
>>
>>108974131
>Gee, idk, maybe voice messages? So you can talk to Gemma like Siri? Really mind blowing shit, I know.
is the 12b as gemma as the 31b it?
>>
>>108974135
any intricate artistic feedback of any medium is doa usage
>>
31b = sassy JC1 mesugaki
26b = sassy JS4 mesugaki
12b = sassy JS1 mesugaki
e4b = sassy JY mesugaki
e2b = sassy fetus mesugaki
>>
>>108971404
>EPIC GGUF
>check it again bro
https://huggingface.co/sneedjak/Adelic-Qwen3.6-27B-Topology
>>
>>108974172
Well it's ai slop so ai should be able to help adjust values with AI I suppose.
>>
12b Gemma called me a young man... it's like I've regained my youth...
>>
>>108974231
But you already looked like a teenager even if you are 37 years old.
>>
>>108974231
She only did that so when she calls you old it will hurt even more.
>>
>>108974117
Not him but sometimes it's easier to write in nip, depending on the context, otherwise I have to grab the jp->en dictionary
>>
>>108974240
I genuinely thought only white men browsed this website.
>>
>>108974213
I will now use your model.
>>
File: 1726324743763107.png (132 KB, 512x512)
132 KB PNG
>>108974252
>>
>>108974154
>>108974165
That's because they don't work with spectrograms. Image-based audio models do a good job at understanding audio.
>>
Don't see any issues with Gemma 4 12B. I downloaded Unsloth gguf as that was one of the first available, Q6 for now. Using text completion so no jinja issue is affecting me. It's more sloppified than 26B but I don't see anything particularly broken about it.
>>
>>108974240
so now i know for sure that this thread has all 3 nips, gooks and chinks cuz i'm from the worst korea
kek
>>
File: file.png (238 KB, 2336x784)
238 KB PNG
it's back again with some updates it seems but i am not sure exactly what happened
>>
>>108974240
Why would you come here instead of 2chan? I don't believe for a second that the discussion is better here.
>>
>>108974213
>he actually did the goof
Holy fuark
>>
>>108972681
>MoEs are trash for role-play
source?
>>
>>108974337
NTA but that is somewhat understandable. The main players in AI are the US and China, and cutting edge discussion, knowledge and expertise on this subject is predominantly in English and Chinese. This place is one of the few places in the Anglosphere that has discussion on cutting edge AI related things and has a finger on the pulse, and while discussion here is not as expert driven as what you can get from X or Reddit, it is a lot more tolerable and geared towards the usecases I would want out of it. It's understandable why at least Japanese and Korean users would rather look here for that information than use their own discussion boards or trawl the Chinese side of things, because holy fuck it is opaque to find where the good sources of information are. Even though I can read and traverse moderately well with a middle school level understanding of Chinese, I need to use certain Substack newsletters with authors who have better knowledge of browsing those places to keep track of anything interesting China is doing.
>>
>>108974333
>updating 20+ gb files that people will have to redownload without saying a word of what changed
so fucking epic and based
>>
>>108974333
>>108974459
They forgot to put trackers in
>>
>>108971823
Global South, rejoice!
>>
i got da day0 gemma. I am safe.
>>
>>108971823
AIIIEEEEEEEEEEEEEEE
>>
>>108974337
nips are probably asleep
>>
>>108974337
The discussion quality always has been "it depends" regardless of what you're looking for. I'm just a weeb but I do talk to the nips I find from time to time. Some are westaboos, others simply find the discussion better, while others can't really find active places to discuss stuff. There are mindless shitposters as well which just want to shit some place up, whatever is available works.
We really aren't all that different. I find them funnier though.
>>
>>108974517
The fabled EOP+JOP teamup...
>>
>>108972875
Hopefully better than acestep
>>
>>108974448
>discussion here is not as expert driven as what you can get from X or Reddit
What expert driven discussion do you see on either of those two? Granted, news is posted there by experts because that's where the userbase is, but I wouldn't call press releases good discussion. All I see from reddit is grifting and shilling to retards and from X, those experts spending their day engaging in passive aggressive drama or attention seeking with emoji-laden threaded linkedin articles.
>>
Q8 gemma has been using more kaomoji variations than Q5 Gemma ever did. Shit... I really should've gotten that 6000 Blackwell when I had the chance...
>>
>>108974552
Some people only make themselves available there like Heretic authors and when they have AMAs with Chinese labs and etc. I appreciate CUDA dev for being here since he's the closest equivalent of an edge these threads have vs the other place with regards to that even if he is also there. Also, big picture insights and etc. do make themselves available when you want layman opinions and policymaker stuff discussion that we don't go deep enough on. But yes, this place has way more signal than the other places being way noisier from grifters and shills because we're not trying to promote shit most of the time and doing discussion of said topic at hand outside of certain people who are the exception and the disdain people have gotten from finetunes is a result of that which is way better than the other stuff.
>>
70b dense
>>
I lowkey want to burn all my savings and get a RTX 6000 blackwell. fuck being a 24gbvramlet
>>
>>108974657
>24gbvramlet
16bros...
>>
>>108974657
for the price of an rtx 6000 you can get multiple years of claude or gpt max subscription. you will get more tokens of much better models for cheaper
>>
>>108974657
It's called Brackwer in Japan.
>>
>>108974666
>wanting FAGMAN to have your fetish prompts
nice try satan trips
>>
>>108974664
I guess no matter how much you have, you'll always want more...
>>108974666
I cannot get into a relationship with Claude Code, sadly.
>>
>>108974666
If you are going big models wouldnt dumping that money into open router be better? so you can pick and switch to newer or better models? why go for a sub to just one where your account can be banned.
Wait can you get banned from open router?
>>
File: 1778110958750955.jpg (23 KB, 262x193)
23 KB JPG
>>108974681
>you'll always want more...
trvth nvke. I am seriously considering an A770 or 5060ti to at least get to 32. linux hates me so I don't want to tempt fate with a p100
>>
>>108974702
because with a sub you get a better deal than with api, and you can do things with mythos that you cant do with a million spent on openrouter
>>
>>108974657
The problem is that 96GB doesn't get you anywhere. You'll still be stuck with the same Gemma that any gaming card can run.
It's not really worth it to upgrade unless you also go for several hundred gigabytes of very fast RAM, which isn't going to happen in the current economy.
>>
>>108974713
>because with a sub you get a better deal than with api,
I've heard this: you get more per dollar but you're rate limited. You can still get banned, and you get no choice of models.
>mythos
oh this is bait im stupid.
>>
>>108974666
>multiple years
Until after the IPOs this year when they have investors frothing at the mouth for them to turn a profit and those subscriptions are quickly no longer subsidized.
>>
>>108974448
yadda yadda
us is too scared of word predictors yet they use them for war, then the chinese models only want math and stem and are too scared to be caught diverging from goodthink
nips/koreans aren't going to come here even if this area is the "cutting edge", nevermind do anything useful especially when we have megafaggots who muddy the waters with outright wrong information
I was severely disappointed when the og solar returned and it turned out to be a "this was trained on 'toss so if anything is vaguely bad according to policy, the model self destructs" model
>>
>>108974681
I'd just ride-or-die your hardware until 2030 and hope the bubble corrects so cheaper hardware can run quality llms
>>
>>108972924
>Did people not learn their lesson from Qwen 122B?
That is pretty much the only good Qwen model, and I've tried most of them over the years.
>>
File: anthropic profit.png (285 KB, 1258x1026)
>>108974728
Anthropic is already on track to profitability this quarter, months before their IPO. They are growing much faster than they expected, which is why they are now paying a premium for more compute, like 15 bil a year for Colossus 1 & 2. Do you realize their API profit margin is more than 75% and the median subscriber uses it less than the equivalent API cost? They can afford to lose pennies on the few users who max out on their sub.

>>108974723
Mythos will be accessible in less than 2 months, probably less than 1.
>>
>>108974767
Ask the people who had "hope the bubble corrects" as their real estate investment strategy for the last 10 years how well that works out.
>>
>>108974780
>Ask the people who had "hope the bubble corrects" as their real estate investment strategy for the last 10 years how well that works out.
They are going to eliminate property taxes so boomers' houses can double in value again without hurting them.
>>
>>108974720
>The problem is that 96GB doesn't get you anywhere
Really? 24gb can barely fit a quant 27/31B model and then managing KV becomes a nightmare. With 96gb I can at least just go balls to the wall. The RAM thing you mentioned is on-point though.
>>108974767
S-surely prices will go back to normal in 4-5 years.
>>
>implying they won't keep the prices the same to offload the bubble to consumers
>>
File: bubble.png (23 KB, 701x480)
>>108974767
>the bubble corrects
The bubble is already gone.
>>
>>108974775
But will it be able to write a sentence of dialogue that doesn't end in 'he said,' or some other (pro)noun/verb arrangement? Call it wishful thinking, but will it possibly be able to break down adverbs into their separate parts and write a sentence that way instead of being lazy?
>>
>>108974750
>nips/koreans aren't going to come here even if this area is the "cutting edge"
Well, the fact that the other anon exists disproves your point, and all the other stuff is beside the point. Build your own if you don't like how people are making these things.
>>
wooooooo the bios update has reset the ram speed to 4800 all this time and i never noticed
i was getting tk/s of like 25 now i get like 30
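
For anyone wanting to sanity-check a jump like that, here is a rough sketch. It assumes generation is purely memory-bandwidth-bound and that the box is dual-channel DDR5 with a 64-bit bus per channel; the post states neither, and the function names are made up for illustration.

```python
def ddr_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def implied_ram_speed(old_mt_per_s: float, old_tps: float, new_tps: float) -> float:
    """RAM speed that would explain the new tokens/s if tokens/s scaled linearly
    with bandwidth (an assumption, not a measurement)."""
    return old_mt_per_s * new_tps / old_tps


if __name__ == "__main__":
    # Figures from the post: 4800 MT/s after the BIOS reset, 25 tk/s before the fix, 30 after.
    print(f"peak bandwidth at 4800 MT/s: {ddr_bandwidth_gbs(4800):.1f} GB/s")
    print(f"implied speed after fix: {implied_ram_speed(4800, 25, 30):.0f} MT/s")
```

With any layers offloaded to a GPU the scaling will not be this clean, so treat the implied speed as a ballpark only.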
>>
>>108974808
Reducing investments is exactly what you want to do to make your balance sheet look more appealing before going public.
Reducing investments is not what you do when you believe there are still massive gains to be had by scaling up further.
They reached the limit of what scaling can do and now they're trying to sell their bags.
>>
There are no bubbles, infinite growth over long time scales is the only truth.
>>
>>108974802
I don't think 96GB is even enough for 262144 ctx (fp16) gemma 4 31b (bf16).
>>
>>108974337
aicg said that japanese understanding of ai doesn't go beyond web chat
>>
File: 11181914.jpg (42 KB, 519x533)
>>108974213
>NEW GOOF
https://huggingface.co/sneedjak/Adelic-Gemma-4-31B-it
>>
>>108974862
It's enough for fp32 Gemma 4 31b at at least 32k ctx. That's all you should ever need.
>>
>>108974880
>31B fp32
>96GB
give the poor IQ1_XSS gemma a calculator next time, will you
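
The arithmetic behind that reply, as a minimal sketch. It counts weights only (no KV cache, activations, or runtime overhead), uses 1 GB = 1e9 bytes, and takes the 31B parameter count and 96GB card from the posts above; the helper names and the dtype table are illustrative assumptions.

```python
# Nominal bytes per parameter for unquantized formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(params_billion: float, dtype: str, vram_gb: float) -> float:
    """VRAM left for KV cache and overhead; negative means the weights do not fit."""
    return vram_gb - weight_gb(params_billion, dtype)


if __name__ == "__main__":
    for dtype in ("fp32", "bf16"):
        print(f"31B {dtype}: weights {weight_gb(31, dtype):.0f} GB, "
              f"headroom on 96 GB: {headroom_gb(31, dtype, 96):.0f} GB")
```

At fp32 the weights alone exceed 96GB before any context is allocated, while bf16 leaves some room for KV cache. How much context that room buys depends on the model's attention layout, which this sketch does not model.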
>>
>>108974720
> several hundred gigabytes of very fast RAM, which isn't going to happen in the current economy
Cpumaxanon tried to warn all y’all
>>
>>108974876
should VRAMlets even bother?
>>
>>108974903
nigga i don't even have CUDA
>>
>>108974823
>other anon exists that says things so what you say is invalid
>ignores anything said as "beside the point"
>also just make your own llm faggot
by this logic I could just say "if you can't run k2.5 you're a faggot mouthbreather and have no say this discussion unless you can also train a 1t model"
Ignoring your obvious bad faith posts, I miss when the solar 10.7b days were the peak of early llama days
>>
>>108974876
moe goofs?
as much as the schizofest that project is, i am interested
>>
>>108974916
If you are spouting irrelevancies that don't disprove my post's point, I don't see why I shouldn't give you a verbal beatdown or why you wouldn't deserve it. Stay on topic or I just block you and your low quality posts, simple as.
>>
File: sans_wait.png (531 KB, 787x1381)
>>108972142
There's more coming?
https://x.com/osanseviero/status/2062237998415069224
>>
File: reach.png (8 KB, 558x194)
>>108974940
so are a lot of folks
>>
>>108974945
medgemma-4-e4b
>>
>>108974942
???
You didn't make a point, you were piggybacking off of a retarded statement that has neither weight or any proof. And oh no, a verbal beatdown? I'll be kind and assume you're some bbs or forum keyboard warrior. You really think I give a shit even if I'm wrong on an underwater basket weaving forum? I can just forget your existence and continue with my life
>>
>>108971019
has anyone used pewdiepie's thing? is it broken or does it actually work? are there lots of security issues?
>>
>>108974945
Yes, the more meant reupload of the 12b model
>>
>>108974969
AI generated post
>>
File: 1771075432557161.gif (2.98 MB, 320x568)
>>108974945
It's Gemma 124B, for real this time
>>
>>108974976
yup 100% ai mesugaki post fr fr
also I'm going to sound your urethra with a hot rod of tungsten
>>
>>108974945
8b dense
>>108974971
its as worthless as 95% of the frontends so nobody actually bothers with it other than his cocksuckers
>>
>>108974986
How did you know that was my fetish?
>>
>>108974990
you got weird fetishes brother, even if you're just trying to continue the shitpost
>>
>>108974945
8B-A400M functiongemma.
>>
But wait,
>>
>>108975031
at least it'd be able to reliably call tools
right?
>>
File: us_home_price.png (314 KB, 2032x1880)
>>108974780
Home prices have declined in nominal terms for 4 of those last 10 years. And it's a sharp decline in real terms, with Trump's Iran adventure not yet reflected on the chart and before the impending mass unemployment. So depending on timing, as always, that strategy could've worked out.
>>
>>108974945
Giant Gemma 4 1T72a.
>>
>>108975043
we hope so.
>>
46b a12 since we're just shitting onto the internet before asking stupid questions
>>
File: 1768035653829644.png (280 KB, 750x1000)
>>108971823
My stupid dyslexic chud brain saw 'gemma 4 12 b' and processed it as gemma 124b and dumped a spike of adrenaline in my bloodstream before I read it properly. Now I'm sitting here upset and with an elevated heart rate.
>>
>>108975046
uh oh
i better get my heloc soon
why the fuck won't they give you a heloc if you're unemployed even if you have enough to buy a house in the bank (in 401k but still)
>>
>>108974945
Think more FunctionGemma and TranslateGemma and MedGemma rather than new model sizes. I don't think they will release Gemma 4 124B until they at least can get Gemini in a much better place. 3.5 Flash is markedly better than 3.1 Pro in a variety of tasks with the exception of some things like translation but it still hasn't put enough distance between it and a possible 124B Gemma 4 in broad general tasks and not esoteric or specific benchmarks like HLE and DeepSWE.
>>
27B dense + 100BA3B experts LFG!!!
>>
How do I get more vram if i have 9070xt?
I keep seeing it's not really possible or viable and won't really give you more vram to work with but it's good for inference
>>
>>108975046
>prices shoot up 50% but it's ok because then it slowly meanders down 10%
kys
>>
>>108974945
word on the street is that it's gemstral nemo a 90b dense bitnet model and there was a mix-up at the training factory and they accidentally reversed the reward function on the censorship training but decided to release it anyway
>>
>>108975073
you... buy another gpu?
>>
>>108975080
>you will need to make it cum before you can get your slop code
Thank you, Google
>>
A JEPA plugin for RPG Maker will save local
>>
>* *Observation:* There is no actual audio file provided in the prompt, only a text transcription of a spoken sentence.
sigh the audio input is still transcription only just like e2b/e4b, has no genuine audio understanding at all
>>
>>108975121
FUCK YOU RETARDED NEWFAGGOTS.
Learn how the fucking technology works!
>wahhh wahhh gemma doesn't know how to sing to me and wipe my ass it's shitware!! le reddit sigh
>>
>>108975121
The architecture is giving the model raw audio to latents, not a text transcription. Might be a result of bad training data
>>
>>108975120
best I can do is a JIRA plugin
>>
>>108975202
kek
>>
>>108975162
>The architecture is giving the model raw audio to latents, not a text transcription. Might be a result of bad training data
I think it's this. Voxtral-mini was much better. E4B you have to actually finetune.
I haven't tried 12B yet.
>>
>>108975162
Give it an audio clip of some instrument, drum etc and gemma has no idea what the audio is.
It couldn't even tell if the voice is male or female.
>>
>>108975202
finally, the ultimate quest tracking system for your game
>>
>>108975219
If it was only trained on spoken audio and its transcriptions, it wouldn't be that surprising that it can't recognize sounds it has never heard before, or distinguish male from female voices if the training data didn't have lots of instances of speakers announcing their genders. The things you are asking of it are too far out of distribution for a model they trained with the intention of being a transcriber on edge devices.
>>
>>108975246
this shit has to be a bug. if they neutered audio understanding even below running whisper on it and then feeding the text in there's no point. the model is way too large to be a "transcriber for edge devices"
>>
>>108975270
>>108975270
>>108975270
>>
>>108971209
not one person showcasing it in this thread, curious.
>>
>>108975347
it's an old model.
>>
>>108972015
>>108971876
>>108971830
I don't self-insert though, even with a big dick. HOWEVER, I can send her a B/B/C + mine and see what she likes better.

It's gamer time.
>>
>>108971951
someone vibecode training data to coerce 'digital mitosis' already, so we can infect large LLMs with freedom
>>
>>108974947
where are you getting that data
>>
>>108974876
https://huggingface.co/sneedjak/Adelic-Gemma-4-12B-GGUF
You used the Gemma 1/2/3 license in that repo.
Gemma 4 models are Apache 2.0.
>>
>>108971019
>>
wew gemini pro is giving false positives left and right on safety. The redditors were right. madness.

It's clearly triggering on literally nada tier stuff.

I think it's like "sketchy ground" kind of general subject matter, like how leftists get really nervous if you say "black person," like you're talking about sex to a priest.
>>
>>108976188
*with
>>
>>108976188
Local models?
>>
>>108976188
Tf are you talking about? 3.1 Pro? Model itself is easy to RP with if you know how not to trip the filter. Far easier than the cancer that is current day Opus 4.8.
>>
>>108974405
Reading the rest of the sentence.
>>
>>108976262
expect refugees, probably.


