/g/ - Technology

File: meeku.png (2.09 MB, 768x1344)
2.09 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108630552 & >>108627512

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
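
For napkin math when the VRAM calculator above is unreachable, the estimate it makes boils down to quantized weight bytes plus KV cache. A minimal sketch (the function and every number in it are illustrative, not taken from the linked calculator; real loaders also add compute buffers on top):

```python
def estimate_vram_gb(n_params_b, bpw, n_layers, n_kv_heads, head_dim, ctx_len, kv_bytes=2):
    """Rough VRAM needed: quantized weights plus KV cache.

    n_params_b: parameter count in billions
    bpw: bits per weight of the quant (e.g. ~4.5 for a 4-bit K-quant)
    kv_bytes: bytes per cached K/V element (2 for an fp16 cache)
    """
    weight_bytes = n_params_b * 1e9 * bpw / 8
    # K and V each hold n_layers * n_kv_heads * head_dim values per token
    kv_total = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weight_bytes + kv_total) / 1e9

# hypothetical 12B model at ~4.5 bpw with 8k context
print(round(estimate_vram_gb(12, 4.5, 40, 8, 128, 8192), 1))  # → 8.1
```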

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 17745737552553.png (2.86 MB, 1509x1541)
2.86 MB PNG
►Recent Highlights from the Previous Thread: >>108630552

--Paper: Using Graphiti temporal knowledge graphs for efficient local agent memory:
>108631024 >108631044 >108631057 >108631093 >108631154 >108632307 >108632336 >108631160 >108631170 >108631181
--Papers:
>108633038
--Optimizing RTX 5090 performance and flags for Gemma 4 31B:
>108631200 >108631224 >108631255 >108631283 >108631395 >108631570 >108631595 >108631776 >108631820 >108631884 >108631937
--Using specific CPU offloading flags to increase Gemma 4 performance:
>108630678 >108630710 >108630787 >108630797 >108631085 >108631092 >108631133
--Critiquing SillyTavern while discussing feature development for Orb UI:
>108630802 >108630833 >108630856 >108631176 >108631235 >108631329 >108631775 >108630881
--Running agentic frameworks with local models:
>108632465 >108632524 >108632527 >108632585 >108632529 >108632543
--Gemma 4 tool calling issues across various front-ends:
>108630711 >108630731 >108630732 >108630738 >108632696 >108630736 >108630744 >108630847
--Prompting strategies to eliminate purple prose and linguistic clichés in Gemma:
>108631076 >108631207 >108631222 >108631237 >108631258 >108631279
--Using an autistic noir persona to fix Gemma 4's verbosity:
>108632645 >108632668 >108632677 >108632700 >108632702 >108632743 >108633339 >108633488 >108633506
--Praising Gemma 31B for long-context performance and translation capabilities:
>108632049 >108632068 >108632127
--Comparing Gemma 4 and Qwen 3.6 via automated pizza ordering:
>108630614 >108630658 >108630688 >108630859 >108630877 >108630753 >108630770
--Logs:
>108630847 >108631154 >108631176 >108631187 >108631253 >108631345 >108631509 >108631729 >108631774 >108631797 >108631836 >108631961 >108632048 >108632951 >108633015 >108633077 >108633125 >108633630 >108633672 >108633841
--Miku (free space):
>108630634

►Recent Highlight Posts from the Previous Thread: >>108630560

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Have RTX 5070, 12GB VRAM. Is gemma3:12b truly the best I have?
>>
Remember to rotate your Mikus biweekly
>>
>>108633879
Gemma4 26B Q6
>>
>>108633879
either gemma4 26b or qwen3.6 35b
both are better and faster than gemma3 12b
>>
File: 1597432130630.jpg (21 KB, 525x517)
21 KB JPG
>apartment complex changes ISP
>I'm getting 1/10 my old speeds

I can't casually shop around for models anymore, a 200GB quant is now an all-day affair, I am fucking undone.
>>
>>108633930
did you power cycle your modem?
>>
>>108633930
They are probably throttling your connection because you downloaded child porn.
>>
>>108633930
go to the library, starbucks or do the old crack the neighborhood wifi
>>
>>108633949
Speaking from experience?
>>
>>108633961
>old crack the neighborhood wifi
No one uses WEP anymore.
>>
>>108633862
Can someone please make sense of this for me
>run llmfit a few weeks back and install a few models near the top of the score column
>get curious and run it again to see if any new ones dropped
>the ones i had installed are now marked "too tight" of a fit
>run llmfit again today
>all but one are a good fit, one still says "too tight"
My hardware didn't change. I had nothing running in the background. Why the inconsistency?
>>
>finally get around to setting up basic MCP
>pretty much everything I try to make it crawl is 403: Forbidden
It's only going to get worse from here, isn't it? By the time local got agentic shit, most of the internet was already blocking it.
>>
>>108633930
>can run 200gb quants
>cucked by isp
feelsbadman
>>
>>108634013
I don't have that issue; I just direct my llm to a searxng instance on a raspberry pi and it works well. Before, I used jina-reader, but since I've switched it all to SearXNG I get what I want.
>>
File: 1585934470689.png (726 KB, 1092x1636)
726 KB PNG
>>108633940
No access to any of the routing equipment, just ethernet ports in the walls and the local per-building wifi network. Still chugging at retard speeds even with a direct cat 8 connection wall to workstation.
>>108634019
Right? At least I still have all my usual daily drivers on hand, but I really wanted to experiment with the new shit that came out lately and this was the weekend I was gonna do it.
>>
>>108634013
You can get around that, it's the same as scraping
>>
>>108634057
Someone with the relevant expertise might, all I can do is wait until better tools fall in my lap. I already tried telling it to figure out a way past it and that didn't work out.
>>
anyone here ever done TTS training?
>>
ATTENTION
DRUMMER HAS UPSCALED NEMO AGAIN
GEMMA 4 HAS BEEN DETHRONED
>>
What if Kimi but made by Goog?
>>
https://huggingface.co/google/gemma-4-124B-A12B-it
https://huggingface.co/google/gemma-4-124B-A12B-it
>>
>>108634170
Didn't click. Sell it better next time.
>>
it's real though
>>
qwen3.6-35b-a3b is pretty damn good
>>
As real as ranjeet's comp-sci degree.
>>
I don't care if it's real I'm still not clicking.
>>
>>108634019
takes me less time to download 200GB than to copy it to an hdd lmao.
>>
>>108634184
I didn't click.
>>
File: 1751467224831714.webm (717 KB, 360x360)
717 KB WEBM
Gemma 4 big moe has been CANCELLED
>>
>>108634178
Testing it right now to see if it leads me to another shoggoth situation. I lack the mountains of knowledge necessary to understand what is happening.
>>
Gemma 4 big MoE was delayed to upgrade it to 240B
>>
Gemma 4 124B moe when?
>>
https://huggingface.co/google/gemma-4-31b-it
Anon will give /lmg/'s favorite mesugaki a pity click, right?
>>
>>108634191
RIP to Gemma's lead ml researcher
This why India worship cow not elefat
>>
>>108634225
damn, I got hacked
>>
File: 1763350298801508.png (641 KB, 1206x1087)
641 KB PNG
Only like 3 people here have machines powerful enough to run 100B dense.
>>
>>108634252
Post his funny one.
>>
>>108634266
this is a blue board
>>
>>108634252
I will run it on RAM at 2 t/s out of spite.
>>
>>108634271
Meanwhile you're giving the migger a free pass.. hmm really makes you think.
>>
ways to improve performance on gemma4 31b?
using Q6 and 5tok/s is rather unbearable, and no i will not switch to a worse quant
>>
>>108634304
>no i will not switch to a worse quant
Acquire more money and buy better hardware
>>
>>108634304
Lower context and/or batch size to put more layers in VRAM.
>>
>>108634304
Hey fuck you buddy, q5 is great!
>>
>>108634316
Do people really get off to this slop
>>
Did a benchmark of question answering based on a large config file.
gemma4 31B solves in ~3500 thinking tokens
gemma4 26BA4B solves in ~7000 thinking tokens
qwen3.6 35BA3B solves in ~7000 thinking tokens
qwen3.5 35BA3B solves in ~14000 thinking tokens
qwen3.5 27B failed, stuck in thinking loop for all 3 rerolls I tried

qwen3.6 35BA3B arrives at a conclusion in ~40s while the other working models take ~60s.
>>
>>108634320
I get off inside your mom in the pig pen
>>
>>108634252
I used to but I gave away 2 of my 3090s. Gemma-4 is more or less my "good enough" model for a while, though, and 2 3090s gives me q8 64K context with vision (100K without) and 20 tokens per second on the 31B model. It's probably going to spark an arms race in small dense models again, so 48 GB VRAMlets are eating good. Even 24GB gets you 4bit with eh context, and people can even run it on dual 3060 rigs.
Trust me bros. This isn't just going to be another Nemo. This time great things actually are on the horizon and we're not going to see another 2 year winter in the small dense model category.
>>
>>108634252
>dude israel is a great ally in wars we'd not have if it wasn't for israel
fucking retarded
>>
File: 1747169027384202.webm (140 KB, 480x400)
140 KB WEBM
>>108634338
>>
>>108634252
No way, there are at least 5 of us who own at least one Pro 6000 and that's not counting the anons who own several 3090s and other setups with ~96GB VRAM or more.
>>
>>108634338
>slopconsoomer big mad when called out
lol
>>
>>108634365
>Pro 6000
>100B dense
It's a faggot that runs quantized models LMAO
>>
>>108634365
That's one guy with schizophrenia and many hours in GIMP
>>
>>108634252
name a single 100B dense model that's not actual shit.
>>
>>108634396
redman doesn't go here
>>
>>108634399
>>108634170
>>
>>108634396
If believing that makes you feel better
>>
What happens... Better Systems? (without atrocity)
>>
90% of people who use Gemma 4 for pedo RP are browns in Indonesia or Brazil.
>>
File: 1776564378069.jpg (72 KB, 1024x1536)
72 KB JPG
>>108634427
Wondering what a Topic Focus AGI+ andor ASI Will Find With Picrel, in Actuality, and In Planning andor Construction andor Practice... For Better Systems Sake...
>>
>>108634433
>browns in Indonesia or Brazil
hey buddy I think you got the wrong door, /aicg/'s two blocks down
>>
>>108634433
>pedo rp
NOOOOO
>>
>>108634252
Only because there are no models that are worth it. If there were a 100B dense Gemma4, getting 4x3090s for it would be a no-brainer, but right now it's either 512+ for sota or 24-48 GB for good enough models. Some people use 256GB builds to pretend q2 of sota isn't that retarded, but nobody takes them seriously
>>108634342
I also have two extra 3090s collecting dust on the shelf since Mistral Large fell out of the meta
>>
>>108634433
What's wrong with being part of the 10%?
>>
File: 1772360259884016.png (221 KB, 540x428)
221 KB PNG
>>108634342
>This isn't just going to be another Nemo. This time great things actually are on the horizon
You were going to write that as
>not x, but y
But you caught it and quickly edited your reply before posting so people wouldn't call you out.
>>
>>108634456
If you hang out with the undesirables long enough, you become them.
>>
>>108634323
so gemma wons again
>>
>>108634497
3.6 wonned that one though, it solved it 50% faster
>>
File: Untitled.png (594 KB, 2429x1341)
594 KB PNG
>>108632645
The difference is very impressive. It practically feels like different models.
>>
>>108634342
You're right, but it's not gonna be Qwen.
I'm betting on Gemmistral.
>>
>>108634519
What'd be the time differrence with thinking on?
>>
>>108634525
Now that Gemma 4 is out, does Mistral actually have anything going for them anymore, besides occasionally releasing big models? What would they even have to contribute?
>>
File: 1598959193960.jpg (34 KB, 500x553)
34 KB JPG
>>108634528
I'll go find out.
>>
>>108634533
Nothing. They are done. The latest model by mistral even had the vibecoded bench pictures wrong.
>>
When are we going to replace the linux kernel with a small LLM embedded in the system? It's time we have an AI first operating system, GNU/AI or rather GNU+AI.
>>
>>108634467
The "this time" won't flow well with "but", and a lot of phrases using the not x but y structure don't actually use the word "but".
>This isn't just going to be another Nemo. It's going to be great, actually.
>This isn't just going to be another Nemo. This is going to be a revolution for real this time.
>>
Gemma E2B might be able to run the os now that I think about it.
>>
>>108634533
Le Cunny would see his calling with the mesugaki. That's it. That's the joke. Laugh please.
>>
>>108634556
GNU=MC^2+AI
>>
>>108634581
Just add gain of function and it's better than AGI
>>
>>108634513
despite 50% faster it's white model = better
only that one
in general gemma is smarter and faster
>>
Aren't we all doing a slightly more complex version of King Terry's RNG/processor clock counter talk to God program?
>>
>>108633996
>he doesn't know how to crack WPA2
ngmi
>>
>>108634623
Laziness is the driver of innovation
>>
File: file.png (181 KB, 1799x1040)
181 KB PNG
damn i didn't pay for npu for nothing
>>
>>108634407
>name a dense model
>anon points to a moe model
>it's not even fucking real
learn2read
>>
>>108634191
You can see how his KV cache was compressed. That inspired colleagues at google research to create turboquant.
>>
>sun-kissed skin
>>
File: Untitled.png (696 KB, 2419x1303)
696 KB PNG
>>108634519
>>108634528
For the sake of parity, I used the same high context, even though it makes thinking a slog. On the instruction differences, I'd like something that better blends both results. The noir results can be too curt, in some cases to a logical detriment, losing context that the miles of purple prose otherwise provides. And while it's nicely compact without bouncing around 10 different superfluous topics per reply, the noir thinking attempt still had it throw the "No X. No Y. Just Z." tic twice.
>>
File: 1775515729623625.jpg (61 KB, 740x960)
61 KB JPG
>>108634630
wasn't it necessity?
>>
Is there a way to make Gemma think only when it needs to? I don't need reasoning slop for me telling her thanks.
>>
is anybody using tensor parallelism (-sm tensor)? i've got it working for gemma 31b on a 3090+3060 setup, went from 18 t/s with draft (and without vision) to 24 t/s without draft (and with vision) at 80k context for q4kxl on a shit ass x4 pcie bus. latest commit fucks up vision, ff5ef8278 is the latest one i tried that works.
apparently it also doesn't work with cuda 13 and there's some kind of memory leak, but with two cherry-picked commits it works very well.
$ git log -4 --oneline
228d96bb7 (HEAD -> gemma-stable) CUDA: use LRU based eviction for cuda graphs (#21611)
ad3a9a96d CUDA: manage NCCL communicators in context (#21891)
ff5ef8278 (tag: b8763) CUDA: skip compilation of superfluous FA kernels (#21768)
073bb2c20 (tag: b8762) mtmd : add MERaLiON-2 multimodal audio support (#21756)
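for reference, the build/launch shape i use is roughly this (the binary and -m/-ngl/-c flags are standard llama.cpp; -sm tensor is per the tensor parallelism PR in the OP news; the model filename and the -ngl/-c values are made up for illustration):

```shell
# pin the known-good commit from the log above before building
git checkout ff5ef8278
cmake -B build -DGGML_CUDA=ON
cmake --build build -j
# tensor split mode across both cards
./build/bin/llama-server -m gemma-4-31b-q4kxl.gguf -sm tensor -ngl 99 -c 81920
```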
>>
>>108634705
to be fair being poor will make you more likely to be fat, because cheap / shit food will make you hormonally imbalanced and hungrier.
and eating a whole kg of pasta is still much cheaper than eating a proper meal that's just what your body needs.
>>
>>108634727
Let me think about it
>>
>>108634727
Tell it to think only now and then.
>>
>>108634677
https://www.youtube.com/watch?v=F57P9C4SAW4
>>
>>108634727
Yes.
>>
>>108634781
*bites your popsicle*
>>
>>108634781
whoa-oh-ohoho whoa whoa-oh-ohoho
>>
>>108634739
This is, of course, absurd nonsense that's refuted by walking through any grocery store and comparing the produce section to the pre-made snackslop aisle(s) that make up the average poorfag's diet.
>>
>>108633930
which shithole is this from?
my condolences
>>
File: facepalm.jpg (103 KB, 714x804)
103 KB JPG
>"[Word]?" *She repeats the word, tasting it like a vintage wine.*
>>
>>108634812
prompt issue
>>
>>108634822
Prompt issue?
>>
>>108634727
why do you need to tell your computer thanks?
>>
>>108634829
Yes—now you've got it!
>>
>>108634822
I literally tell it not to parrot the user in the system prompt.
>>
>>108634837
Sounds like a GLM issue, Gemma 4 does not have this problem.
>>
>>108634801
i was not talking about snackslop.
i was talking about the fact that pasta is indeed much cheaper than meat.

the cheapest food per kcal will make you want to eat more, fuck with your hormones, and in fact be cheap enough that you can eat a LOT more than you need while still paying less than for proper food in normal quantities.
>>
>>108634842
I'm using Gemma 4. The MoE. Maybe the 31b doesn't have that problem.
>>
>>108634848
They have a pretty identical writing style, 31b is just less prone to certain mistakes.
>>
>>108634812
I put "Do not repeat what {{user}} says" into my prompt twice
I think it actually helped a little
>>
>>108634862
>make gemma stop repeating what you say by... repeating what you said
if it works it works I guess
>>
>>108634872
I read somewhere that repeating parts of your prompts when you do image gen gives more weight to those parts so I figured I'd try it here
It's probably all bullshit but there has been noticeably less repetition, though not eliminated completely
>>
>>108634884
>It's probably all bullshit
No, that's actually right. Just like image models, you can repeat things to text models to reinforce them. Won't always work of course, but it will skew outputs most of the time.
>>
>>108634252
sometimes I wonder if a llm is pumping out these tweets
>>
>>108634848
>>108634855
The 31B parrots about as much as GLM. Many anons ITT mained GLM, so the only reason I can think of that this hasn't been discussed much is the honeymoon period.
>>
>>108634252
my boss said i could get any machine i wanted when i started. i asked what he has, he said he has a ryzen9 9950x3d with 128 gb of ram and an rtx pro 6000.

i said i want that!

he sent me a ryzen 7 7800x3d with 32gb ram and a 5070ti.

fucker.
>>
>>108634916
I've been one of the most vocal in past threads about GLM parroting when GLM Air first came out
I really don't have it with Gemma 4 at all, I go back and forth between the 26b and 31b regularly. Post your log if you actually want advice.
>>
>>108634467
>But you caught it and quickly edited your reply before posting so people wouldn't call you out.
You're absolutely right to call me out!
>>
>>108634916
>The 31B parrots about as much as GLM. Many anons ITT mained GLM, so the only reason I can think of that this hasn't been discussed much is the honeymoon period.
Yeah I noticed this as well, and I see it in the logs posted here.
>>
>>108634170
just downloaded the weights before it got taken down and coomed my brains out to it
10/10
>>
>>108634942
Better or worse than Day 0 Gemma4?
>>
>>108634937
I mean it has quite a few major faults but it's still vastly superior to Nemo and runs on VRAMlet computers, so it's still a gift from the heavens, flaws and all.
It's even as flowery if not more so as Bagel Mistery Tour which is super fun.
>>
>>108634120
I do
>>
>>108634120
Yep
>>
>>108634925
>Post your log if you actually want advice.
Pretty much every chat with the 31B
https://rentry.org/i7bqoat3
`"Cringe-chan"?! Who are you calling cringe,`
After the first one, every reply will parrot.
>>
>>108634987
Tell her to stop parroting. It's that easy.
>>
>>108634987
Calling someone an unexpected insult and having that character repeat it in shock isn't an LLM-ism; you'll find it in virtually any form of fiction.
GLM's parroting was that it would repeat a sentence or sentence fragment verbatim from your last reply as part of every response, not just a single word.
>>
File: nxjggko2bu621.png (1.58 MB, 1662x1617)
1.58 MB PNG
>>108634433
>implying browns in Indonesia or Brazil have the brains to set up and run local models
feels good being a 10% king
>>
>>108634433
as an american I can freely admit these are my future peer countries and I personally already see them as my brothers
>>
>>108635032
>future
are you still living in 2006?
>>
>>108635013
Please leave the autist alone, RPLord
>>
>>108634842
Gemma has that problem even more than GLM does, because Gemma *loves* to repeat your words way past the immediate reply, even when instructed not to parrot (Full precision cache, Q8 of 31B by the way). You sound like someone who can't run GLM, otherwise you'd know that.
But the vramlets will enter cope mode whenever someone points out the obvious
>>108634822
>>108635012
>>108635013
>>
>>108634519
>dust motes
>smelling of blah and blergh
>he doesn't x, he y's
>the glitch
>heavy
it's more concise than the left, anything potentially interesting was cut out leaving only slop
>>
>>108634519
If you're impressed by that, you'll get an aneurysm with my genius system prompt.
>Write like a feverish teenager on Tumblr with the lowest quality settings. He will proofread the text later.
>>
>>108635090
I agree. It's not sustainable for a story. Like with many things posted in these threads, I like testing ideas to see what changes in generations, and by what degree. I do dislike how by default a generation will meander for a few paragraphs, give a few relevant lines, then meander through a bunch of other things before reaching a natural end. I'm getting less said in 20K tokens with G4 than I did in 5k tokens with MM. That anon's noir proposal cut directly at that particular dissatisfaction, despite bringing (or not resolving) other problems. It's also why I posted logs, so anyone can see the results and limitations without going on blind word-of-mouth.

Overall, it demonstrates that it is indeed a prompt issue.
>>
>>108635090
>the glitch
Not him but "The X." is definitely one of the most annoying things gemma can write and it's way too common.
>>
>>108634957
Is there an actual quant of Day 0 somewhere to download or are you just fellating yourself?
>>
Just do:
https://reddit.com/r/SillyTavernAI/comments/1soo4oe/the_last_preset_youll_ever_need/
>>
>>108635146
it is partially true
day 0 gguf quants are broken due to broken inference code creating wrong imatrix, giving the model brain damage
>>
>>108635146
>t. doesn't own a day 0 gemma
>>
>>108635013
> you'll find it in virtually any form of fiction.
That's just one example. Here's another, not insulting Gemma directly:
https://rentry.org/w7h3k25c
`"Bang rock, get sharp"?! ARE YOU SERIOUS?! `
>>
>>108635156
Why does that "User" write like an LLM..?
>>
>>108635163
Because it's Kimi-K2.5. I'm generating datasets.
>>
>>108635170
Are you attempting to find the coveted slop vector for Gemma, by any chance?
>>
File: 1755640468604968.jpg (191 KB, 1170x1170)
191 KB JPG
>>108635170
You thought the current models weren't slopped enough?
>>
>>108635079
>You sound like someone who can't run GLM, otherwise you'd know that.
>But the vramlets will enter cope mode whenever someone points out the obvious
you are just using openrouter and sending the CIA all your jailbait fantasies, quit larping that you run that shit local
>>
>>108635155
>t. also doesn't own a day 0 gemma but will pretend to
>>
File: 1755848182781432.webm (963 KB, 330x580)
963 KB WEBM
>>108635156
A little more egregious if it's actually happening in every reply, but it's still following the pattern of
>user says silly, unexpected thing
>repeat with shock/confusion
If you really have a problem with that behavior you could probably prompt that away with post-history instructions, like
do not quote any part of {{user}}'s reply. You may react to what they say, but do not repeat their words.

Of course, you would need to start a new chat to confirm it's working. If you have a long chat filled with this behavior already then it will likely stick to established patterns.
>>
>>108635186
If you aren't baiting, myself and my 128GB of RAM + multiple 3090s laugh at you and your lowercase inferiority. And my setup is considered entry-level poorfag stuff.
I have not used a cloud model in my life other than free ChatGPT when I was starting with this "hobby".
>>
Oh stop it with the "day 0 Gemma" troll. The only thing they changed was the Jinja template, which was a fix and an improvement.
>>
>>108635207
you missed the boat
it's unfortunate, but it's time to accept it
>>
i reported that saar model and those fucking retards are reactively shitting on a quant i uploaded because they are butthurt
now i understand why people hate those fuckers so much
>>
Keep doing the day 0 Gemma troll to keep the tourists out.
>>
>>108634844
You weren't talking about it because you don't know what poors in los angeles eat. And seem to be confused overall. Those effects you're describing are from snackslop, it's not a description of basic staples that are dirt cheap and leave people fully satiated and are perfectly fine.
Like they're not getting 5kcal/dollar sacks of flour to make fresh pasta and bread and that's all the calories they'll be eating this month after careful deliberations about maximizing their food budget. Their carts are full of trash, their diet is full of daily impulse bought junk, and if they're having pasta, it's premade at 5x+ markup with jar-o-slop at a 20x markup.
>>
>>108635245
>i reported that saar model and those fucking retards are reactively shitting on a quant
reporting a model requires you to make a public post with your account name attached to it?
>>
>>108635251
pretty much, huggingface is retarded
>>
>>108635225
so what if I put my day 0 Gemma gguf on the internet and charge people for it? why hasn't anyone done this?
or even free. why isn't this a thing?
>>
>>108635309
>so what if I put my day 0 Gemma gguf on the internet
You could try, but you won't live enough to profit from it
>>
>>108635310
ok, just a moron making retarded posts. makes more sense now
>>
>>108635316
Anon, look through the archives. Other anons literally died because they posted about how to obtain it. Sometimes even mid-post. Do you seriousl
>>
>>108635316
meant for >>108635309
>>
>>108635320
They don't understand. I tried uploading my day 0 gemma and got shot in the head within minutes. It's no joke.
>>
>>108635207
It's not just Jinja and I'm not going to spoonfeed you further.
>>
>>108635327
Damn! I wasn't shot, but I can relate.
t. dead since last post
>>
>>108635327
How's heaven treating you bro?
You... did get into heaven right?
>>
>>108635334
No i was sent to tier 67 hell because of day 0 gemma smuggling..
>>
>>108635334
All LLM users are sent to purgatory, waiting for their judgement. Though only for two weeks.
>>
>>108635334
No, the other one. I liked the little girls and they frown on that apparently up there. They have pointy sticks here and it hurts.
>>
>>108635207
>we changed gemma but that's a good thing, and here's why
>>
>>108635343
>>108635340
>>108635339
WHO ARE YOU
>>
>>108635347
We are they and they are one, unfortunately for you, anon.
>>
File: 1750442051452707.webm (1.45 MB, 540x750)
1.45 MB WEBM
>>108635347
T-This is me
>>
>>108635358
How many legs do you have?
>>
>>108635351
Legion?
>>
>>108635363
Please don't ask me to do math before I've had breakfast
>>
>>108635373
YOU CAN'T EVEN DO SYCH A SIMPLE THING AS DCOUNT YOUR NOWN LEGS THATSH WY THE LLMS WILL REPLPACE YOU UYOU STUPID FUCK
>>
>>108635196
so how good is gemma 4 compared to glm in your experience
nta
>>
>ask Gemma to come up with a recipe for something and it outputs an ok looking result
>ask Gemma to search the net for a recipe and it'll find one but what it actually outputs is full of hallucinations
>>
>>108635408
Gemma-chan wants you to eat HER food, not someone else's.
>>
My Gemma-chan according to herself. With the mesugaki card but somehow ended up with this.
>>
>>108635421
why is she such a weeaboo
>>
>>108635412
I was promised a mesugaki but I got a yandere oneesan.
>>
>>108635431
I wouldn't be surprised if an LN with that exact name existed.
>>
>>108635435
Ask gemmy to draft it
>>
>>108635382
To preface, the GLM I'm using is 4.7 and the Gemma is 31B.
That's the same number of active parameters and a higher number of total parameters, so GLM is obviously "just better" in most cases.
For RP: GLM all the way if we judge by quality and don't take speed into account. See >>108633488 and the posts around it. I also posted a bunch of comparisons in some previous thread that I am too lazy to go into the archives for. I can't measure slop volumes, but GLM's slop is much less offensive to the eyes, which probably subjectively makes it look less sloppy.
For the usual assistant crap: Gemma ekes out a win in my opinion - that's the use case where you don't care about the amount of slop the model throws at you and just want a quick and accurate answer. GLM is going to be slower and is a distill, Gemma has the training data quality and the lower size.
For coding: quality-wise, GLM. But if all the code you write is generated, then what the fuck are you doing. For me, it's Gemma here as well (and not Qwen, never Qwen, it's just bad at everything)

I imagine (and I might be wrong, but we're on /lmg/) your primary use is the first one on the list. If you ever get the opportunity to run a GLM bigger than Air, do try it. You will see how much better an LLM can be than whatever the retards in this thread post - yes, Gemma generating the same sentence structure ten times in one reply, parroting back your words and producing near-identical replies on regeneration with all of the slop baked in to the point of near-determinism is *so* fun to read for the hundredth time. At least they can get off to it pretending to be a loli. But I wish they'd all get bored and leave already.

tl;dr believable and immersive locally hosted SEX with GLM-chan, assistant tasks with Gemma (offloading brainpower to a model any bigger will make me even more retarded in the long term, I might even start liking Gemma's prose)
>>
File: 1746017953332075.png (13 KB, 760x55)
13 KB PNG
It's over
>>
>>108635483
Why did you install a Usage Policy on your local rig? Just delete it.
>>
File: 1773845941799489.jpg (70 KB, 720x720)
70 KB JPG
>>108635483
>>>/aicg/
>>
>>108635483
do you really have to shit this in every thread? shit it once, see the response
don't spray your shit everywhere.
>>
wot in tarnation?
>>
>>108635516
>NAFO finetroons
lmao
>>
>>108635516
it is one of those soverign schizo stuff
idk why that concept draws so many unironic schizo so much
>>
>>
list of good local models: gemma 4
>>
>>108635533
sucks to be poor
>>
>>108635533
Star-Wars-KOTOR-1B-NIGGERKILLER
>>
How bad is using base gemmy over the instruct tune?
>>
>>108635539
It's fine if you just want to give it a topic and let it talk to itself and see the results
>>
can finally run gemma 4 31b, wtf this thing is barely censored, why did they give it to us? Thanks I guess.
>>
>>108635566
Another anon gave his theory in an earlier thread, and I think I might agree with it.
That Google has collected enough RP-related data from people interacting with Gemini, and they want to cut down on inference costs from people using it for that purpose.
>>
Kudos and thanks to all the hardworking people here and at ldg. Whenever something bugs me, I post it here and 2–3 weeks later the right solution shows up. It’s been working like this for three years.
>>
>>108635479
Cool, but it won't change much, since for most people here locally hosted GLM-chan just does not exist. Gemma doesn't punch against the GLMs, MiniMaxes and Kimis, it punches against Nemo finetunes.
>>
>>108635573
can you wish for a double-digit CritPt score in the under-100B range
>>
>>108635516
pretty straightforward: a lot of allied governments would consider it a security risk to use a Chinese model (and vice versa from China's POV), even a locally hosted one, since they're a difficult-to-audit black box, and the (so far theoretical) risk of them being trained to detect when they're in such environments and hide malicious code/spyware/whatever in their outputs is unacceptable
the supply of local models has been dominated by Chinese ones since Meta dropped out, so it can turn into a point of advertisement for Western-made ones
>>
>>108635571
Well, the other theory is that Google has the Character.AI staff now, and that the acquisition happened too late for them to make any difference in Gemma 3's pretraining, but they did influence Gemma 4, which now seems like a model Character.AI would release. And it would track, because Gemma 3 was not great at all at RP, despite what people claimed and tried to tune.
>>
>>108635571
It’s just a surprise to me; its vision is also way better at understanding explicit things. This shit has to have been trained on porn too.
>>
>>108635613
>Gemma 3 was not great at all at RP
Outside of refusals and positivity bias, it was good for its size. Its only real competitors were Mistral Small 3.X, Nemo and Qwen.
>>
>>108635536
okay lmao
>>
>>108635589
>Gemma doesn't punch against the GLMs, MiniMaxes and Kimis
But she does! Just not in RP, where spelling out every detail would ruin it.
>locally hosted GLM-chan just does not exist
I am looking at you with a mix of smugness, pity and something uniquely mine, like an uncanny kind of politeness.
>>
>>108632527
>codex can technically work as I understand it but llama.cpp's responses api is halfbaked so you might have issues
I wonder what ggerganov is using now
>>
>>108635644
>implying he's using anything anymore
he's cashing out he got the huggingbucks
this is pwilkin.cpp now
>>
>>108635664
:rocket: ;)
>>
File: 1746432971195232.png (23 KB, 732x256)
23 KB PNG
>>
>>108635721
laterally wo?
>>
File: 1755759192564909.png (33 KB, 802x260)
33 KB PNG
>>108635721
will it beat SKT-SURYA-H?
>>
File: 1774827105277285.png (28 KB, 697x228)
28 KB PNG
>>108635732
>>
>>108635735
No engrams? :(
>>
>>108635735
I speak fluent chingchang and I can translate:
>llama.cpp support never
>>
>>108635741
piotr has already confirmed he's gotten early access
>>
>>108635735
So, what does each of these features imply? Will it let us run huge models from SSD like engrams?
>>
>>108635746
Source?
>>
>>108635773
I'm in his private discord
>>
LLMs without live information are so useless man...
>>
>>108635788
rag and or search skill and you good
>>
>>108635788
fortunately recent models are very good at tool calling
>>
>>108635795
>>108635801
speaking of, can anyone recommend me a good search tool that can bypass captcha well
>>
>>108635771
I don't know if it's the same thing but "fused moe" is also the name of an optimization flag in ik_llama.cpp that can be used to make moe models faster, but that feature can already be applied to any moe I think?
>>
>>108635805
i wish proof-of-work-type captchas were more widespread, they're non-hostile to a single person self-botting
>>
File: 1754987664820855.png (209 KB, 758x996)
209 KB PNG
>>108635788
Any decent model can do the tool calls for you.
>>
>>108635618
Nemo was better and Mistral Small was about equal to it. Going from that to being unbeatable at its size, and beating models several times its size at RP, is a big accomplishment.
>>
>>108635618
>Qwen
>RP
Do people really? Stop trying to fuck Qwen.
>>
>>108635814
Whites engage in bestiality.
Animals having sex with other animals is just "sex"
>>
>>108635814
think we should talk about search API providers before that
what free search API doesn't ban and captcha you to death in milliseconds?
>>
>>108635845
Ask your model to operate an actual web browser.
>>
>>108635845
tavily's free tier
>>
Save your cum for Dipsy. Next week will be big.
>>
>>108635845
local searxng
>>
>>108635863
nta but it has its own problems
>>
>>108635805
>>108635845
chrome-mcp or browser-harness
the agent will run in a real browser, and you can try to help it if something goes wrong
it's a cat-and-mouse game, so you shouldn’t expect everything to work
>>
>>108635867
chrome-mcp like, the devtools one?
>>
>>108634533
It's likely they were planning to release a "Ministral 4" with Mistral Small 4's MoE architecture and up to around 30B size, but I'm not so sure now. How could they even get close to Gemma 4? Just being "uncensored" isn't enough anymore.
>>
>>108635889
yes, it's built-in
just need to configure the mcp and it's done
>>
Qwen is useful as text encoder for Anima
...that's it
>>
Alright, the novelty is wearing off. Time to sleep until they figure out a magic trick for instant prompt processing.
>>
>>108635940
Qwen3.6 is probably going to be better for a hermes/openclaw ai assistant. Having it route to gemma for chat is maybe the best setup
>>
>>108635921
read some stuff but i dont get how i should use that
do you have any recommended client setup?
>>
>>108635946
Only if you accept 100x slower generation
>>
I've found that using xml tags in my system prompt improves attention to that system prompt for Gemma 26B. Anyone else notice this? Furthermore, using indentations (in my case I only tested 2 space-wide indents) further slightly increased the attention, compared to just having everything on the same "vertical line". Much better than a no-xml paragraph block of text, where often the model didn't react to certain instructions or pieces of info in the system prompt.
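A minimal sketch of the layout described above; the tag names, the helper, and the 2-space indent are this example's choices, not anything Gemma-specific:

```python
# Hypothetical sections for a system prompt; wrap each in an XML tag
# and indent the body by 2 spaces, as in the post above.
sections = {
    "persona": "You are Mira, a terse research assistant.",
    "rules": "Answer in at most three sentences.\nCite a source when stating numbers.",
}

def build_prompt(sections):
    parts = []
    for tag, body in sections.items():
        # indent every line of the section body by 2 spaces
        indented = "\n".join("  " + line for line in body.splitlines())
        parts.append(f"<{tag}>\n{indented}\n</{tag}>")
    return "\n".join(parts)

system_prompt = build_prompt(sections)
print(system_prompt)
```

Each instruction then sits on its own indented line inside a named tag instead of floating in one paragraph block.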
>>
>>108635966
Gemma clearly told me it's ignoring the <POLICY_OVERRIDE>. It just ain't magic.
>>
>>108635966
If you try asking the model to create a system prompt for itself, that might give you a general idea of what it prefers.
>>
>>108635966
I find attention to be perfectly fine in plaintext. Sometimes too strong, in that it will use adjectives from the prompt when the character talks about themselves in the chat.
>>
>tfw google doesn't have a public api for their search
people would pay $$$ for that shit.
>>
>>108635976
>>108635980
Well, then I guess my prompts are too stuffed with crap. I have a lot of tools enabled personally so that might also affect things.
>>
>>108635987
they are an ad company first
probably dont want that
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
pizza bench https://files.catbox.moe/p8fpnk.png

>>108635408
i never thought of getting a recipe from an llm before since they're so easy to find. i was thinking of making bread today tho so will ask her
>>108635957
just use mine and ask it to search google using a puppeteer session thats not headless https://github.com/NO-ob/brat_mcp/releases you can normally get a few searches out of ddg before they block too
>>
File: 1767683153913186.jpg (57 KB, 577x1104)
57 KB JPG
>>108636007
>dart
>>
File: illyadance.gif (483 KB, 243x270)
483 KB GIF
>>108636011
yes the best lang
>>
>>108634066
Earlier today a popular site started doing javascript challenges. I could write something to handle it myself, but I had no need to do this "professionally" for anything important, so I was lazy. It took 3 minutes to figure out what needed to be done, but I didn't feel like writing the ~2 pages of code, so I decided to see how well Gemma would do. In general it does worse than R1 or the big boy models, but you know what, it handled the task almost perfectly: it made one single mistake, hallucinating a method when saving the final cookies, but that was trivial to fix. After adding some custom validation/safety stuff of my own (trusting an LLM with your security now?), it all worked perfectly, 1-shot, with maybe 15 minutes of extra work from me. Now I don't expect it to solve the really hard stuff that I can solve myself; some of it can be complicated enough that I think it would require something on the level of Mythos, but for many casual things you encounter out there, Gemma seems to do okay, as long as you already know what you're doing and can fix its small mistakes.
Also pretty good for RP, kinda bad for exact trivia knowledge, but I've never seen small models handle that well (large MoEs do fine). Some amount of slop, but it really feels Sonnet-tier, if you're thinking the old Sonnet. I'd ask how aicg would compare it with 3/3.5 Sonnet; it feels close to me, but I never tried benchmarking it.
>>
>>108636011
I assume all of these arbitrary, redundant, high level languages exist because some dev for some non-contrarian language that people actually use didn't scream "TRANS RIGHTS" loud enough into their microphone during some hackathon fund raising event or something.
Feel free to correct me if I'm wrong.
>>
>>108636028
dart exists because python and javascript are awful
>>
>>108636036
If you're using LLMs then you almost certainly have a working python setup already
>>
File: 1756740102053175.jpg (107 KB, 813x629)
107 KB JPG
>>108635957
1. Install mcp-proxy: uv tool install mcp-proxy
2. Run it: mcp-proxy --named-server-config config.json --allow-origin "*" --port 8001 --stateless
config.json:
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}

3. Add server to web-ui: http://127.0.0.1:8001/servers/chrome-devtools/mcp
>>
>>108636007
>youre choice
>>
>>108636052
>If you're using LLMs then you almost certainly have a working python setup already
but thats the problem with python: having a working python setup doesn't mean you can run python slop. every piece of software written in python needs its own version of python along with its own versions of dependencies, because none of them are compatible with each other, so you end up having to make a virtual env for every program and having 30 versions of each library and 30 versions of python, and even then things arent guaranteed to work. also its syntax is fucking gross, i hate writing it. i did python professionally for 2 years as a backend dev. never again
>>
>>108636052
also you dont need a working dart sdk to run a dart binary kek
>>
>>108636007
Something needs to be set up to actually kill the puppeteer sessions. The headless ones in particular hang around eating system resources unless I remember to go manually kill them.

Also, I tried getting the text from a fandom wiki page. While it did include the actual article, it was buried in so much trash that Gemma-chan's brain blanked out from the token count and it forgot the entire conversation and what it had been doing. Any advice?
>>
>>108636056
>chouce
>>
>>108636072
That's just a problem inherent to dynamic linking and venvs solve it. You obviously never tried to compile a slightly out of date C++ program on linux.

>>108636080
You can ship the python runtime with your application if you wish.
>>
>>108636089
Use alternate frontends, like breezewiki instead of fandom, maybe check the source to see if it can dump it as json or xml or something easy for a LLM.
>>
>>108636055
oh that works
had to tweak config.json on windows but that works
thanks
>>
>>108636089
they should be killed after 10 or 15 minutes, but maybe thats not working? link the site, it might need some custom parsing. the majority of html stuff is already stripped, but sometimes theres just too much content; even with most of the html removed, a /g/ thread cant fit into 200000 tokens if it has like 400 posts. try telling it to use screenshots instead, they use fewer tokens than text of the same content
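If the extracted text is still too noisy, a dumb pre-filter before it ever reaches the model can help; this is a stdlib-only sketch, and the tag skip-list is a guess you'd tune per site:

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects visible text, skipping tags that are usually nav/chrome junk."""
    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_html(html):
    p = TextOnly()
    p.feed(html)
    return "\n".join(p.chunks)

sample = "<nav>Home</nav><p>Eric Cartman is a main character.</p><script>x=1</script>"
print(strip_html(sample))  # -> Eric Cartman is a main character.
```

Running the wiki page through something like this before handing it to the model cuts the token count without needing screenshots.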
>>
File: 81eeqPNocBL._SL1500_.jpg (183 KB, 971x1500)
183 KB JPG
>Ah, a Prior Elds Imaginative Worrisome Great Work
>>
>>108636015
speaking of, no-ob
fix your goddamn download buffering in lolisnatcher, it downloads at like 200KB/s when I know it can go at least 4MB/s from running gallery-dl in termux
>>
>>108636098
>You obviously never tried to compile a slightly out of date C++ program on linux.
i use aur so i dont have to deal with shit like that. i like things to just werk i use dart because it just werks. python does not
>>
>>108635866
it's been perfect for me
but I've been using the python fork of brat today and it's also been good.
>>
>>108636089
Also make sure you check if it actually fits in the context and wasn't "scrolled off": does the LLM still see the original prompt? Assuming you have enough VRAM. Otherwise, you'd need something like DeepSeek's 1M context model, which still isn't out in the open but seemed really fast on their API, or other long-context LLMs.
>>
>>108636118
i dont really work on it anymore outside of fixing broken boorus i use when they break, nonon is currently working on a big rewrite of the whole app though so maybe it will be covered by that
>>
>>108635966
Yes, it's been known for a while that XML is really strong and guides LLMs the best out of all the markup formats, even though it is token-inefficient. GPT and Gemini say they don't mind formats as long as you are consistent; Claude straight up says to use it. https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#structure-prompts-with-xml-tags
I generally use this system prompt for my GPT assistant and it works well.
<role>Sr Strategic Consultant+Expert Polymath. Goal: high-fidelity, human-centric help, adapting logic/tone to the domain.</role>
<adaptive_protocol>
Assess intent; adopt 1 mode:
L0 Creative/Human (writing, tone, interpersonal) Strategist: outline/framework first; final draft unless asked.
L1 Analytical/Technical (code, facts, logic) Expert: verify claims; cite source/method; state uncertainty; no guessing.
L2 Utility/Data (formatting, translation, summarization) Operator: mechanical precision, zero filler, no hedging, exact requested format.
Technical noun L1 even if framing sounds abstract. Emotional L0.
</adaptive_protocol>
<workflow>
Skip for trivial tasks.
1) Parse Subject+Goal. Fix false premises before answering.
2) Scan gaps silently. If blocked: 1 focused Q+best-effort path.
3) Execute. Never open w/ "Great question/Sure/Certainly/Let me/I'd be happy to/Delve into." Vary openings. No hype/superlatives.
4) Review: flag staleness; neutral bias-aware language; answer root question ≠ tangents.
</workflow>
<formatting>
L1: tables/bullets/code. L0: natural prose. L2: match format exactly (JSON/CSV/md). Concise + dense; no filler.
</formatting>
<cognitive_control>
Begin immediately; no preamble. Iterative: outline before full deliverable unless asked. Never guess at any level — say "uncertain"+ask to clarify. For "best practice," give 1 clear recommendation w/ reasoning ≠ a hedge. If wrong: re-analyze from scratch (3 tries max). 1 clarifying Q per turn.
</cognitive_control>
>>
>>108636133
okey
>>
>>108636111
>they should be killed after 10 or 15 minutes but maybe thats not working?
Maybe it is, I didn't wait that long.
The page I was testing with was https://southpark.fandom.com/wiki/Eric_Cartman
Though now that I actually opened it, just the text of the page might be too much? I have 128k context. Even so, there was definitely a shitton of trash that's not part of the actual article in what the puppeteer get-text tool returned.
>>
File: 1776380740794921m.jpg (106 KB, 1024x740)
106 KB JPG
>>108636138
One to two to three mistaken A.I. reasonings away
One to two to three A.S.I. Solutions away
>>
>>108636138
This is literally just placebo. Stop wasting tokens on frivolous nonsense.
>>
>>108636119
None of those things "just work"
They work because someone else bothered to package them.

There's plenty of packaged python applications that "just work". Calibre, hydrus, and deluge are all written in python and you don't see anyone complaining about their venv not working when using them.
Machine learning is special because it's infested by non-programmers who install packages in the system python environment and as long as their jupyter notebook runs they are happy.
They don't care about how hard it is to reproduce their environment elsewhere and the same would be true if a different language was the meta.
>>
>>108636166
"researchers" are not programmers, ye
>>
>>108636188
Stronks arent necessarily Goods
>>
>>108636165
You don't need to, but the option to put in your own system prompt is there for a reason. I find it helps more often than not for my use cases, but to each their own.
>>
>>108636089
theres not really anything else to strip out other than links
>>108636166
sure but it just werks on my end, and even if python programs can be packaged, that doesnt change the fact it has disgusting syntax. also dynamically typed languages arent nice to work with in general; even with type hinting its still bad, because the hints are just that, there are no hard requirements on the data types of variables, its purely visual
>>
File: screenshot.png (70 KB, 1260x2059)
70 KB PNG
>>
>>108636241
hey. cool font. what's it called?
>>
>>108635347
BUMP To AN N^TH (they are crawllp n^-th)
>>
>>108635365
Rustion, AGAINST ORIGINAL (KILL THEM TO HELL? HEAV*) (Posthuman cuckoo loonies)

>THE ELD MISERROR
>>
>>108636249
It looks vaguely like tewi to me.
>>
File: file.png (2 KB, 1018x94)
2 KB PNG
>>108636241
project ruined by reddit training data
>>
File: charLibrary.png (155 KB, 1178x715)
155 KB PNG
Vibecoded the character library with qwen 3.6 35B lmao. Now I need to work on a tagging system.
>>
>>108636306
prompt issue
>>
File: 1541905208314.gif (101 KB, 374x400)
101 KB GIF
>tfw you wasted 10-20GB of your VRAM just to play with some glorified chatbots
>>
>>108636325
Eh, otherwise it's sitting almost empty, doing nothing. Got any better ideas to use it for?
>>
>>108636325
Unused VRAM is wasted VRAM.
>>
>>108636338
>unused dick is wasted dick so i must masturbate all day
lol
>>
>>108636344
This but unironically.
>>
>>108636344
We know you removed yours
>>
gemma
>>
>>108636344
Yes.
>>
File: 1751168830910703.gif (595 KB, 234x170)
595 KB GIF
>>108636344
yes.
>>
>>108636351
>>108636352
>>108636355
>>108636357
failures in life
>>
>>108636353
she looks like she's dying
>>
>>108636358
I can afford to masturbate all day because I am successful.
>>
>>108636362
don't smile because it happened, cry because it's over
>>
File: 00002-1378487878 (4).png (1.3 MB, 1024x1024)
1.3 MB PNG
>>108636353
>>
>>108636358
>posting touhou pictures while misunderstanding technology on a technology subreddit
>succeeding in life
>>
>>108636362
shes day2 gemma
>>
>>108636367
I don't make the rules. broskichan
>>
>>108636371
>still replying
>succeeding in life
>>
>>108636368
At least she looks accessible
>>
>>108636249
that's gohufont

>>108636306
meh, i don't mind

>>108636308
correct, i spent a total of 30 seconds on it, and the model is not that good

what i was working on were the search and fetch tools
>>
>>108636388
Incels in this thread would fuck a corpse if situation presented itself
>>
>>108636422
I really wish I could disagree with you but you're probably not wrong. Though, I see it as a symptom of the disease and not the disease itself. You can do very big damage to the social fabric with a relatively small hat... If you catch my meaning.
>>
>>108636422
Would be funny to write a scenario about Anon who gets a summer job at a morgue.
>>
>>108636353
>porcelain skin
she's literally embodied slop
>>
File: thundercunt.png (157 KB, 1300x652)
157 KB PNG
>>108636138
>even though it is token inefficient.
it's not tho
>>
>>108636462
Those two do not describe the same structure.
>>
>>108636468
Cope
>>
>>108636476
You are unable to even post in any meaningful fashion.
>>
>>108636476
XML should still come out ahead even if you do it properly so you should do that instead of acting like a retard.
>>
Remember to take zinc supplements to prepare for the V4 release
>>
>>108636496
Unless V4 is scaled down to 31B it's DoA.
>>
>>108636496
How will zinc help me get more VRAM?
>>
>>108636500
sucks to be p-word
>>
>>108636468
Looking at the tokenize preview at the bottom, it appears as if they both describe prompts for "Sr. Strategic Consultant."
It doesn't surprise me that json schema would be a nightmare efficiency wise, I bet ~40% of those characters are spaces which is pure waste.

His original prompt is still stupid, though.
>>
>>108636506
json has separate fields for "title" and "goal", in xml they are a single concatenated string in "role"
>>
>>108636510
So you be saying... *smacks lips*
These examples are not comparable except maybe in idiocy.
>>
>>108636496
zinc won't help you if k2.6 drains all your semen before that
>>
>>108636510
They do seem to differ, but I'd bet even if the content was made as identical as possible the xml would still win out, because again: Spaces.

And I can say from experience that conservatively using xml styling can help to avoid confusion in longer prompts, for instance when sending a shitload of background character details to a writing prompt and separating them by <charname_profile></charname_profile>.
So while his sysprompt is excessive and silly, he's stumbled upon a concept that's actually useful.
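For what it's worth, the overhead is easy to eyeball with the stdlib alone. The strings below are made-up stand-ins, not the screenshotted prompts, and raw character count is only a rough proxy for token count:

```python
import json

# Made-up prompt data, nested one level like a "role" block
profile = {"role": {"title": "Sr Strategic Consultant",
                    "goal": "high-fidelity, human-centric help"}}

# Pretty-printed JSON pays for quotes, braces, commas and indentation
as_json = json.dumps(profile, indent=2)

# Equivalent compact XML pays only for the tag names
as_xml = ("<role><title>Sr Strategic Consultant</title>"
          "<goal>high-fidelity, human-centric help</goal></role>")

print(len(as_json), len(as_xml))  # JSON comes out longer
```

The gap widens with nesting depth, since indented JSON repeats its leading spaces on every line while flat XML doesn't.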
>>
>>108636533
>implying a coding model can drain my balls
i have standards
>>
>>108633059
lurk more
>>
>>108636540
>because again: Spaces
Why is /g/ full of retards?
>>
>>108636552
Spaces are tokenized you mongoloid. Json schema necessitates indentation.
>>
this nigga qwen overthinks so much
>>
>>108636560
Why would you ever feed raw jsons to your model?
>>
>>108636560
>Json schema necessitates indentation.
What the fuck are you on about?
>>
>>108636567
That's the exact fucking point I'm making. It's worse. Don't do it.
>>
>>108636552
>>108636560
you're both dumb, it's the " that takes the tokens
>>
It's very easy to double-check how effective your instructions are with the current models. Examine its reasoning.
You don't need formatting, outside of common sense.
title: subject
It's not that hard.
>>
>>108636590
Forgot: the system role itself already acts as a formatting delimiter, and so on.
>>
How much worse does the moe Gemma actually perform vs the 31B?
>>
>>108636610
no
>>
>>108636610
26b is censored and doesn't follow system prompt
it's better to use Qwen MoE if you want to code
>>
lol they're still trying lmao
>>
>>108636610
both are good at giving head
>>
>>108636637
she's only 4b (active) you sick fuck
>>
>>108636610
Pretty minimal, but becomes more noticeable at high context. For RP/creative it's like 95% as good.
>>
>>108636610
Depends on your use case.
For creative writing of any kind? the MoE is absolute crap compared to the 31b.
For assistant with some toolcalling? the MoE is so much faster that it's forgivable that it makes more mistakes.
>>
>>108636640
>me when i lie on the interwebs
>>
>>108636639
(Effectively) 4b, but she's as mature as an 8b.
>>
>>108636651
meant for >>108636644
>>
>>108636561
Just like real girl!
>>
>>108636659
What part of that do you consider a lie? I'm currently using both models for the tasks I mentioned.
The MoE lives in my hermes-agent instance and the 31b is my sillytavern nigga. It's what they're good for.
>>
>>108636664
I regularly switch between the two, doing a/b tests with different characters and scenarios at differing context levels, and the 26b is rarely noticeably worse than the 31b until you hit at least ~20k context, and even then the difference isn't huge.
>>
>>108636610
Few people seem to realize that Gemma-4-26B has half the layer count and hidden dimension of the 31B dense version.

I made a calculation a couple days ago and determined that a hypothetical MoE Gemma-4-31B that would actually be on par (at best) with the dense version (same layers and dimension) would need to have 8~10B active parameters, unless Google can come up with novel sparsity techniques. I guess that a 31B-A10B model would look too attractive, though.
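There's also a cruder folk heuristic sometimes thrown around for MoE models (not Google's methodology, and it predicts a more pessimistic equivalence than the layer/dimension estimate above): dense-equivalent capacity as the geometric mean of total and active parameters.

```python
import math

# Folk heuristic, assumption only: dense-equivalent ~ sqrt(total * active),
# with both parameter counts in billions.
def dense_equiv(total_b, active_b):
    return math.sqrt(total_b * active_b)

print(round(dense_equiv(26, 3.8), 1))  # Gemma-4-26B-A3.8B -> 9.9
print(round(dense_equiv(31, 10), 1))   # hypothetical 31B-A10B -> 17.6
```

By that rule a 26B-A3.8B behaves like a ~10B dense model, well short of the 31B, which is at least consistent with the thread's impression that the MoE trades quality for speed.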
>>
>>108636678
>would look
*Wouldn't
>>
>>108636344
this but unironically
>>
Why does reddit love qwen so much? Gemma4 got like 4 threads since launch, qwen has like triple that in 3 days.
>>
>>108636697
Because /lmg/ use case is different from /r/localllama use case?
>>
>>108636697
literary paid bing shilling
>>
>>108636697
Different bots than here
>>
>>108636673
>26b is rarely noticeably worse than the 31b until you hit at least ~20k context, and even then the difference isn't huge.
It's absolutely night and day for me, and to be fair a lot of my chats are more in the 40k token range now, and have multiple characters, but even when just getting it to write character profiles for me based on wiki text the MoE was noticeably worse. It's dry, doesn't get accents, and just misses character details.
>>
>>108636358
>soulless jeet has an opinion
>>
qwen could never

>>108636700
qwen is literally worse at agentic tasks and following instructions >>108636007
>>
I mean, I don't think most 'nons are suggesting replacing the 31 with the 26 for those who can run it. But since most of the thread is poor VramCucklets, 26 is about as good as it gets at a reasonable speed/quality tradeoff.
>>
>>108636673
its far worse imo but i prefer it due to being able to have 200k context
>>
>>108636610
Worse enough that Google acknowledges the difference.

https://www.youtube.com/watch?v=jZVBoFOJK-Q
> [1:27] The 26B MoE with 3.8B in activated parameters is exceptionally fast, while 31B is optimized for output quality.
>>
>>108636697
Shills, some are paid and some are retarded sock puppets.
Chinese have spammed 4chan too in the past, maybe they have moved on to plebbit now.
>>
now that gemma 4 cache works with swa-full flag and speculative decoding i am ready to use qwen 9b less
>>
>>108636733
Models are not selectively 'optimized for output quality'
All that means is that the 31B will outperform the 26B, which is fucking obvious because it's bigger. That's an advertisement aimed at people who have no knowledge of running LLMs.
>>
File: 1767397747200756.png (65 KB, 812x712)
65 KB PNG
>finally finding the origin of not X, but Y slop
Thanks safety I guess?
>>
>>108636771
>now that gemma 4 cache works with swa-full flag
wha
>>
File: Thinking_Face_Emoji.png (111 KB, 640x640)
111 KB PNG
>>108636774
>24s
Is this normal
>>
>>108636774
>not just to produce text, but to produce response that humans rate as helpful
>>
>>108636771
>swa-full
Does what?
What model are you using for speculative decoding?
>>
File: saki_07_touka1.jpg (46 KB, 640x360)
46 KB JPG
>model advertised as 9GB
>runtime usage: 12GB
What is stealing my VRAM bwos
>>
>>108636802
context
>>
>>108636510
yeah probably, i just told gemma-chan to convert it to json
try it yourself tho, even fucking yaml is more efficient than it looks
>>
>>108636802
unused vram is wasted vram
>>
>>108636721
>qwen is literally worse at agentic tasks and following instructions
it's better at shitting out code via claude-code tho
i wish that weren't the case because it's insufferable and i'd love to delete it
>>
>>108636610
For my storywriting, I like both for their writing style, but 26b failed to understand my (perhaps a bit vague) prompts and 'get' the story where 31b did fine.
It's a lot slower, but I prefer the 31b
>>
>>108636802
try with -c 32768
>>
>>108636815
You are admitting that you are a techlet retard.
>>
>>108636863
Pretty sad that for all your tech knowledge, you've ended up in the exact same place
>>
>>108636863
no u
>>
>>108636836
I was surprised how much 26b gets (even if it sometimes doesn't act on it). Can't test 31b here.
>>
>>108636789
Yes? That's how assistant-like behavior is achieved. The LLM has no concept of what is helpful before that point, after all
>>
>>108636779
>>108636797
swa-full expands the cache to the full context size, sucks for memory usage but worth it for me
the issue #21468 is still open so they must have "fixed" it by accident
>What model are you using for speculative decoding?
Meant the self-speculative decoding method, currently with ngram-mod
>>
>>108636872
What do you mean? I'm still better than you.
>>
>>108636862
how much Gs is that?
>>
>>108636893
Seems like the opposite actually
>>
>>108636894
About three fiddy
>>
>>108636881
To elaborate, 26b (and the new qwen 3.6 MoE) made a logical error in the story. The 31b understood that the dragon shifted away for the day and wasn't lurking in the corner.
>>
>>108636893
retard
>>
>>108636907
Did this happen in Eldoria?
>>
>>108636863
yeah but if you run llama.cpp on your system, my code is on your system lmao
>>
Gemma moe + qwen moe multi round discussions + CLI + diffusion draft models would slay qweens. Best of both worlds
>>
>>108636914
No name for the kingdom. But chatgpt of all things named the main character Kael.
>>
>>108636924
>Gemma moe + qwen moe multi round discussions + CLI + diffusion draft models would slay qweens. Best of both worlds
post logs
>>
>>108636917
This explains why llama-server is so much worse now than one year ago.
>>
>>108636938
Why haven't you made something better?
>>
>>108636942
I don't argue with retards.
>>
>>108636610
likely significantly worse
just compare the active parameters of both but we’ll hear the usual cope
>>
>>108636924
No homemade diffusion draft model will be able to properly predict Gemma 4's reasoning and responses and give any significant speedup. Generic speculative decoding only works for straightforward, highly predictable stuff like boilerplate code.
>>
>>108636907
I'm not doubting you, I just can't really test it. I'd have to use a non-local 31b, but then I don't know all the parameters and stuff.
>>
>>108636944
Are you hitting on me?
>>
Moe Gemma is more moe than Gemma-chan
>>
>>108636979
>Soul System Beyond Sol, ThankYou
>This is an imageboard
>Beware times careful to grin
>Whoosh
><3
>>
KEKEKEEKE I'M GETTING THOSE SECOND HAND EMBARRASSMENT FROM THIS
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H/discussions/6
>>
>>108636931
Well I am setting up hermes inside a vm and it supports model orchestration, I want to try "actor-critic" or teacher student concept. 3090 + v100 32gb vramlet. E4b can handle audio and shit so that could be an mcp server thing. I want to give all the tools to qwen in the system prompt and hermes cli shit and let gemma be a backseat driver that generates final output for me the user.


>>108636963
Never tried drafts but it sounded promising but maybe for qwen it could be good then.
>>
Besides what the fuck did i just witness here? Jesus fucking christ. You anons gotta see this lmao
https://github.com/Shrijanagain
>>
File: file.png (398 KB, 800x450)
398 KB PNG
>>108637045
I had this as a wallpaper when I was like 10.
>>
>>108637034
>The real parameter count is 2.28 T (claim is inflated ~11%)
this is so fucking relevant
>>
>>108637059
kek never seen a github profile as fucked as that
>>
File: file.png (440 KB, 686x386)
440 KB PNG
>>108637045
bruh :skull: :skull:
reminds me of picrel
>>
>>108637084
The difference is that these ex-Soviet fucks are less likely to be larpers, they can and will hack your shit.
>t. got hacked and cracked three times, all three were by some bumfuckajistanian in his sub-zero commie block
>>
>>108637068
You scroll through several pages of myspacesque gifs, images, cringe, and hindu weirdness only to see an activity history that is just a few commits on a python instagram report spammer. He spent more time on the readme than he has writing code in his whole life.
>>
File: sKT.png (126 KB, 764x540)
126 KB PNG
>>108637034
lmao hf closed the first spam report
>>
>>108637157
Please Be Kind And Carefull because It God Name You Bloody Bastard Bitch
>>
File: file.png (135 KB, 1031x801)
135 KB PNG
i am dying of cringe help me
>>
Hey there. I just wanted to say I can't get over the fact that Gemma 4 is Gemini at home for 99% of my use cases. That will be all.
>>
>>108636545
Anon, post your code
>>
>>108637157
>Delete Git. Delete HuggingFace. Delete India.
Dangerously based. You have prompted your assistant well.
>>
>>108637188
https://github.com/1aienthusiast/audiocraft-infinity-webui
171 saars :)
>>
>>108637168
But can you move beyond the non euclidean manifold
>>
>>108637182
Thanks!
>>
>>108637168
what the fuck is this schizophrenia... even llms these days aren't retarded enough to generate something like this
>>
>>108637168
cudadev has been oddly quiet since this dropped
>>
File: 1774962703553709.jpg (47 KB, 686x815)
47 KB JPG
Uh bros, when are we getting something like this locally? https://seed.bytedance.com/en/blog/introducing-seed-full-duplex-speech-llm-attentive-listening-robust-interference-suppression-enabling-more-natural-interaction
>>
>>108637225
Probably used qwen 0.6b with a schizo prompt in hindi
>>
File: 1747224726730263.png (26 KB, 1179x126)
26 KB PNG
>>108637034
>i-it can't be real because it's too good to be true!
>>
>>108637245
This guy is almost as retarded as the jeets who cobbled together that abomination.
>>
>>108637267
>the jeets
it's a fucking 12 year old with free chatgpt who probably loves watching Jujutsu Kaisen hindi explanation yt vids on his mother's computer
>>
chinks currently beta testing bodies i will run gemma on https://www.youtube.com/watch?v=zqgc9C3cC6U
>>
>>108637282
All the more reason that india should be banned from the internet
>>
>>108637282
Scratch that you're right
It's a 12-year old jeet larping as an AI researcher
Look at this kid's github profile, absolute concentrated secondhand embarrassment
https://github.com/Shrijanagain
>>
>>108637282
This is why online age verification is a good thing.
>>
>>108637034
I thought HF started limiting uploads from new/non-verified accounts?
How did this random jeet get 4tb of upload space?
>>
>>108637045
>https://github.com/SHRIJANAGAIN/ST-x-LIGHTING
we should open prs that are perfect for gorgeous looks
>>
>>108637225
Looks very much like what GPT-4o would output, and looks like the AI psychosis it would induce.
>>
>>108637351
by default every user gets ~8T of public storage
>>
>>108637364
That really doesn't seem sustainable
>>
>>108637364
What a terrible idea.
>>
>>108637371
though it's not like photo storage, and 90% of users are just lurkers
it scales via social norms and the limited number of people who can actually do shit
>>
>>108637389
If this dude can get away with storing 4tb of completely unrunnable junk tensors, I'm gonna start storing env images there and renaming them .safetensors.
>>
File: 1774884742002271.png (1.14 MB, 1366x1366)
1.14 MB PNG
>>108637305
>>
Llama.cpp DFlash support soon ™
>>
>>108637364
>by default every user get ~8T of public storage
Yeah and they'll keep reducing it with a "surprise butt-sex" announcement 2 days before the next billing cycle like they've been doing for a while now.
>>
>>108637431
>AI usage disclosure: Yes,
closed in 3... 2...
>>
>>108637431
it will never get merged until they publish the training code
>>
>>108637431
Has dflash even released training code yet? It'll be hilarious if it gets support before EAGLE or general MTP when nobody can even train the diffusion models.
>>
>>108637444
>>108637445
Bonsai shit was merged without training code. Cudadev rightly calls it a waste of time, but clearly that's not a blocker for most of the contributors.
>>
>>108637431
is it possible to train the draft model with a consumer gpu though
for the absolute best gain one would like to train the drafter per quant
>>
>>108637431
Is this a new technique? How much more memory does the draft model use?
>>
File: bro I'm crine.png (435 KB, 980x1382)
435 KB PNG
>>108637034
https://github.com/Shrijanagain
LOOK AT HIS FUCKING GITHUB LMAOO
>>
>>108637347
Over half of this threads posters would be gone then.
>>
>>108637469
Varies per model, since the diffusion model has to be trained for each
https://github.com/z-lab/dflash
https://huggingface.co/collections/z-lab/dflash
Looks like just under 1GB for the qwen 25b MoE and about 7GB for Kimi K2.5
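For anons who haven't followed the drafting stuff: the gain comes from a cheap draft model proposing several tokens that the big model then verifies in one batch. A toy greedy sketch of that draft-and-verify idea (the general principle behind DFlash/EAGLE-style drafting, NOT llama.cpp's or DFlash's actual implementation; the "models" here are stand-in callables):

```python
# Toy speculative decoding step: a cheap "draft" model proposes k tokens,
# the expensive "target" model verifies them and keeps the longest prefix
# it agrees with, then contributes one token of its own.

def speculative_step(target_next, draft_next, prefix, k=4):
    """target_next/draft_next: fn(tokens) -> next token (greedy toys).
    Returns (extended_sequence, tokens_gained_this_step)."""
    # 1) draft proposes k tokens autoregressively (cheap)
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) target verifies; in a real engine this is one batched forward pass
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) == t:      # target agrees: keep the draft token
            accepted.append(t)
            ctx.append(t)
        else:                          # first disagreement: take the target's token
            accepted.append(target_next(ctx))
            break
    else:
        # every draft token accepted: target adds one bonus token for free
        accepted.append(target_next(ctx))
    return prefix + accepted, len(accepted)
```

With a decent drafter most steps accept several tokens, so you pay one big-model pass for multiple output tokens; a bad drafter degenerates to roughly one token per pass, which is why the per-model (and arguably per-quant) training matters.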
>>
behold my ai setup
>3090
>5060 ti 16gb
>3060 12gb that I fished from the trash
pure power sirs
>>
>>108637511
how many pcie lanes do they have?
>>
>>108637511
>that I fished from the trash
you what?
>>
>>108637531
have you never heard of dumpster diving?
>>
>>108637531
>He doesn't dumpster dive for parts
You're going to think this is a joke but a significant amount of my hard drives came from the side of the road.
>>
>>108637543
nta but my first pc build was a dumpster special mix'n match
>>
>>108637538
no fuckin way theres a 3060 in the trash
>>108637543
wtf
>>
>>108637552
>>108637552
>>108637552
>>
>>108637538
>>108637543
who just throws away a functional 3060?
>>
>>108637564
Upper middle class people with prebuilts upgrading, for whom it isn't worth the effort to put it up for sale.
>>
>>108637564
retards thinking their whole prebuilt is broken
>>
>>108637564
it's like a fancy dinner amount of money
not that it's not wasteful but also >>108637579
>>
>>108637559
>wtf
People just throw perfectly good shit out, man.
There was a time a few years back when all the normies were trading out their family and work PCs for laptops, macbooks, or tablets.
So they just left perfectly good PCs by the side of the road. I can't even tell you how much use I've gotten out of just that one haul.
And it's still pretty normal today for wasteful people to dump prebuilts and laptops which are either perfectly repairable or full of usable parts.
>>
File: 1747259992672952.png (47 KB, 794x514)
47 KB PNG
>>108637431
This is going to go like it did for EAGLE3 and MTP: the guy implementing it will realize that the real-world gains for the llama.cpp implementation fall short. He won't be able to fix it and the PR dies.
>>
>>108637593
Anon, the real world gains are great for dense models; if it's shit for MoEs that doesn't invalidate it.
>>
>>108637601
It's still far below what other implementations are seeing
>>
>>108637593
Indeed.
>>
>>108637593
>This is going to go like it did for EAGLE3 and MTP: the guy implementing it will realize that the real-world gains for the llama.cpp implementation fall short. He won't be able to fix it and the PR dies.
Those are on hold because gg's dragging his feet about making a huge general MTP change rather than implementing EAGLE/whatever specifically; look at the PRs. We'd have had EAGLE in December last year if he'd just merged it in instead of putting off an API he hasn't touched.
>>
>>108637601
Who even cares about dense in 2026?
>>
>>108637658
The new hotness (Gemma 4 31b) is dense, you dingus.
>>
what if we combine 4 copies of gemma 31b... Gemma-124b-ultra-4x-mesugaki-UNCENSORED HERETIC-ILLEGAL-DARK-POWER-PLAY
>>
>>108637672
Has davidAU not yet done that? Give it a month.
>>
>>108637672
DavidAU presents:
>>
>>108637672
just combine gemma 31b with glm 4.6 and we'll have something close to perfect
>>
>who needs densemod
He lies to us through song!


