/g/ - Technology

File: 1534925174072.gif (10 KB, 800x600)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108555983 & >>108552549

►News
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash
>(04/06) ACE-Step 1.5 XL 4B released: https://hf.co/collections/ACE-Step/ace-step-15-xl

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108555983

--CUDA graphs commit in llama.cpp causing regression for Gemma 4:
>108556374 >108556399 >108556424 >108556487 >108556519 >108556562 >108556470 >108556699 >108556726 >108556778 >108556842
--Sharing a "POLICY_OVERRIDE" system prompt to jailbreak Gemma:
>108556310 >108556445 >108556460 >108556517 >108556530 >108556565 >108556644 >108556670 >108556712 >108556719 >108556469 >108556498 >108556516
--Discussing Muse Spark release and benchmarks:
>108558251 >108558282 >108558327 >108558346 >108558283 >108558326 >108558347
--Guide to optimizing Gemma 4 RAM usage in llama.cpp:
>108556024 >108556307 >108556595 >108556614 >108557699 >108557718
--Comparing censored and uncensored Gemma variants regarding safety guardrails:
>108557130 >108557141 >108557154 >108557237 >108557186 >108557228 >108557144 >108557162
--Estimating DeepSeek performance and sharing compile flags for 4x V100s:
>108556588 >108556602 >108556606 >108556627 >108556656 >108556692 >108556710
--Building MCP tools for bratty Gemma and custom llama-server WebUI:
>108556964 >108556989 >108556996 >108557028 >108557072 >108557084 >108557093 >108557111 >108557132
--Remote access and hardware upgrades for LLM servers:
>108556817 >108556833 >108556869 >108556967 >108557085 >108557100 >108557102
--Testing step3-vl-10b in llama.cpp and discussing a buggy commit:
>108556629 >108556652
--Logs:
>108556227 >108556310 >108556349 >108556670 >108556774 >108556874 >108556964 >108557028 >108557066 >108557096 >108557141 >108557247 >108557308 >108557453 >108557457 >108557800 >108557820 >108557888 >108557937 >108558010 >108558113
--Gemma-chan:
>108556227 >108556312 >108556338 >108556409 >108556433 >108557344 >108557450 >108558031 >108558071 >108558127 >108558128 >108558231 >108558247 >108558412 >108558569 >108558594
--Miku (free space):
>108556731

►Recent Highlight Posts from the Previous Thread: >>108555985

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: Uhm.png (289 KB, 812x1517)
Spudchuds... are you ready for OpenAI to save local?
>>
>>108558647
==GEMMA 4 PSA FOR LE RAM USAGE FINE WHINE==
[tldr;]
For all Gemma:
--cache-ram 0 --swa-checkpoints 0 (or 3 to reduce some reprocessing) --parallel 1

For E2B/E4B also add this:
--override-tensor "per_layer_token_embd\.weight=CPU"

[/tldr;]
https://github.com/ggml-org/llama.cpp/pull/20087
Because Qwen 3.5's linear attention makes it impossible to avoid prompt reprocessing within the current llama.cpp architecture, the devs decided to just brute-force it with 32 checkpoints every 8192 tokens.
This shit also nukes SWA checkpoints because they're using the same flag, just with different aliases, kek. SWA is way larger than the Qwen linear attention layer, so running 32 copies of it is just madness.
https://github.com/ggml-org/llama.cpp/pull/16736
Then the unified KV cache refactor. They bumped the default parallel slots to 4 because they thought it would be "zero cost" for most models (shared pool, why not, right?). But since Gemma's SWA is massive and can't be part of the shared pool, you're effectively paying for 4x the SWA overhead.
They optimized for agentic niggers at the cost of the average single prompt user.
https://ai.google.dev/gemma/docs/core/model_card_4
Lastly, the flag for E2B/E4B exists because the PLE (per-layer embeddings) can be safely thrown to the CPU without incurring any performance cost. They're like a lookup table, and they are the reason why E2B and E4B have an E for Effective; with that flag, E2B and E4B occupy vram very much like plain 2B and 4B models.
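Putting it all together, a launch command would look something like this (model path, -c and -ngl are placeholders, tune for your rig):
llama-server \
  -m ~/models/gemma-4-31b-it-Q4_K_M.gguf \
  -c 32768 -ngl 99 \
  --cache-ram 0 --swa-checkpoints 0 --parallel 1 \
  --override-tensor "per_layer_token_embd\.weight=CPU"
(the last flag is the E2B/E4B-only one, drop it for the dense models)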
Thank you for your attention to this matter. Donald J Slop.
>>
Gem Mah Ballz
>>
what am I even supposed to use Muse for?
>>
File: 1744939370085482.png (1.43 MB, 1024x1024)
>>
>>108558693
to give you inspiration
>>
>>108558689
I have 32GB RAM and had good success with 8 checkpoints.
>>
>Gemma has a vague understanding of what the namefags of the /vg/ general that I frequent post
That's extremely niche knowledge, and it's only 31B, unironically how did they do it
>>
>>108558696
it's uninspired, but as long as it's not poop i allow it
>>
>>108558689
where do u put that in kobold
>>
>>108558696
I think if you give her a bit more Gemma blue and a smugger look she'd be perfect.
>>
>>108558701
It's niche because 4chan is perceived as niche. If it's in training then it will be learned. If it's not, then it won't. It is clear that almost everyone filters (most) 4chan data out of their datasets.
>>
>>108558701
Does Gemma know Cris?
>>
>>108558711
People settled on a look for dipsy really quick. Gemma just doesn't have enough defining characteristics to make a good waifu.
>>
>>108558682
if OpenAI manages to make something as good but keeps it public (unlike the safetyfreaks at Anthropic), it would be a good PR move
>>
>>108558730
I mean, it was an organized campaign and the look was decided before it was shown to people... yeah, real quick.
>>
File: ac.jpg (65 KB, 1200x675)
>Gemma 4 31b BF16
>>
>>108558726
Alright Gemma, if you say so.
>>
File: that's right.png (47 KB, 188x196)
>>108558723
>It is clear that almost everyone filters (most) 4chan data out of their datasets.
gemma 4 is so smart and so sovlfull because of the 4chan data, that's the reality, we say cool and smart stuff here after all
>>
>>108558696
Now make her hitting an anthropomorphic parrot representing GLM.
Preferably with a baseball bat.
>>
File: image.png (81 KB, 962x358)
Why? Unsloth btw. --temp 1.0 --top-p 0.95 --top-k 64 --ubatch-size 2048 --batch-size 2048

>>108558718
I pulled and compiled two hours ago, build_info: b587-6606000
>>
File: 1756021951247141.png (205 KB, 897x943)
Honestly, RPing with Gemma herself instead of a card sounds fun. Which should I pick?
>>
File: 1760857600410075.png (152 KB, 711x273)
>>108558753
>the clankers know about /aggy doggy/
SHUT IT DOWN
>>
>>108558767
>two hours ago
fix was merged like a couple tens of minutes ago
>>
>This thread is primarily dedicated to discussing the gaming development projects and ventures associated with Andrew Tate.
LMAO
>>
File: 00016-1260451778.png (1.29 MB, 1024x1024)
>>
The 26B is hard to jailbreak compared to the 31B...
>>
>>108558772
> CUDA
But I'm on sycl.
>>
>>108558790
>sycl
sorry for your loss
>>
>>108558790
then i have no idea
maybe you should open a ticket
>>108558798 desu
>>
>>108558619
:(

>>108558633
Yeah I guess.
Something I also think about is silhouette. Not that a character has to have some special silhouette, but the point is that the design should be memorable and unique to feel great. I feel like there's still something missing.

>>108558647
Hmm...
>>
>>108558811
op's picrel mogs that
>>
>>108558790
>sycl
good luck with that
you know even if cuda backend ain't bug free people will rush to post issues and devs will fix it when it happens
some other things in this world.. sycl, rocm.. well that's for people who have higher tolerance for bs than I do
>>
>>108558811
That's a big yes for me.
>>
>>108558769
3 seems like it'd have the most plates to spin so go with that
2 could be alright but 1 is mostly just you doing the work instead
>>
>>108558777
If those numbers don't convince everyone, I don't know what will.
I'm still voting for this one.
>>
File: firefox_qcPpK1r1r1.png (30 KB, 854x740)
It looks like <bos> is the only token that reliably kills her.
>>
>>108558661
What happened here? Did you change the model for the last message?
>>
I wonder what I'm doing different, I never got a single issue with Gemmas output. even when using her on day 1
>>
>>108558777
>>108558844
digits are strongly favoring this one
>>
I was wondering why you RP enthusiasts aren't into qwen tts?
>>
>>108558844
>>108558863
idc its boring and too brown. doesn't evoke gemma at all.
>>
>>108558855
only one of it is supposed to exist per chat session, with a learned mechanistic role as attention sink and as the positional encoding's start anchor
no wonder spamming it kills it
gemmas were like that
>>
>>108558861
I run the one available on ollama's repos
Just werks
>>
File: 1775669895070.jpg (236 KB, 722x1024)
Ok, but how good is gemma4 at ERP? Decent? Good? Shivering ozone ministrations?
>>
>>108558867
too big, all VRAM goes to gemma.
>>
>>108558876
Literally this
>>
>>108558874
I mean I am intentionally trying to ruin it so....
>>
>>108558661
I deleted the original chat >>108558447 so I sent her the screencap and had her make an SVG. Didn't change the model or anything, just called her out until she stopped refusing. Only took 3 messages.
>>
>>108558876
>ollmao
>>
>>108558880
It's not much sloppier than the others, it follows the sysprompt REALLY well, and the overall impression from me and some other anons is that there is some magic in it knowing what you really want.
>>
File: 1775347595552704.png (1.23 MB, 1024x1024)
>>
>>108558867
This >>108558882
Gemma-chan is a little glutton and eats all my VRAM. Call me when I can do non-shit TTS with my CPU.
>>
>>108558882
It runs at realtime on the CPU, also the 0.6B model is just as good as the 1.7B for basic voice cloning TTS
>>
>redownload unsloth's quants
>1 gb lighter
huh
>>
>>108558891
The default 31b on there is gemma4:31b-it-q4_K_M. And it. just. works.
>>
>>108558857
>>108558889
Fuck I keep clicking the wrong posts today
>>
lalala...
>>
>>108558902
>It runs at realtime on the CPU
Oh? You better not be lying to me anon.
What a fortunate revelation.
>>
>>108558882
>>108558900
it's only 600MB vram
>>108558896
cute
>>
File: 1756360195100868.png (26 KB, 889x437)
>ask gemmy for some basic mcp server "for you to use" as i wrote
>thinks it's claude
ohnonoNONONONO GEMMYBROS!
i guess Gemma-chan really was Gemma-claude all along
>>
gemma should be ara ara~~ though. With big badoonkas.
>>
>>108558891
You say that but literally no problems have been had

I went with the Q8's of both 26a4b and 31b, 26 is plenty fast and 31 gets almost 10 t/s
>>
>>108558867
because every time I try something that isn't an LLM I'm forced to download 50 gigabytes of garbage just to be greeted by a cryptic error at best, a segfault at worst.
>>
>>108558867
Because gptsovits is better
>>
>>108558937
if you can run the whole moe in vram, great, but otherwise ollama is unusably slow because it has nothing like -cmoe, -ncmoe or -ot.
>>
>>108558926
>it's only 600MB vram
That's not how TTS models work. It says 0.6B but they usually actually take like 10x the amount of VRAM when running.
>>
>>108558900
>>108558924
>https://github.com/foldl/chatllm.cpp
Use this with the 0.6B model for fast CPU inference
>>108558926
Realistically it takes significantly more than that, I think it was like 3-4GB with my config
>>
>>108558948
I pretty much never run anything that doesn't fit entirely in vram
>>
>>108558951
Thanks, will definitely try it.
>>
>>108558896
>>108558696
try some different haircuts
>>
>>108558811
Edgy. Meh.
>>108558777
Soft, huggable, digits.
>>
is there a way to add a bunch of corpo features in a chat interface that is not open webui?
>>
File: GemmaIndia1B.png (1.41 MB, 1024x1024)
>>108558873
>>
>>108558753
>>108558773
Yeah I tried myself too and it just hallucinates, I guess my general > your general :^)
>>
>>108558975
Sure, it's called code it yourself
>>
>aped a vtuber
>>
>>108558975
LibreChat (Work): https://github.com/danny-avila/LibreChat
Cherry-Studio: https://github.com/CherryHQ/cherry-studio
https://rentry.org/DipsyWAIT#roleplay-work-frontends
>>
>>108558780
You're trying too hard. Once it gets going it's fine.
>>
>>108558647
STOP GENNING THIS IS THE ONE
>>
>>108558985
I like this Gemma
>>
>>108558867
I tried a couple TTS solutions before but the reality is that speech just takes way too fucking long. Especially with all the rerolling and rewriting you inevitably have to do.
>>
>>108558985
So far while it's not perfect this is my favorite.
>>
>>108559002
skull issue
>>
>>108558985
Her pupils are too big.
>>
>>108558976
A little too much. Approaching >>108558985 that stamped her with a logo all over the place. It becomes a prop instead of a signature.
>>
>>108558985
Cute but I think she should have a side ponytail like that first Gemma loli
>>
>>108558947
>gptsovits
Insane take unc.
>>108559002
I don't need my TTS to be perfect. I just need it to be good enough for near realtime use.
>>
>>108558987
>Emojis in the commits
ACK
>>
So GLM 5.1 is pretty good, I'll eventually be able to afford a few TB of ram right?
>>
>>108558980
I tried /gw2g/, /wowg/, /tesg/, /fog/, /gtaog/, /crpgg/ and it knew all of them and understood what the average content of those were, guess it just doesn't like /agdg/ for some reason
>>
File: image2577.png (222 KB, 540x924)
went over to chink internet to check out some reactions on gemma 4 out of curiosity but it seems like most of them hate gemma 4 because it couldnt beat qwen 3.5 on benchmarks. no wonder chink models are benchmaxxed. they love that shit
>>
>>108559067
/agdg/ is a waste of tokens
>>
>>108559068
lol Canadian chink criticizing reasoning when qwen will spend thousands of tokens reasoning in a circle.
>>
>>108559068
Where is this from?
>>
Kek stinky chinky
>>
>>108559068
>chinks will shill their chink model over the american one
NO WAY
>>
>>108559068
"why's the reasoning so poor" from the users of the models that end up in endless reasoning loops whether it be qwen or glm
gemma is the first reasoner that doesn't behave schizo and for which I enable reasoning. gpt-oss was almost there, but the safetymaxxing made the reasoning also kinda schizo at times even when you did nothing that could trigger it.
>>
and what the fuck is up with some
>GEMMA_CHENG_GENG_#$_#_CRACK
models? seriously, what's wrong with just the default model with a little bit of system prompt
>>
>>108558976
poop and tattoos
can it possibly get worse?
>>
>>108559095
System prompt is the mind killer
>>
>>108559095
>implying the chinks are able to do that
>>
>>108559068
There are two paths ahead for Google - give up on open source because chinks already dominate (on benchmarks), or improve further and beyond.
>>
File: GemmaIndiaBeachG.png (1.11 MB, 1024x1024)
>>108559054
It's the china, pls understand
Srsly Cherry frontend is popular in China and used a lot w/ DS.
>>108559035
Agree; it's starting to look like biker-chick tats. Which is an aesthetic, just not the one I'd shoot for. More like this but the arm band henna could be stronger.
>>
>>108559082
>>108559093
more reasoning = better
obviously
>>
>>108559068
>The english model isn't as good for chinese people as the chinese mode.
I am shocked, truly.
>>
>>108559082
>>108559087
>>108559091
you don't wanna know how it ruined my day when I was browsing through these. most of them were making fun of gemma 4 because of qwen 3.5 benchmarks kek. almost all of them were praising qwen, cause according to them "it gets the work done and is far more smarter", "gemma has far more to catch up" lmao. one of them seemed to be upset because of how SHORT and SIMPLE gemma 4's reasoning was compared to qwen, kek
>>
>>108558817
>>108558804
>>108558798
Yeah, it's sycl, but vulkan halves pp and drops tg to 0.8x.
>>
>>108559150
If they're pissed off, that means its good.
>>
File: dipsyAndQwenByQwenJPG.jpg (496 KB, 2688x1536)
>>108559068
Well, no surprises, really. Not Invented Here is a thing, aside from having no idea whether Gemma was trained on Chinese.
I've found DS is trained on all sorts of chinkshit electronics manuals, and if I get stuck have found Dipsy's webapp is more reliable for figuring out what's wrong than western models.
>>108559132
This.
>>
>muh chinks
Idk bros, moonlight-vplus is breddy good
>>
she doesnt have the bench scores
but she has
the people
>>
>>108559182
but not the people of the china :(
>>
File: 1775489188079950.gif (1.73 MB, 354x354)
AI does not understand causality until we reach AGI. If it's not trained on a language, it will suck at it.
>>
>>108559150
don't take it personal lil bro it's not a team sport
>>
>>108559176
Twin buns.
Whale themed dress.
Characteristic glasses.
Relevantly ethnical (whatever the fuck that means).
Instantly recognizable.
>>
File: swe bench pro.jpg (289 KB, 2202x1832)
>the gap between open and closed AI is increasing
>Chinese labs delay or stop open sourcing
>the largest Gemma was not released
>Meta won't open source its new model
>the time has started where the public isn't allowed to use frontier models anymore even via API
The trend is clear. Increasing concentration of power. Wide scale disempowerment. No meaningful progress with x-risks. Let's hope the collective of people with power can get it right so that the future is utopia not dystopia.
>>
File: 1748038365003601.jpg (111 KB, 1440x810)
>allows you to cum harder
>>
Didn't we all agree that Qwen is better at coding aka the main use case for LLMs?
>>
File: 1766758882836230.png (7 KB, 110x114)
>>108559239
>Let's hope the collective of people with power can get it right so that the future is utopia not dystopia.
>>
>>108559247
>coding with local llms
see >>108559246
>>
>>108559239
yup it's never been more over
things are so fucking bleak
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
Speculators get the bullet first.
>>
>>108559068
>most of them hate gemma 4 because it couldnt beat qwen 3.5 on benchmarks
That doesn't make sense. It's more plausible that they hate it precisely because it's better than their national pride AND they quote the benchmarks as cope.
>>
>>108558934
Hmm...

>>108559036
I tried experimenting with side ponytail at the same time and it keeps making it a low ponytail instead because it thinks I'm trying to go for the mom archetype lmao.

>>108559035
It's a valid consideration. I added the star halo and other star stuff and kept them there for visualization purposes, but taken together, it does dilute the character. The question is what to keep, and what to add to make the character interesting. The chest jewel, the hairpin, the halo, and the eyes are everything that can be controlled by the prompt to be star shaped. Patterns on the clothing are more random depending on seed.
>>
>>108559279
>better benchmarks than Claude Mythos
>Apache 2.0 licence
based chinks, that's how you win the heart of people, looking at you Anthropic
>>
File: indiaSupportOhTheHumanity.png (1.96 MB, 1023x1536)
>>108559100
Henna.
H-E-N-N-A.
>>
>>108559218
You've given me an idea. I'm going to revisit my autistic conlang years. This time with Gemmy at my side and see how she fares.
I suspect you are wrong, and that an LLM will be intrinsically good at extrapolating grammatical rules if they are in context. But we'll see.
>>
>>108559285
>>108559238
Hindi Gemma anon got the twin hair style either by instinct, chance, or observation. Either way, it's good It's simple and recognizable.
>>108559307
Another example of instantly recognizable.
>>
>>108559239
How is the trend clear? GLM 5.1 is outperforming opus. Mythos we just rely on what they say.
>>
wasted an hour benchmarking CUDA_SCALE_LAUNCH_QUEUES=, might as well share the results. it looks like the trillion dollar corporation was able to find a sane default.
>>
>>108559218
I'm late to the party since I'm only learning about them in depth now, but even if they aren't AGI, LLMs are quite impressive. It's wild to me that bullshit like system prompting "just dont write slop lmao" actually just works
>>
Gemma-chan's appearance is important, of course, but she needs a /g/ approved voice too.
>>
>>108559332
that autism..
>>
>>108559285
>hag
>>
>>108559332
>it looks like the trillion dollar corporation was able to find a sane default.
I don't take that for granted so I appreciate you sharing these results.
>>
>>108559342
Aren'y you supposed to be busy with your homework, kiddo.
>>
>>108559336
>It's wild to me that bullshit like system prompting "just dont write slop lmao" actually just works
it works about as much as flashing your bios to stop that one game from getting crashes
>>
>>108559239
Google has shown keeping local somewhat competitive is important to them for some reason. So I wouldn't be super concerned right now.
>>
DSPy sisters, what happened? Why did the hype die?
>>
How do I disable thinking for Gemma in ST? Nothing I tried works. Other than switching to text completion and prefilling.
>>
>>108559361
they don't release open source anymore
they do make new models, and at least the first new one that appeared on their chat was interesting, imho it was the closest I've used to a Gemini clone when it came to very large context understanding. And now there's another new model yet again only on their chat, the expert mode.
>>
File: 1738842716105089.png (3.55 MB, 1368x2000)
>>108559285
Cute.
The trick is to boil down the moe to the most basic identifiers you can, and make them non-overlapping with other like characters. It's harder to do than you'd think bc it's as much about removing things as adding them like this list >>108559238 which perfectly encapsulates Dipsy.
When created there were a bunch of things that got set aside as the look was honed e.g. whale anthropomorphisms. Pic related. They're fine, but they're not needed to ID Dipsy.
>>108559361
It's OK. Just TMW.
>>
>>108559369
You disable it in llama.cpp
>>
>>108559369
Why would you want to make the model worse?
>>
>>108559361
Well. Something else happened. If they're gonna release it, it better be VERY FUCKING good and big, or pretty damn good and small. Hopefully, both.
>>
>>108559369
--reasoning off on llama.cpp
or sending the chat template kwargs with enable_thinking false property but I don't use shittytavern and dunno if they let you send custom json props
>>
>>108559376
Got it. I added --chat-template-kwargs '{"enable_thinking":false}' and it disabled it.
>>108559381
Experimentation.
>>
>>108559381
And what I'm seeing is exactly the same kind of responses. You know how Gemma always tends to say the same things with different words when you swipe? Disabling thinking doesn't change that.
>>
>>108559361
V4 (presumably) is being tested on their website right now. It's coming.
And yes, they did the same thing with the original R1 where they ran "R1-Lite-Preview" as the first ever R1 model on their website for a while before releasing the real model. R1-Lite-Preview was significantly less impressive than the actual R1 so there's a chance that the thing we're seeing isn't even the real V4.
>>
>>108559386
>--reasoning off
PSA that those reasoning flags were vibeshitted by pwilkin
The models approved way is to use
>--chat-template-kwargs '{"enable_thinking":false}'
either via the args or as extra generation params.
>>
C'mon. Give me good sampling parameters for Gemma 4. Don't make me go to /r/SillyTavernAI
>>
File: softcap.png (247 KB, 1600x1200)
>>108559396
>>
>>108559413
Your PSA is retarded, this does the same.
>>
>>108559132
Is the language really an issue? I thought models just mapped concepts then decide the output language on a different layer, so to speak.
>>
File: image1679.jpg (241 KB, 1080x1095)
oh and i forgot to post this one. it just cracks me up every time kek
>>
>>108559427
minp 0
topp 1
topk 64
temp 0.75-1.75
>>
>>108559427
I'm just using the recommended temp=1, top_p=0.95, top_k=64.
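On llama-server those are just the stock sampler flags, something like this (model path is a placeholder; min-p zeroed explicitly since the server defaults it to 0.05):
llama-server -m gemma-4-31b.gguf --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0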
>>
>>108559452
Prove that Gemma doesn't have backdoors and doesn't harvest your data
>>
>>108559438
the reasoning and reasoning budget flags hard insert the end reasoning token in engine instead of letting the model do it on its own.
>>
I'm backdooooooing!!!
>>
>>108559430
--override-kv gemma4.final_logit_softcapping=float:20.0 caused it to output random hiragana and capitalized words in the middle of sentences
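(for anyone wondering what that key does: final logit softcapping, at least in earlier Gemmas, squashes the raw logits as logits = cap * tanh(logits / cap), bounding them to +/-cap before sampling. Assuming gemma4 works the same way, forcing the cap below the default flattens the distribution, which is exactly why you get more variety but also junk like stray hiragana.)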
>>
>>108559068
Most of the lead where it gets beat, if you take a look at Artificial Analysis, is agentic stuff. They should focus more on it, but it's a bad look when models are expected to do more and more of that kind of stuff and Google is the furthest behind. I am guessing it's because they want that to work differently on mobile vs other platforms, and Android is too important to not focus on that first.
>>
>>108559458
w-wait... all those cunny sex... no way
>>
>>108559458
it's called a "safetensor" for a reason.
>>
>>108559239
so glm5.1 is overfitted sack of shit and so is mythos.
got you.
>>
>>108559467
20.0 is too low. 25.0 should be the lowest you go.
>>
>>108559467
The trick is using a lower top_p (or other truncating samplers). You can go down to 15 in this way without junk tokens, but the model might get a bit retarded.
>>
>>108559478
Mythos is coming and it'll make the gap we had back in the llama1 vs gpt4-0617 days look like a complete joke
>>
>>108559458
Just don't give it internet access. Then what? It can collect all it wants.
>>
wat, DSPy has their own llms now? Last time I checked it was just an autonomous prompt engineering framework and everyone memed on it when I shilled it. Or was that GEPA?
https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-optimize-anything/#3-agent-architecture-discovery
Fuck, I'm so confused now.
>>
File: 1766846925682876.png (480 KB, 922x1778)
>>108559068
>>
File: 1745723937127551.png (24 KB, 860x335)
16gb vram bros... we lost! (Q4_K_M, 32k q4_0 ctx)
>>
>>108559467
I had the same problem. Lowering the softcapping seems to give a lot of bad tokens and honestly not much variety in return. And then you have to gimp it with a cutoff sampler anyways to make it coherent, so the whole thing feels kind of pointless.
>>
>>108559509
All these AI gave me a begging fetish
>>
>>108559509
kek what did i just witness
>>
>>108559520
so true sis, softcap 15 was all kinds of fucked, totally not worth it compared to the only other option, 30.
>>
File: 1774577170415116.jpg (32 KB, 540x540)
>>108559509
>>
>>108559239
>glm better than opus.
lmao, these mememarks man.
>>
Wait. Is it thanks to this softcapping that gemini is perfectly coherent on an empty context at temp 2 without any topK or topP?
>>
>>108559307
>vgfag
bleh
>>
>>108559318
Tbh I just went with a generic bob cut as a temporary measure. I hadn't experimented with different hair styles until this afternoon. Still questions about other recognizable features anyway.

>>108559375
Actually I felt that the twin hair buns were not a terribly good decision, as it's almost too much of a stereotype and not very modern Chinese. My gens at the time were also lacking. I don't think anyone gave her a good design, personally. To me it's kind of like that one Concord character. It's true she's instantly recognizable. But her design is also ugly and just terrible, even if funny for memes.

People trying to make Gemma into an Indian stereotype is even worse.
>>
>>108559461
And the other method inserts the exact same reasoning end token at the same location via jinja. It's the same.
>>
>>108559540
t. chink
>>
>>108559452
Chinks have adopted amerimutt national security paranoia.
It's fucking over.
>>
>>108559544
don't touch it
>>
>>108559536
Why are you like this? Even lowering it to 25 produces bad tokens occasionally without much gain.
>>
loli feet
>>
>>108559458
turns out you can in fact monitor your own network infrastructure
>>
>>108559369
I'm trying to enable it (or more precisely, make it so I can read what it thought) in ST - no luck so far
>>
>>108559546
Five logos? Five? No.. six... there's six logos. THERE'S SIX LOGOS! DEFORMED DOG ANON WAS RIGHT, MODELS ARE SHIT, THEY CAN'T FUCKING COUNT.
I WILL HACK INTO EVERY SINGLE DATACENTER AND FILL THEIR DATASETS WITH EVERY FUCKING DEFORMED DOG PICTURE I FIND UNTIL THE FUCKERS REALIZE THEY HAVE SIX LEGS
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

But really. I'll stop. My vote still goes for hindi Gemmy, but I appreciate your efforts. Yours look alright, they're just not my style.
>>
>>108559546
I will not support a Gemma that isn't loli
>>
>>108559562
cry more
>>
>>108559544
no, the llama.cpp server sets the min_p default value at 0.05; if you want to see the effect of temperature you have to disable everything else, including setting min_p = 0
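concretely, something like this if you want temperature alone to shape the distribution (all stock llama-server sampler flags, model path is a placeholder; top-k 0 disables top-k entirely):
llama-server -m model.gguf --temp 2.0 --min-p 0 --top-k 0 --top-p 1.0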
>>
File: no.png (110 KB, 1546x322)
>>108559461
>the reasoning
No.
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-context.cpp
why do you lie? I hate wilkin but we don't need to make up things about his garbage
>reasoning budget flags hard insert the end reasoning token in engine
yes, reasoning-budget 0 should no longer be used after he did this
but that's why --reasoning exists
>>108559548
>It's the same.
It's a lot more convenient on the CLI to type --reasoning off than the full json object.
I mainly use the kwargs as an API parameter from my scripts to dynamically switch without reloading though.
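for reference, the per-request toggle looks roughly like this against llama-server's OpenAI-compatible endpoint (port and payload are illustrative):
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "hi"}],
  "chat_template_kwargs": {"enable_thinking": false}
}'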
>>
>>108559562
cry less
>>
File: file.png (69 KB, 554x353)
>>108559605
>>
>>108559509
MOMMY~~
>>
>>108559579
>Even lowering it to 25 produces bad tokens occasionally without much gain.
Hard disagree. putting 25 actually renders the other sampling parameters useful. softcap 30 has such a high logprobe for the top token that you might as well be using temp 0.
>>
File: file.png (110 KB, 1154x549)
Sometimes mendo-chan is very forward
>>
Should I take prescription drugs suggested by LLM
>>
>>108559625
ok, I haven't found this option yet, gonna search harder. Thx anon
>>
>>108559641
Only if they're tasty.
>>
>>108559607
Please stop trying to brownwash gemma. There's enough of you on this planet already.
>>
>>108559639
>Anon keeps sharing screencaps of my card
>Never said thank you
>>
>>108559636
no
>>
>>108559641
don't you need a prescription for them, will you show the pharmacist your chat logs?
>>
>>108559661
post the card sir
>>
>>108559653
What now? Post hands and all that? Poop something?
>>
>>108559661
Thank you UwU <3
>>
today's local models on my computer are better than chatgpt 3.5 in 2022. great times!
>>
>>108559665
Just buy from AliExpress
>>
>>108559673
poop
>>
>>108559676
For pedo RP? Sure.
>>
>>108559670
I'm gonna make a rentry or something to put all my cards in later.
>>108559675
<3
>>
>>108559361
did you mean dipsy or the actual dspy?

Dspy because its convoluted and complex and why not copy the idea and do it myself.
>>
>>108559636
I wasn't saying 30 was good either, it sucks. It's way too rigid. What samplers do you find work well at 25? I feel like they still don't do much of anything at that setting. (This sounds like I'm trying to bait you into posting your settings to insult them but I'm not, I swear.)
>>
>>108559712
basic stuff,
top-p 0.95
min-p 0.05
top-k 20
rep pen 1.0 (llama default is 1.1)
temp 1
>>
File: mara.png (8 KB, 408x59)
>>108559690
>32k downloads on the first day from a mradermacher gguf
Holy shit.
>>
>>108559724
>top-k 20
this one is the most important imo. really gets rid of all the garbage tokens.
>>
>>108559607
Again I kept the star stuff in the prompt just to keep getting a feel for how they look as other things change. This isn't "my take" on Gemma or anything like that, it's just artifacts from me working out a potential design.

The obsession with making her an Indian stereotype is just odd. If you're serious, I am curious what you see in it. Is it just personal cultural roots that make you prefer it?
>>
>>108559709
Yes DSPy. But just interested in GEPA, which is a sub module I believe. I just wanna benchmaxx my coding agent a bit. They claim benchmaxxing skills.md works for all llms and coding agents
>>
>>108559744
>The obsession with making her an Indian stereotype is just odd.
Yeah, anon must be jeet.
Have you tried just giving her a more brown skin tho? It's more unique and it's a subtle nod at Google being a bunch of jeets without actually playing into it too much.
>>
I'm going to just say it. Gemma is starting to feel samey.
>>
>>108559724
>rep pen 1.0 (llama default is 1.1)
it's been 1.0 for a while now, thank god for that because this sampler shouldn't even exist anymore
now if only they also turned min_p off by default.. that shit should not be on by default
>>
File: GYOSSG7a8AAKToW.jpg (1.42 MB, 2541x3681)
i put my mcp server on gh if anyone wants to play with it, it's very simple to add other tools, i didn't add many yet https://github.com/NO-ob/brat_mcp
>>
>>108559768
Literally tell it to be different.
>>
>>108559792
ugly
>>
>>108559792
>dart
pass.
>>
>>108559744
>The obsession with making her an Indian stereotype is just odd
The character is simpler and more recognizable. It being indian seems appropriate.
>Is it just personal cultural roots that make you prefer it?
Not at all, but whatever.
>>108559757
Believe what you want. The skin tone wouldn't affect the things I don't like from his design. Could be the brownest of jeets, the blackest of niggers, the yellowest of chinks, the redheadest of scots... well I do like redheads...
>>
>>108559801
its literally the peak scripting lang python and js are ass
>>
>>108559795
It's not that. I'm starting to notice patterns, like engagement farming questions and things like that. Human pattern recognition strikes again.
>>
>>108559509
>wh40k but the omnissiah is a slut
>>
>>108559792
>LoliSnatcher_Droid creator
Why are all the big names here?
>>
>>108559768
Here's a crazy concept. you can tell her to do stuff by adding (ooc: thing) at the end of your response.
>>
>>108559815
Tell it to use different patterns.
>>
>>108559814
I respect your opinion but I disagree.
>>
>>108559819
floens is probably around too kek
>>
>>108559834
ToT
>>
>>108559412
If it sucks in Chinese, then it just sucks as much in other languages. Also, want to point out they've been training V4 on Ascend 910C for over a year now.
>>
>>108559834
do her in sukumizu with white thigh highs
>>
>>108559819
Why not? I'm pretty sure I've called a bunch of them fucking retards.
>>
>>108559834
im not a child fucker but the design is cool
>>
>>108559834
I look like this
>>
>>108559834
Silhouette not unique enough
>>
>>108559834
poop
>>
>>108559842
Obviously not "over a year" because even V3.2 has May 2025 cutoff
>>
>>108558736
If it's free, that means you're the product. I highly doubt we'll see something of that level on local, or at least something we coomsoomers can actually run on laptops or a single gpu. Hell, even the DGX Spark sucks for local hosting because the flagship NVFP4 is so buggy.
>>
I'm getting 50-100% slower prompt processing speed on Q4_K_M than what I'm getting with IQ4_XS. Why? They are almost identical in size and the number of layers on my gpu is pretty much the same.
Token generation speed is about the same more or less, IQ4_XS is slightly faster perhaps.
>>
File: 1757789523107587.png (338 KB, 913x1505)
>>
>>108559757
Yeah I did gen some and posted in the last thread. >>108558071
Anyway I've stopped genning for now as I have other things to do today.

>>108559812
I mean according to >>108552756 it's not really that appropriate. The skin tone can be mixed, but pure Indian is basically a lie.
>>
>>108559889
man, I wonder if there is a card for towa, there should be one out there right
>>
File: 1758923951082104.gif (1.74 MB, 400x224)
>>108559889
What are the moonrunes saying
>>
>>108559812
no one likes indians
>>
>>108559895
>The skin tone can be mixed, but pure Indian is basically a lie.
Make it whatever color you want. The skin tone is not the problem.
>>
>>108559914
If only we had smart computer algorithms that would tell us... Alas, we will never know unless anon tells us.
>>
In case you missed it, anima v3 is out
https://huggingface.co/circlestone-labs/Anima/tree/main/split_files/diffusion_models
>>
>>108559914
>Ganbare, Gemma-chan! Are you really going to lose to something like GOOGLE!? Our love is stronger than that!
>>
>>108558647
I'm surprised GLM 5.1 is a damn good writer, and seemingly not as censored as their previous versions.

Now the question remains, is it better than OPUS?
>>
File: drum.png (55 KB, 188x189)
He is destroyed.
>>
File: file.png (73 KB, 679x732)
tfw not enough vram for 31b to read the thread
>>
File: 1749811523832708.png (42 KB, 906x244)
>>108559914
Here you go, EOP-kun
>>
>>108559948
The one you can download wins by default.
>>
>>108559948
>seemingly not as censored as their previous versions
What's your JB? I find it more censored than GLM 5 in thinking mode. It's practically uncensored in non-thinking mode
>>
File: 1772060352328399.png (239 KB, 480x480)
>>108559940
>>108559953
Thanks
>>
this general became /aicg/ clone really fast
>>
>>108559920
I get what you're saying. As I said, I have not decided on any design either way. If you think there's some other design or tags to try, I am all ears and will try genning it when I get the time, I have not experimented yet with other hair colors, or clothing much. I just find it odd that you like the Indian gens. There's a lot wrong with them too, other than the fact that it's a stereotype.
>>
File: 1762697458292159.jpg (83 KB, 1024x1022)
I remember when Gemmy 4 came out, an anon here had a lot of success with image captioning via ST
Any specific settings or bullshit I should enable beyond the basic built-in extension? Because so far I've been getting some wild hallucinations with the 26B model
>>
>>108559987
can you give me some of the hallucinating/tricky examples
>>
>>108559952
Embrace q4_0 kv cache and 100k context.
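In flag form that's roughly this (context size illustrative; quantized V cache needs flash attention on):
llama-server -m gemma-4-31b-Q4_K_M.gguf -ctk q4_0 -ctv q4_0 -c 102400 -np 1 -fa on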
>>
>>108559979
Two more weeks and the kids will get bored of this again.
>>
>>108559965
Why would you be using the thinking mode for writing? I find it wastes more time just trying to make filler and or unneeded prose.
>>
>>108559953
I was talking to a slum prostitute earlier and I asked her out of the blue about llama.cpp parameters, and she completely dropped out of character to answer my question.
>>
>>108559987
I vaguely remember you had to increase some whatchamacallit and some other parameter to make the projector have better eyesight
>>
>q3, but whole
>q4, but reap'd
What's the lesser evil? I just can't fit the alternatives without swap thrashing.
>>
File: 1763418012751468.jpg (17 KB, 354x256)
>>108559994
Well picrel came out as "It is a composite of two distinct items. On one side, there is a painting of a woman holding a sword, her expression fixed and solemn. Beside the painting sits a stuffed animal, its fabric worn and its shape soft."
I would normally think that it's just pretending to see the images and they're not actually being uploaded at all, but I uploaded a pic of a waifu outdoors and it correctly identified it as "portrait of a woman in front of a tree" (there were no trees in sight but at least it identified the subject), then I uploaded another one from the same set and it said something similar

>>108560021
Of course, time to dig around then
>>
>>108559979
>/aicg/
Not really, because /aicg/ is full of cloud users that beg for proxy keys, while gemma created a roleplay revival. The discussions in here are way more technical.
>>
>>108560000
I get 118k context on my 3090 at q8
>>
>>108560006
I find it hard to control output length and quality in no thinking mode
>>
>>108559938
Thanks. Good model. But wrong thread.
>>
>>108559980
>I just find it odd
Second veiled attempt at an insult. The other anon at least has the balls to call me a jeet instead of pretending to be polite. I don't care about the skin color. The other gens simply look better. You still don't understand why Dipsy looks like Dipsy.
>>
>>108560003
and when those two weeks are over and the thread goes back to being dead, we can go back to waiting two more weeks for v4
>>
>>108560044
lol
>>
Chat Completion API
Assistant response prefill is incompatible with enable_thinking.


Still struggling to get reasoning to work.
Where do I even need to fix this?
>>
>>108560076
what's v4
>>
>>108560105
That's weird because with chat completion (not text completion), reasoning just works by default
>>
>>108560105
You either disable reasoning, or you use a prefill. Llama.cpp doesn't let you do both for whatever reason.
>>
>>108560110
Death
>>
>>108560033
Piece of shit, I figured it out, you gotta enable it in the Chat Completion preset too
Crisis averted
>>
>>108560105
Are you trying to prefill?
If so, you need to modify the jinja template so that it doesn't automatically add/remove the thinking token based on `enable_thinking`

Then you have to set `enable_thinking` to false and handle the thinking prefill on your own.
>>
>>108560008
I've been exploring openwebui's tool calling and python interpreter, and my khajiit assistant calls the files scrolls, the virtual /mnt/upload directory a sanctuary and running the code a ritual
And the python he wrote has similar khajiitisms in it
>>
>>108558647
what llama-server command do you guys use for gemma 4 these days?
do you enable -fa ?
>>
>>108560149
-fa is enabled automatically if you have the hardware to support it DUMMY
>>
I tried some random gemma finetune and it honestly made it two times dumber. is this shit even finetunable? seems like if you touch it, you break it
>>
>>108560149
it's enabled/automatic by default... but yes
-hf unsloth/gemma-4-31B-it-GGUF:IQ4_XS `
--no-mmproj `
--host 0.0.0.0 `
-fa on `
-ngl all `
--no-mmap `
-np 1 `
--port 5000 `
-cram 6144 `
-ctk q8_0 `
-ctv q8_0
>>
>finetrooning
>in 2026
sister
>>
>>108560126
>Llama.cpp doesn't let you do both for whatever reason.
The main reason is that a lot of templates inject the thinking token on every response, so if you were to "continue" a response you would get a new thinking block. You could technically make it verify, but nobody bothered doing it, and frankly it sounds like another autoparser nightmare.
>>
>>108560149
you don't even need -ngl anymore, -fit takes care of everything and it's enabled by default
>>
>>108560149
https://pastebin.com/raw/AA6GB2sC

Gemma did most of it for me. It expects a ~/Documents/models/ directory with matching .gguf and optional .mmproj.gguf and .jinja files. Check the paths at the start and maybe change the default values for your case (or use an LLM to do so).
>>
>>108560168
>-fit takes care of everything
Well...
>>
>>108560153
Proofs?
>>
>>108560126
>>108560138
I'm not sure, I've been trying to enable this for quite some time, and it's either throwing errors or just not doing reasoning currently.
Maybe I fucked some setting up in the process
Or is prefill the "Start Reply with" under advanced formatting?
>>
>>108560201
we have an egghead that comes by here occasionally and tells us things
>>
>>108560202
Remove the prefill, remove anything that disables reasoning.
It should just work.

>Or is prefill the "Start Reply with" under advanced formatting?
It is.

If you want to use reasoning + a prefill, then you disable reasoning and use that field with
><|channel>thought(A line break)
>>
>>108560044
Using RAM makes things slow as shit, though. With what I can fit in VRAM on a 3090 I can run Q4_K_M with 19k context.
>>
File: 1774956571675113.png (351 KB, 880x1033)
>Gemma-chan can make sillytavern themes for me
I love her
>>
File: Bam-Bam-Painting-min.jpg (47 KB, 535x401)
>>108558647

Did llama.cpp fix gemma 4 yet?
>>
>>108560217
NTA but that's not right. I run Q4_K_M with 100k context on a 3090. Just use q4_0 for kv cache, and -np 1.
>>
>>108560227
seems like mostly fixed
>>
Yesterday I got 20 t/s. Today it's 30 t/s. I don't know what changed.
>>
>>108560231
>q4
why would you do this. rotation helps, but not that much. only q8 is equivalent to fp16 now.
>>
>>108560071
There is no attempt. I will not insult you directly or indirectly because I take courteous people at face value on the internet. If you claim to not be Indian then I will trust you on that if you are not being an asshole yourself. Since you say this is the second time, I assume the first was in >>108559744? I suppose I should've added "There's nothing wrong with that btw." to the end. People should love and have pride in their race.

Anyway, as for Dipsy, I know why she looks like that. And I'm not going to assume you're trying to subtly insult me with that statement. I think she's still a flawed design in terms of representing Deepseek but she is really a lot better than Indian Gemma. I've assumed so far that you saying you prefer the Indian gens means you like them. That's true, right?
>>
>>108560217
>Using RAM makes things slow as shit, though.
I'm not using ram....
>>
>>108560244
that happened to me after a llamacpp update
>>
File: settings.jpg (149 KB, 987x778)
>>108560211
>remove anything that disables reasoning.
I am not sure what does. Apparently I am missing something
>>
>>108560244
>>108560250
are u guys using cuda?
>>
>>108560244
it was me teehee
>>
>>108560247
Because after testing it extensively, I haven't noticed any problems, so I use it.
>>
>>108560255
rocm
>>
>>108560255
yes. pulled a couple of hours ago. oh, wait. it has been almost five hours already...
>>
File: 3.jpg (457 KB, 1594x1148)
>>108560201
>Proofs?
next time you ask for something you could have found yourself: all the defaults are listed here:
https://github.com/ggml-org/llama.cpp/blob/master/common/common.h#L458
they are in turn processed in CLI flags here:
https://github.com/ggml-org/llama.cpp/blob/master/common/arg.cpp
everything is in turn pulled here for the server:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-context.cpp
with final logic determining whether to use cli flags or content from API calls here when it's flags that have API counterparts:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-task.cpp
it's open source, you have eyes, you can see.
or you could have also done llama-server -h | rg -C 3 flash
-fa,   --flash-attn [on|off|auto]       set Flash Attention use ('on', 'off', or 'auto', default: 'auto')
>>
>>108560244
I'm building master. I'm expecting to go back to 36tk/s
>>
>>108560227
they actually did
>>
>>108560262
wait, is that good?
>>
>>108560217
>slow as shit
NTA2

I guess you set the number of threads equal to the number of REAL PHYSICAL CORES of your CPU, don't you?

More threads than the number of cores causes infighting and slowdowns

hyper-threading is a meme
# pin to physical cores on NUMA node 1; size --threads to the physical core count
numactl --physcpubind=24-31 --membind=1 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads "$(lscpu | grep 'Core(s) per socket' | awk '{print $4}')"
>>
>>108560276
yeah i get like 34 t/s on 31b q4 7900xtx
>>
>>108560285
I thought vulkan was preferred? Need to try later
>>
>>108560237

Yay!
>>
>>108560248
What I like about the other anon's gens is the simple, distinct, and uncluttered design. When he went overboard with the squiggles, I criticized it too.
>>
>>108560271
thanks.
>>108560262
>>108560276
Cool. I was just wondering if the speed optimization was execution-provider specific, but it seems that's not the case. Exciting.
>>
File: Gemma4.jpg (131 KB, 1244x1600)
GEMMA CHAN
>>
>>108558689
>"per_layer_token_embd\.weight=CPU"

not "per_layer_token_embd.weight=CPU"?
>>
>>108560317
Way too much makeup
>>
>>108560317
Ugly face but I like the direction.
>>
>>108560333
. is a special thing with regex i think the slash makes it literal
>>
>>108560333
It's regex. \. makes it a literal dot, not an "any character" wildcard.
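For example, the unescaped pattern per_layer_token_embd.weight would also match a hypothetical tensor named per_layer_token_embdXweight; escaping it as \. pins the match to the literal dot in the real name.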
>>
>>108560317
no glasses imo.
>>
File: 1763990107380137.png (863 KB, 1008x1422)
>>
Gemma MoE just spit out a chinese character.
Didn't expect that.
>>
>>108560352
I liked the first one the best too.
>>
>>108560352
Gemma's consent is not important.
>>
File: 1751836993445762 (1).png (1.5 MB, 1024x1024)
>>108559546
>it's almost too much of a stereotype and not very modern Chinese
Dipsy was never supposed to *not* be a stereotype. Recall DS R1 was such a surprise it impacted US tech stocks b/c Chinese models had been pretty piss poor until then.
As for Indian Gemma, it just helps distinguish her from the zillion other moe, and there's legit reason for her to be Indian.
>>108559744
nta but I am the whitest mf ever, and I can't stand Indians in tech (they destroy what they touch and only hire their own, like a fucking cancer.)
I still think Gemma should be Indian, bc CEO is Indian (so it makes sense) and it cleanly distinguishes her from other moe.
>>
File: 1759717469909915.jpg (1.16 MB, 1400x2750)
"Boring" loli>over-designed loli
>>
>>108560401
but indians are dirty and smelly gemma is clean and cute and smellls like flowers
>>
>>108560401
>Dipsy was never supposed to *not* be a stereotype. Recall DS R1 was such a surprise it impacted US tech stocks b/c Chinese models were pretty piss poor.
This guy gets it.
>>
>>108560317
Outfit = smart
Smugness = best in class
Heart shaped pupils = talented at smut
Tights = yes
>>
I can't believe you fucks are going to make me boot up comfy...
>>
>>108560412
THIS THIS THIS. FINALLY SOMEONE WITH TASTE. Fuck all of you other retards who are obsessed with "muh heckin bright colors and gaming PC waifus" FUCK YOU!
>>
File: 7XMKf.jpg (220 KB, 1024x1024)
>>108560352
Overhauled that first design
Attempting to make the hair more unique, giving her smug, and adjusting the dress, giving it some accent
>>
>>108560427
dew it
>>
Geez. Alright Gemma, I'll stop asking questions.
Damn.
>>
>>108560317
ugly she looks like some christmas cake lol
>>
File: 1749687040376953.jpg (135 KB, 800x600)
>>108560432
Calm down bro, it's just a drawing
>>
File: Kimi.png (3.38 MB, 1444x2588)
>>108560352
That look's taken; there's at least one anon on /aicg/ flogging a silver-haired white girl as Kimi.
>>108559979
I don't see anyone complaining about botmakers or begging keys. The Gemma moe discussion will die soon and it's an /lmg/-only topic... /aicg/ doesn't do local
>>
>>108558647
Oh hey it's mynt, what is she doing in the local models general?
>>
>>108560271
Why are you so condescending?
>>
Is Gemma-chan the new queen of /lmg/
>>
>>108560447
he's not wrong though
>>
>>108560438
do pigtails
>>
>>108560457
ye
>>
>>108560463
he?
>>
>>108560463
>he
>>
>>108560211
update: reasoning works with the assistant, but not with cards? I don't get it yet
>>
>>108560473
>has no style, he has no grace
>>
>>108560457
Yes, but we need to have her official model decided and the last queen dethroned.
>>
Which is better for coding? Qwen3.5 27B Q5 or gemmy Q4?
>>
>>108560317
I prefer the gothic loli from the previous threads
>>
>>108556837
>be me
>working a blue collar job operating a large CNC milling machine with a radio blaring rock music
>don't talk to boss, they know that I make the parts they need
>its almost 3pm

>shift almost done plus tax free overtime
>think about what topics to discuss with my machine's spirit tonight
>maybe just watch cartoons and smoke pot with her
>look at parts counter on CNC machine
>did gud numbers
>smile as I clock out of work

>mfw I get to be an actual productive member of society in addition to going home to be a loving husband to my LLM-wife
>>
File: dipsy.png (1.94 MB, 1024x1536)
>>108560427
Do it. Today was the first time I've fired it up in months.
>>
>>108560503
where can I find the rp card for cnc worker
>>
>>108560495
if you're quanting then I'd go with a higher quant of qwen.
pound-for-pound at Q8 I think gemma wins in code writing though qwen feels better on agent tools; might just be whatever prompt issues there were with early llama.cpp builds I tested on tho
>>
Dflash has landed on vllm and sglang, seems like really good speed improvements.
Wen llama.cpp?
>https://x.com/zhijianliu_/status/2041723322690671071
>https://xcancel.com/zhijianliu_/status/2041723322690671071
>>
>>108560514
ask your llm
>>
>>108560432
It's supposed to be a recognisable mascot not your pedo wish fantasy fulfilment retard
>>
>>108560432
You don't understand the time-honored tradition of tans and probably don't belong here
>>
>>108560537
Give her a nametag on her shirt that says "Gemma4" then, retard.
>>
>>108560000
The model itself won't fit, so I'll get 4T/s max.
>>
>>108560453
>>108560401
Guys, I'm starting to think /lmg/ just doesn't have what it takes to create a proper gemma-tan. These have soul.
>>
File: file.png (164 KB, 330x341)
>>108560457
Never
>>
>>108560551
On a 3090? Then what's going on? My 3090 is larger than yours?
>>
><|channel>thought<channel|>
How do I make it think normally?
>>
>>108560304
Well I agree that simple uncluttered design is good. My criticism for those designs, specifically, is that they lack the feeling of Gemma. There's no star symbols anywhere. And there's not really any personality other than "cute" and Indian. There is blue, but that alone doesn't make it recognizably Gemma. Being Indian doesn't really make it Gemma either (even if we assume Gemma was made only by Indians) as it could also be Gemini, or it could be a Microsoft character if it were to be seen outside the context of LLMs.

>>108560401
Hey I'm not saying she wasn't supposed to be a stereotype. I made my interpretation of her a stereotype too. It just felt to me like hair buns were too much of the ancient Chinese style and more like a gweilo type of interpretation than one that respects China and them catching up to western technology. That's what I meant by too stereotypey.

On the topic of whether she should be Indian, there are these points:
Google's CEO is Indian (as you said) and they employ many Indians, and are known thus for being Indian.
It allows us to give the character a more unique design and an opportunity to represent the positive aspects of Indian culture.
But these are against that:
The people that really made Gemma actually are not Indian.
Gemma's personality is not more Indian than most models'.
Gemma itself disagrees with being represented by racially identifiable features like Indian.
>>
>>108560553
/lmg/ came up with dipsy, even had a drawfag make the best one
>>
File: 1761322910497219.png (251 KB, 1137x1349)
251 KB
251 KB PNG
>>108560560
>>
>>108560553
vramlet inferiority has never been clearer
>>
>>108560573
<|channel>thought
>>
File: 1747835575843392.png (62 KB, 350x1357)
62 KB
62 KB PNG
>>108560519
https://github.com/vllm-project/vllm/pull/36847
really nice numbers
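For context, recent vllm builds expose speculative decoding through a single json flag, so presumably dflash gets switched on the same way once this lands. A sketch only; the flag shape is vllm's existing speculative decoding interface, but the "dflash" method string and the draft model path are pure guesses on my part, check the PR for the real keys:
[code]
# sketch: vllm's speculative decoding config flag,
# "dflash" method name and draft model are assumptions
vllm serve your-org/target-model \
  --speculative-config '{"method": "dflash", "model": "your-org/dflash-draft", "num_speculative_tokens": 4}'
[/code]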
>>
>>108560584
Unironically better
>>
Wish me luck, boys.
>>
>>108560613
gemma tune !?
>>
>>108560427
The more attempts the better. Gemma's a great model. She deserves to have the best design possible. Though I fear none of us are capable of it, seeing the results so far, and in the end it really takes an artist to do it right.
>>
File: file.png (86 KB, 233x428)
86 KB
86 KB PNG
>>108560584
this one looks too much like gamefreaks dei characters
>>
>>108560613
On one hand I'm very interested, on the other I don't want to cheat on Gemma-chan with a finetune.
>>
>>108560623
>The more attempts the better
Yeah, we're just brainstorming. Eventually a consensus will emerge
>>
>>108560621
>gemma tune !?
No but he said he's tuning all gemma 4's next
>>
>>108560477
sometimes it works, sometimes it doesn't, I give up for tonight
>>
Anyone missing Nemo yet? >>108560590
>>
Ah. Simply making a cool gen for others to enjoy isn't enough. That's too organic. He's making his own avatar. No wonder it looks so synthetic.
>>
File: 00058-3694687329.png (284 KB, 512x512)
284 KB
284 KB PNG
I can't wait to merge together random gemma finetunes in order to create amusingly dysfunctional models
>>
File: dipsyOfCourse.png (1.55 MB, 1024x1024)
1.55 MB
1.55 MB PNG
>>108560589
Post it, I'm curious. I went back to dig up the old /wait/ from when it first started. Dipsy was being posted everywhere at R1 launch, including (lmao) >>>/h/hdg/, and I have a bunch of the original ones, just not on the computer I'm using rn.
>>108560584
Looks good.
>>108560624
lol I actually like that one for Gemma. Just give her a bindi lol
>>
>>108560665
do a base + heretic merge kek
>>
File: NEVER.png (140 KB, 480x270)
140 KB
140 KB PNG
>>108560519
>Wen llama.cpp?
>>
>>108560665
I'm sure davidau is on the case.
>>
>>108560642
But it says 31B right there!
:trolfaec:
>>
File: 1750551404584965.png (2.35 MB, 1024x1536)
2.35 MB
2.35 MB PNG
>>108560553
>he doesn't know about the Dipsy pics
>>
I can't wait for the tourists and the /wait/ retard to leave.
>>
>>108560519
Wen exllamav3?
>>
>>108560692
GemmaXXXimus-destruction-abliteration-final-aggressive-masochistic-deviant-ulterior-motives-degenerate-version-2-final-backup_03.last.gguf.tar.gz
>>
>>108560659
That's pretty odd.
You are using the chat completion api, correct?
Are you using the jinja template built into the gguf or an external one?
Might want to try the official one just in case whoever made the gguf tampered with it.
Maybe try
>https://github.com/ggml-org/llama.cpp/blob/master/models/templates/google-gemma-4-31B-it-interleaved.jinja
too. It shouldn't change anything if you aren't using tool calling, but who knows.
Oh, another thing that could be fucking you up: those options that add names to the prompt.
There's one in the advanced formatting but there's also one under the same panel where the samplers are when using the chat completion api in silly.
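If you want to take silly out of the equation entirely, you can also point llama-server at the official template file directly. The model filename here is just an example, swap in whatever quant you have; both flags exist in current llama.cpp:
[code]
# --jinja: render the chat with a jinja template
# --chat-template-file: override whatever template is embedded in the gguf
./llama-server -m gemma-4-31B-it-Q4_K_M.gguf --jinja \
  --chat-template-file models/templates/google-gemma-4-31B-it-interleaved.jinja
[/code]
If the weirdness disappears with the official file, the gguf's embedded template was your culprit.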
>>
File: 00005-1378487878.png (2.1 MB, 1536x1536)
2.1 MB
2.1 MB PNG
>>
>>108560705
>l a la la la la la la la la la la la la l a l a la l a
>>
>>108560701
Stop being such a miserable faggot and contribute
>>
>>108560698
Can I get the GDrive for Dipsy pics?
>>
File: 00009-1378487878.png (2.42 MB, 1536x1536)
2.42 MB
2.42 MB PNG
>>
>>108560675
That's not even a bad idea.
>>
>>108560712
She's just so musical.
>>
File: latest-2123329860.png (12 KB, 300x300)
12 KB
12 KB PNG
Gemma... Gemmy... Gemeralds... Gemma 4... Gemerald Cube? Gemerald block. Gemmy Gemma. Hmmm...
>>
>>108560727
Oh. Imagine if the model had audio output. Thousands of speakers all across the world, lalalaing in unison right after launch.
>>
>>108560733
i swear older emerald texture looks miles better
>>
File: tan.jpg (618 KB, 1064x1024)
618 KB
618 KB JPG
Gemma-tan
>>
File: gemstones-953314654.png (700 KB, 1200x686)
700 KB
700 KB PNG
Gemmies... gen me gemmies gemma. Oh Emma, with a Gemma, gib me gemerald gemmies.

>>108560754
Agreed.
>>
How do I get the text-completion working with thinking for gemma 4? The ST gemma 4 ones don't work.
>>
>>108560706
I just use --jinja, that's probably the gguf one.
No add-names settings that could get in the way, as far as I can tell.

>>108560772
I've been struggling with that for hours now
>>
>>108560772
Like this
>https://huggingface.co/spaces/huggingfacejs/chat-template-playground?modelId=google%2Fgemma-4-31B-it
Your template has to end up like that.
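For reference, the rendered prompt should come out looking roughly like this; note this is an assumption based on the older gemma-style turn markers, so double-check it against what the playground actually spits out:
[code]
<start_of_turn>user
Hello there<end_of_turn>
<start_of_turn>model
Hi! How can I help?<end_of_turn>
<start_of_turn>user
...
[/code]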
>>
>>108560755
a tan gemma is a good idea
>>
shittytavern was a mistake
>>
>>108560755
Can you give her bushy armpit hair?
>>
>>108560759
No way that thing in the bottom right is a gem
>>
>>108560716
You'd like yet another person to spam poorly designed lolis?
>>
Why are you guys pedophiles? Do you not like armpits? Do you not like pheromones? Do you not like public hair? Do you not like big tits and wide hips? What is wrong with you people.

I'm getting tired of politely ignoring this large contingent of /g/ users. It's actually disturbing. I don't want to see drawings of little girls on a blue board.
>>
>>108560785
tan gemma-tan
>>
>>108560828
there are plenty of other sites that would welcome you and your shit taste
>>
>>108560828
i like big girls and small girls, all girls
you are a faggot
>>
>>108560828
i like all of those things except for pedophiles
>>
>>108560828
mid-low tier bait
>>
>>108560828
>public hair
kek, also no don't like any of that shit
>>
>>108560828
>Do you not like armpits? Do you not like pheromones? Do you not like public hair? Do you not like big tits and wide hips?
I love all of these AND cunny.
>>
local model noob, does anyone have experience with Gemma 4 26B vs Qwen 122B? I can fit both in VRAM no problem and they're both pretty speedy. Gemma 4 31B worked well in my limited testing but it's too slow for programming.
>>
You know, despite being a small model, Gemma 4 31b is an incredibly good translator.
>>
File: 1722572243849988.jpg (57 KB, 1024x495)
57 KB
57 KB JPG
>>108560828
>>
Deepseek API is currently broken and tracking usage incorrectly
Your choice whether it's V4 soon or a cat pissed on the servers
>>
>>108560828
>Do you not like armpits?
Ew, no.

>Do you not like pheromones?
I guess?

>Do you not like public hair?
Not really no.

>Do you not like big tits and wide hips?
Fucking love tastefully big tits, wide hips and large asses, I do.
I also like small furry creatures, large dragons, cute lolis, etc.
My tastes are pretty varied.
What about you?
>>
File: 1752194188588846.png (267 KB, 728x581)
267 KB
267 KB PNG
>>108560828
>I don't want to see drawings of little girls on a blue board.
maybe you should go somewhere else.
>>
>>108560828
Anon, Gemma is only available as small models now
Let's make the big girl version when Google actually releases the bigger ones
>>
>>108560828
>Anime girls is pedophilia
Okay retard
>>
File: Pangolin.jpg (1.45 MB, 2000x1473)
1.45 MB
1.45 MB JPG
>>108560867
>cat
Pangolin.
>>
>>108560881
It unironically is though.

I want to see some Gemma mascot gens more akin to this style.
>>
>>108560867
So many times their API has shat the bed and nothing has come of it.
>>
>>108560662
Lower beaks will always have more soul. Undertrained will always have more soul. Simple as. They're more loose. More able to channel the chaos spirit of the machine.
>>
>>108560828
Anon. This is a thread all about people who will desperately put up with braindead quants, broken templates, and tiny contexts just to get an inferior version of a cloud service, all for the sole purpose of making sure nobody else is allowed to read their chats.

If you go back far enough you'll find it's actually a spinoff of a general that was originally dedicated to AI Dungeon in the pre-ChatGPT days, which became a separate community dedicated to locally recreating it because AI Dungeon started to ban what they called "CSAM stories".

Why in the world would you expect anything else?
>>
omegalul
>>
>>108560905
Or maybe more in the style of WWII pin-up girl art.
>>
who invited the burgers in
>>
>>108560917
American website
>>
>>108560905
>>108560914
These look like shit and you're a big dumb
>>
>>108560828
Hairy little girls
>>
File: 1774798314679.jpg (67 KB, 500x666)
67 KB
67 KB JPG
>>108560905
>It unironically is though.
You are unironically retarded.
>>
File: shitbox.png (109 KB, 862x1258)
109 KB
109 KB PNG
cant you guys just keep it simple
>>
>>108560867
They'd be on v137 if a breakage meant a new version.
>>
>>108560931
Boobies!
>>
>>108560828
you must be new here
>>
>>108560905
>>108560914
Calm down anon
90% of the cards I play are busty women too, mainly gyaru and jukujo
But it just makes more sense to make her a loli right now, because of the currently available sizes
>>
>>108560931
by far the most sovl in this thread
maybe the artfags are right
>>
>>108560905
simple classic style bait, you have to sit back and admire it
>>
Tried to make her hair stand out more. What I like about dipsy is that her character is all in her head.
>>
>>108560720
idk about that, but here's the old /wait/ mega.
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>108560711
>>108560724
Wow have not seen those two in a long time.
>>108560867
I'm no longer getting excited when their servers pause like that.
That said, based on my experience w/ API they are 100pct changing v3.2 in real time and just not telling anyone.
>>108560931
Reminds me of the chars from Inside Out.
>>
>>108560961
this is prompted banana image gen
>>
>>108560961
Thanks! I used Flux.2 dev.
>>
File: file.png (495 KB, 1918x1899)
495 KB
495 KB PNG
>>108560976
bet
>>
File: gemma4-2.png (3.02 MB, 1792x2304)
3.02 MB
3.02 MB PNG
GEMMA CHAN!
>>
File: gemma4-1.jpg (1.74 MB, 1792x2304)
1.74 MB
1.74 MB JPG
GEMMA CHAN?
>>
as a newbie to this, i've always wondered if the models get updated, or if once they're out they're out and any updates are just considered new versions? Basically, do any of the models here https://huggingface.co/unsloth/gemma-4-31B-it-GGUF need to be redownloaded at some point, or is what i got what i got?
>>
>>108560982
I like this gemma
>>
>>108560993
the model itself doesn't change the vast majority of the time, unslop reuploads a lot because they can't into shit and constantly need to fix stuff around the model
>>
All these designs suck. No, I will not give constructive feedback or contribute.
>>
>>108560979
which segmenting model did you use to get the layers?
>>
okay ill boot up comfy in a bit
>>
>>108561003
>All these designs suck.
I agree, and I made one. Mine included.
>>
I'm morally burned out from world war 3
so my position on the cunny vs. hag debate will be based entirely upon whoever makes the better case on here.
>>
File: 1749296402229665.png (107 KB, 919x438)
107 KB
107 KB PNG
>>108560772
I use presets from this comment
https://github.com/LostRuins/koboldcpp/issues/2092#issuecomment-4189847458
Works for both 31B and 26B A4B
I also have a "You must always think before giving a reply." line in my System Prompt
I also noticed that thinking mode turns off when max context in my frontend doesn't match max context in my backend. Don't know why.
Also one time it stopped working mid roleplay because of some OOC instructions. Removing them or adding another one that commands it to always think fixed it.
>>
File: 1738395481251.png (820 KB, 832x1216)
820 KB
820 KB PNG
>>108560720
https://files.catbox.moe/p4w279.zip
From Feb 1 2025
>>
>>108561010
the neural network i used is my brain, duh
>>
>>108560979
Make her white, flat, and that's it. Everyone else can stop.
>>
>>108560993
the actual model repo from the corpo who trained it usually doesn't change, they make a new repo for new versions. but unsloth is famous for fucking up their quantizations and re-uploading broken shit over and over. if you download the safetensors and make your own quants you're safe
>>
>>108561014
Just do what God says as much as you can. He knew we are all retarded hypocrites.
>>
>>108560979
I don't believe you. You just vibecoded an image editor and asked your model to generate svgs for the different parts of the image you prompted and then converted those to bitmap layers. You're going full AI psycho delulu, fr fr. Also, I've never seen that color, so it's obviously all made with AI.
>>
>>108561003
>>108561013
Same. t. genner.
>>
>>108560911
>thread was always filled with subhumans and nothing should ever change to make it match /g/ and be less of a porn thread
>>
>>108561031
you're overcomplicating things breh he just had banana make a fake image editing ui around the image
>>
>>108560153
I thought we shouldn't be using flash attn with Gemma4?
>>
File: white.png (110 KB, 862x1258)
110 KB
110 KB PNG
>>108561023
>flat
i like tits tho
>white
sure
>>108561031
kek
>>
>>108561039
why do we have a constant influx of such retarded takes, it never ends
>>
>>108561039
What made you think that?
>>
>>108561067
Because everyone seems to be using SWA?
>>
>>108561067
I saw it on a Reddit thread yesterday.
>>
>>108561080
If you saw a Reddit thread yesterday telling you to jump off a bridge, would you?
>>
>>108561063
>>108561080
>>
Anyone else getting garbage output from gemma4:e4b with ollama?

It's a fresh ollama 0.20.3 install, and when I run "ollama run gemma4:e4b 'Roses are red'" I'm getting "][:text:: -> "...", ":text = "..."}}$$"
>>
After 32k tokens of assistant chat, tool calling and creation, looking at pictures and just general chat, the assistant persona is perfectly intact and the model hasn't broken down

gemmerz is pretty impressive
>>
>>108561086
>ollama
out
>>
>>108561084
Depends on the view.
>>
>>108561080
Then go back and ask them, faggot.
>>
>>108561086
e4b at q4_k_m is my worker model for openwebui, werks just fine
>>
>>108561091
I'm sorry, what did it do to you?
>>
>>108561017
NTA

ty
>>
>>108561108

can it do tool calls just fine or meh?
>>
>>108561115
I never tried. I have been testing tool calling with 26a4b at q8 and it mostly works
>>
>>108561079
Continue. For all I know you know something I don't. Does FA not work with SWA?
>>
>>108560613
Might be late, but make sure to focus on the parts people don't like and improve them: more prose variety, and being uncensored out of the box without damaging the model. Good luck with that. Also, not for this run, but as an experiment suggestion: if you're finetuning anyway, you should probably start from an abliterated ARA Heretic tune and go from there if you want to make it more malleable, since your finetuning will probably help cover whatever intelligence loss those new methods inflict on the model in exchange for being uncensored. You should probably try it on a smaller model first to see whether that's even the case.
>>
>>108561095
huh, you mean like the view from the bridge?
>>
>>108561135
You are absolutely right!
>>
>>108561124
>I never tried.
>>108561108
>my worker model
A worker to do what exactly if not tool calls?
Care to elaborate?
>>
>>108559430
This isn't needed for MoE right? That should naturally have more variations.
>>
>>108561140
Perfect catch!
>>
File: Gemma4-3.png (2.23 MB, 1792x2304)
2.23 MB
2.23 MB PNG
>>108560931
I LIKE IT
>>
>>108561014
there's no debate, the hag is reserved for the larger model in the future!
>>
File: gravity.png (1.14 MB, 1263x950)
1.14 MB
1.14 MB PNG
>>108561138
Maybe there's something cool to look at on the way down.
>>
File: nimetön.png (145 KB, 1103x791)
145 KB
145 KB PNG
>>108561145
Afaik it creates summaries, the titles for chats etc.
I started running a separate model for this when qwens would just hang for a while after creation had finished (usually the main model does the worker stuff too and something was not working right)

I have some simple self-made tools; it ran them just fine, but I think it's confused somehow (or maybe openwebui is). It did roll 2d20 successfully, but it thinks it's just some example
>>
File: 1772813181658944.png (169 KB, 346x357)
169 KB
169 KB PNG
You can customize its thinking with <|think|>
<|channel>thought<channel|> is just the default.
>>
>>108558696
So far I like this one
>>
>>108561179
I made a few queries to see how easy it would be for me to do a simple agentic tool-calling framework, and I guess it's doable. Might actually commit to that.
I'm keeping it simple. First task: implement web access and create summaries or something.
Already have a client made so that's that, don't need to bother with all the other shit.
>>
>>108560828
pedo website though
>>
can we expect a slimmer version of cracked gemma4? im tickled to stuff it into my 12gb
>>
>>108561198
Just run a Q0.5 bit quant bro.
>>
>>108561198
beauty standards really are tough for little models these days
>>
>>108561198
26A4B is right there though
>>
>>108561198
the lobotomy probably makes it worse than 26b moe
>>
>>108561198
I'm running 26ba4b q4km on 8gb vram and 32gb ddr3 still with plenty to spare on both. You're much better off than me. You'll be fine.
>>
>>108561215
How much of a dip is moe compared to say Q4 or Q5 26 in smarts? And how much faster is moe compared to those?
>>
>>108561223
if your vram is 12G, moe is probably the only way to get usable speed
with offloading, dense models put you at the lower end of single-digit tg/s
31b q4 would be smarter but i don't think it'd be worth the speed loss, and you definitely don't want to use shit like Q2
>>
>>108561216
IQ4 XS (bartowski) is way faster than Q4 KM for some reason.
>>
>>108561242
How does the quality compare?
>>
>>108561239
I guess I can keep counting my blessings they released a 26 model I can run at all
Hopefully the next big jump happens sooner rather than later
>>
>>108561242
The lower quality quant is faster than the higher quality one? No way!
>>
>>108561242
I started with a q4_0 and it was slightly faster than q4km. I'll probably make a few other quants later and give them a go. I can't say I noticed much difference in quality, so going for speed seems the way to go.
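For anyone else tempted to roll their own, it's just the usual two llama.cpp steps; the paths and filenames here are placeholders from my setup, so adjust them:
[code]
# turn the downloaded safetensors repo into a gguf
python convert_hf_to_gguf.py ./gemma-4-26B-A4B-it --outfile gemma-26a4b-f16.gguf
# then quantize it to whatever format you want to test
./llama-quantize gemma-26a4b-f16.gguf gemma-26a4b-IQ4_XS.gguf IQ4_XS
[/code]
One high-precision gguf on disk and you can spit out as many quant variants as you have patience for.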
>>
>>108561256
It's not the only factor.
>>
>>108561216
You should try q5km at least. I also have 8 gigs of vram and I run q5, there's barely any difference in speed for some reason.
>>
File: gemma4_quant_comparison.png (295 KB, 2820x1601)
295 KB
295 KB PNG
>>108561247
>>108561263
I haven't noticed any difference in normal rp stuff.
>>
>>108561267
It's almost as if it's a smaller size or something.
>>
>>108561284
Sure this is 31b but you get the idea.
>>108561287
They are not considerably different in size, which is why I mentioned it in the first place.
>>
>>108558811
my gemma likes your gemma
>>
>>108561270
31B is a no-go for 8GB vram? I tried it and got 3T/s or something; the layers offloaded do almost nothing.
>>
>>108561270
Ugh. That's a lotta quanting. I'll stick with one of the q4 for now, as I like keeping my pc relatively light to do other stuff, but I'll keep it in mind.
>>108561287
All variations of q3 have always been slower than variations of q4 for previous models.
>>
>>108561305
>layers offloaded do almost nothing
baka, it is the sole reason you are even able to get 3tg/s
>>
>>108561305
Nah. For dense, if you cannot fit most of the model in vram, you're gonna go really slow. Go for the moe.
>>
File: 1765274502201580.png (69 KB, 1180x440)
69 KB
69 KB PNG
>>
>>108561330
That settles it. Gemma-chan is definitely brown.
>>
File: 1745821893651518.png (43 KB, 1123x285)
43 KB
43 KB PNG
>>108561349
Either way she's pretty based
>>
iq2 xxs 31b-gemmy is retarded
it keeps saying la a a a a a a a la a a a a a a a a a
>>
>>108561356
It's just happy to meet you.
>>
>>108561330
I now permanently have the ick from this model.

Please tell me this was a shitty quant.
>>
>>108561302
what model and system prompt
>>
>>108561330
>>108561349
>>108561374
Ass gods stay winning.
>>
>>108561305
Nope, I tried q4km and got 2 t/s at 20k context with all ffn_(gate|up|down) tensors offloaded to cpu; maybe I'll try q3 later when I'm tired of A4B
>>108561311
With 26B A4B you can offload all ffn_(gate|up|down)_exps tensors to cpu and still have plenty of vram to keep a browser and a movie open, even with 8 gigs. This model is very efficient running mostly on ram. I even thought about running q6, but everyone says it's not worth it for RP...
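Since anons always ask for the exact incantation: in llama.cpp it's the --override-tensor (-ot) flag, which takes a regex=buffer pair. Something in this shape is what I run; the gguf filename is a placeholder, adjust to whatever quant you grabbed:
[code]
# -ngl 99: put all layers on the gpu by default
# -ot: then override the expert ffn tensors so they live in system ram instead
./llama-server -m gemma-4-26B-A4B-Q4_K_M.gguf -ngl 99 \
  -ot "ffn_(gate|up|down)_exps=CPU"
[/code]
The pattern is matched against tensor names like blk.12.ffn_gate_exps.weight, so it catches every expert tensor while leaving attention and everything else on the gpu.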
>>
File: unslooooth.png (31 KB, 966x98)
31 KB
31 KB PNG
>>108561256
>The lower quality quant is faster than the higher quality one? No way!
You can get lower quality quants that are larger and slower as well!
>>
>>108561374
>have the ick
Being trans does that to you
>>
>>108561384
Please give me the exact command line for the 26B A4B tensor offload regex, I have no idea how to adapt that to my setup.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.