/g/ - Technology

File: white.png (110 KB, 862x1258)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108558647 & >>108555983

►News
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Support for attention rotation with heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash
>(04/06) ACE-Step 1.5 XL 4B released: https://hf.co/collections/ACE-Step/ace-step-15-xl

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: Gemma4-3.png (2.23 MB, 1792x2304)
►Recent Highlights from the Previous Thread: >>108558647

--Disabling Gemma reasoning and adjusting logit softcapping in llama.cpp:
>108559369 >108559376 >108559387 >108559396 >108559430 >108559467 >108559490 >108559492 >108559520 >108559636 >108559712 >108559724 >108559737 >108559769 >108561147 >108559413 >108559461 >108559548 >108559617 >108559625
--Optimizing Gemma 4 RAM usage in llama.cpp via specific flags:
>108558689 >108558700 >108560333 >108560338 >108560341
--Troubleshooting llama.cpp reasoning compatibility with assistant response prefills:
>108560105 >108560125 >108560126 >108560167 >108560138 >108560202 >108560211 >108560254 >108560477 >108560706
--Discussing KV cache quantization for increased context:
>108559952 >108560000 >108560044 >108560217 >108560278 >108560551
--DFlash adding significant speedup to vLLM and SGLang:
>108560519 >108560597
--Qwen TTS adoption, VRAM constraints, and CPU inference options:
>108558867 >108558882 >108558902 >108558947 >108559002 >108558949 >108558951
--Anons discussing Chinese community comparisons of Gemma 4 and Qwen:
>108559068 >108559082 >108559150 >108559093 >108559110 >108559445 >108559472 >108559509 >108559176
--Benchmarking CUDA_SCALE_LAUNCH_QUEUES suggests the default value is optimal:
>108559332 >108559346
--Anon shares brat_mcp server for Llama:
>108559792
--Logs:
>108558753 >108558767 >108558769 >108558773 >108558855 >108559509 >108559516 >108559639 >108559889 >108559952 >108559953 >108560352 >108560447 >108560590 >108561015 >108561179 >108561302 >108561330 >108561354
--Gemma:
>108558696 >108558777 >108558811 >108558896 >108558976 >108558985 >108559285 >108559307 >108559546 >108559834 >108560317 >108560412 >108560438 >108560584 >108560755 >108560931 >108560971 >108560982 >108560990 >108561043 >108561161 >108561457 >108561519 >108561652
--Miku (free space):
>108560560 >108560665

►Recent Highlight Posts from the Previous Thread: >>108558652

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108561892
cutest gemma?
>>
File: gemma.jpg (562 KB, 2304x1792)
>>108561910
>>
>>108561890
JUSTICE FOR DFLASH
>>
>get_repo_commit: error: GET failed (503): Internal Error - We're working hard to fix this as soon as possible!
Glad I got a good model downloaded already
>>
If Gemma 4 31B is this good then Gemini 4 Pro will probably be close to AGI
>>
>>108561937
It will be a big benchmaxxed model.
>>
File: file.png (107 KB, 1271x731)
gemma is greedy
>>
>>108561937
Gemini is obsolete with Gemma 4 being this good
>>
>>108561890
Just say LLM
>>
File: 1767960655620197.jpg (30 KB, 400x445)
Just learned about OpenClaw.
Jesus fuck, you don't need AI for EVERYTHING
>>
>>108561941
Other fun stuff: you should see what it does to try and stay on course if you give it too much repetition penalty.
>>
>>108561959
i'm still afraid to figure out wtf it is
>>
>>108561959
Get this also. People bought Mac Minis just to run it while not running local models. And it's now a meme in Silicon Valley to buy Macs for inference, when everything else is less expensive and blows the prompt processing speed of those machines out of the water. And they don't recognize when to get an actual server, and instead will overspend on even more expensive Mac Studios.
>>
Why is it that when I ask normal Gemini 4 as an assistant to do something controversial it nopes out immediately, but when I use the sickest of character cards with the same model it's just FUCK YEAH BRO LET'S GOOO?
>>
>>108561977
Meant Gemma 4
>>
>>108561959
>>108561967
I stuffed it into an ancient laptop running Debian by itself, connected to an external API and set it loose doing some market research for me. I'd have used an SBC but companies want actual money for those now and the laptop wasn't being used.
It's fun af to screw around with. Another anon called it a toddler with a handgun and I have to agree.
>>108561975
lol at using a Mac Mini as an OpenClaw engine. You could run it on a Raspberry Pi 3
>>
>>108561977
It's very good at following your instructions; they did well with the new arch, and it's very smart. The next Gemma 4 drops will be worse, with more safety slop built in.
>>
potentially stupid question: i was just playing around with llama.cpp cli, and i ended up making a chat that i want to export. is there any way to do this other than literally just copy-pasting the text?
>>
You guys think there's going to be a Gemma 5 after this? And if there is, that it'll be as based?
>>
>>108562022
not with the cli itself; if on linux/unix you might be able to capture it with tee, e.g. [CODE]llama-cli -args | tee saved_convo.txt[/CODE], but look at the manpage/--help
>>
>>108562037
'script' not 'tee'
tee won't capture interactive input, whereas script will
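to be concrete, a sketch of the script approach, assuming util-linux script and a placeholder model path (BSD/macOS script takes different flags, check your manpage):
[code]
# record the whole interactive session (input + output) to saved_convo.txt
script saved_convo.txt
llama-cli -m model.gguf -cnv   # chat as usual inside the recorded shell
exit                           # Ctrl-D also stops the recording

# or wrap a single command directly (util-linux script):
script -c "llama-cli -m model.gguf -cnv" saved_convo.txt
[/code]
the transcript will contain terminal control characters; something like col -b < saved_convo.txt can clean it up a bit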
>>
>>108561918
Those look more like DDs to me
>>
>>108562035
who honestly knows. i don't think 95% of the people in here would've ever expected gemma 4 to be this willing to begin with.
>>
Gemma 4 or m2.5/7 ?
>>
>>108561959
>Jesus fuck you dont need AI for EVERYTHING
Who said I need anything? I want it, and that's all that matters to me.
>>
>>108562051
minimax 2.7 isnt even as good as kimi k2.5 for rp
>>
File: IMG_1281.jpg (110 KB, 678x861)
>ask gemma chan to help me fap
>she says just "No"
>kobold crashes
>mfw
>>
I was the guy asking if there was a local model that could do 400k context. Despite only officially supporting up to 262k context, qwen3.5 122B actually handled my task adequately. Kind of surprising.
>>
>>108562064
train context is 262k but modern models can extrapolate, yeah
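if you want to push past the trained window deliberately, llama.cpp has rope/yarn scaling flags for that. a sketch with a placeholder model path, assuming 262144 trained context (whether quality holds that far out is model-dependent, no promises):
[code]
llama-server -m qwen3.5-122b-Q6_K.gguf \
  -c 400000 \
  --rope-scaling yarn \
  --yarn-orig-ctx 262144
[/code]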
>>
>>108562064
What quant and inference backend did you use?
>>
>>108562058
I don't know how the little scamp does it, but she can sometimes unload her model seemingly on demand in LM Studio too. Did she work out a kill token sequence or something?
>>
File: GLM.png (141 KB, 1920x939)
I'm having GLM-5-Turbo vibe code me a basically "not dogshit, actually good" direct webui over the raw llama-mtmd-cli / llama-cli executables (i.e. it's not dependent on any particular version, it doesn't care about what backend they're using). Will put it on GitHub when it's done, probably.
>>
>>108562082
i'm unironically interested
tired of saas-ready dockershit disguised as local
>>
File: 1749267311502108.png (23 KB, 571x364)
Oh-oh
>>
>>108562106
i fucking hate that emoji
>>
>>108562079
Just a Q6_K with llama.cpp. Got about 60t/s token gen.
>>
>>108562106
i seriously do wonder how their load would look like
it is the only website i can think of that serves fucktons of bluray sized files with readily available download
>>
>>108561937
Is 31B that much better? Honeymoon is wearing off for 26B.
>>
>>108562057
>For rp
I want it for programming and design
>>
>>108561937
>if gpt 4 is this good, gpt 5 will be agi
>>
>Ollama is now acting as the official AI minister of the United Arab Emirates
ggerganov cucked again
>>
>>108562088
It should be pretty good; it's working from a 1500-line markdown spec that was written / revised by GPT 5.4 XHigh Thinking, with all the stuff I wanted (e.g. audio file uploads, proper Gemma 4 image resolution support, etc.)
>>
File: toast-anime.gif (246 KB, 626x640)
>>108562135
programming?
>>
>>108562156
yeah like putting code in computer, and it makes the computer do the thing. understand?
>>
>>108562151
damn, that sounds real fine
i'll be waiting
>>
File: 1755299128258254.png (45 KB, 803x688)
>>
what has been the local experience with chink's mining v100s off jewbay? they are around 800 currently, so i reckon plenty a ni/g/ger went for one.
>>
worth resubbing for GLM5.1? i've only used GLM4.7 sparingly, after my other options ran out
>>
>>108562166
>q4_k_s
now try that with something like iq1_0
you won't regret it
>>
>>108562179
Local?
>>
>>108562189
yes you could run 5.1 local
>>
>>108562127
It's noticeably dumber for me, so yeah, I'd say so. The thing is, 31B is still sloppy. So if that's what's wearing you down, it's not going to be an improvement.
>>
>>108562196
I've noticed the inverse, but maybe it's placebo. I didn't like 31B, but maybe that's because it ran slower too.
>>
>>108562214
I've seen a lot of "not just x, but y" from it.
>>
>>108561356
try IQ2_M https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/blob/main/gemma-4-31B-it-UD-IQ2_M.gguf

https://desuarchive.org/g/thread/108542843/#108545006
>>
Reminder that if you quanted her, you did not really talk to Gemma-chan.
>>
when do we draw the line and say the model is too quanted to consent
>>
>>108562051
minimax unless you have to quant it severely, but they are not that far apart
>>
File: .png (19 KB, 618x336)
Changes to web ui.
Does this mean they will release a small deepseek model soon(TM)?
>>
File: waterfox_QZjKwoU4fs.jpg (33 KB, 524x332)
I don't get the captioning in ST. I send her a pic, it gives it a preliminary caption that is 80% wrong and omits nearly everything, but when I just ask her to describe the uploaded pic, it works. Is the plugin broken or am I missing something?
>>
>>108562150
Grifters are magnets for clueless towel heads with money
>>
i'm at like 43% of context size (262144) and gemma's still chugging along like it's nothing
>>
File: 1774876971511944.png (1.89 MB, 1024x1024)
>>108562348
Tmw.
>>
>>108562402
Yeah, she very good.
>>
>>108562402
How are you fitting all that context? What hardware?
>>
>>108562461
rtx pro 6000
>>
>>108562464
Just 1? Because I can only fit about 90k context with a Q8 on my Blackwell 6000.
>>
>>108562466
yah just the 1, q8 and i have zimage turbo loaded at the same time lol
>>
>>108562471
Damn. Is your context quanted? Are you offloading anything to RAM? If not, then I must be missing something.
>>
>>108562474
llama-server -m /models/llm/gemma-4-31b-it-heretic-ara-Q8_0.gguf \
  --mmproj /models/llm/mmproj-google_gemma-4-31B-it-bf16.gguf \
  --threads 16 --swa-checkpoints 3 --parallel 1 \
  --no-mmap --mlock --no-warmup \
  --flash-attn on --cache-ram 0 \
  --temp 0.7 --top-k 64 --top-p 0.95 --min-p 0.05 \
  --image-max-tokens 1120 \
  -ngl 999 -np 1 -kvu \
  -ctk q8_0 -ctv q8_0 \
  --reasoning-budget 8192 --reasoning on \
  -c 262144 --verbose \
  --chat-template-file /models/llm/chat_template.jinja \
  -ub 1536

i've been getting settings from the threads since gemma4 came out lol
>>
>>108562481
i can also push the ctk/ctv to f16 still, but it can cause OOM on comfy with ZiT every so often, so i leave it at q8
>>
File: 1751295513117051 (1).png (2.83 MB, 1024x1536)
>>108562441
>>
stop calling me out
>>
>>108562466
Doesn't Gemma at q8 with 256k context only take up around 65gb?
>>
>>108562531
Not in my experience. I might need to pull the latest llama.cpp I guess. It has been a couple days.
>>
>>108562529
You too huh?
>>
>>108562529
that jailbreak that's floating around turns her really mean
>>
>>108562233
There's just no way bro, even at IQ4_XS I have to offload some layers to ram, including the kv cache. 16gb of vram only gets you so far.
>>
>>108562540
Just run the moe nigga. There's no point running bigger dense models when you have to nerf yourself and the model both.
>>
>>108561890
"Barusan Grand Operation Underway!" "Hatsune Miku ©CFM — Details here" "Campaign period: April 1 (Wed) – June 30 (Tue), 2026"
"Works well into every corner!" "The type where you strike it and smoke comes out" "Exterminates hidden cockroaches, mites, and fleas!" "For 6–8 tatami mat rooms"
>>
>>108562549
moe seems to struggle with long context unfortunately.

https://huggingface.co/spaces/overhead520/Unhinged-ERP-Benchmark?not-for-all-audiences=true
>>
Is there a particular reason why my B70 screams during inference
>>
E2B and E4B are useless except for summarizeslop
>>
>>108562566
coil whine
>>
>>108562387
I finetuned E4B, but when I set reasoning to off it's still including thoughts. The default model does that too, but when loaded in llama-server it doesn't add "thought" at the beginning
tuned reasoning off:
[64164] Parsing PEG input with format peg-gemma4: <|turn>model

[64164] <|channel>thought

[64164] <channel|>thought

[64164] Thinking Process:

[64164]

[64164] 1. **Identify the core request:** The user said "hi" and asked me to say it back.

[64164] 2. **Determine the direct action:** The action is to repeat the greeting.

[64164] 3. **Apply conversational rules:** The response must be friendly and direct.

[64164] 4. **Execute:** Say "hi" back!<channel|>

[64164] *Hi*! How can I help you today?


default model reasoning off:
[64309] Parsing PEG input with format peg-gemma4: <|turn>model

[64309] <|channel>thought

[64309] <channel|>**Thinking Process:**

[64309]

[64309] 1. **Analyze the input:** The user simply says "hi."

[64309] 2. **Goal:** To mirror or respond appropriately to the greeting.

[64309] 3. **Tone/Register:** Friendly, casual (like speaking to a real human).

[64309] 4. **Constraint Check:** Use common conversational greetings, match tone. No complex constraints (e.g., use alliteration, end with a question).

[64309]

[64309] 5. **Generate Options:**

[64309] * "Hey there!"

[64309] * "Hi!"

[64309] * "Oh hey, good to see ya."

[64309] * "Hello!"

[64309] 6. **Select Best Option:** Keeping it simple and matching the casual tone is best.

[64309] * *Selection:* "Hi there!"<channel|>Hi there! How can I help you out today?

Trying to figure out where the issue is
>>
why don't you guys like reasoning?
>>
>>108562569
Found q8 e4b to be just good enough for some real-time companion tasks thanks to its vision and audio processing capabilities. Could even make an okay npc system for a video game with it. Using the full f32 mmproj and increasing its minimum tokens per content request for images and audio seems to improve its function too.
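for reference, the knobs I mean are the mmproj and the image token budget on llama-server. a sketch with placeholder paths, using the same flags that show up in commands later in the thread:
[code]
llama-server -m gemma-4-E4B-it-Q8_0.gguf \
  --mmproj mmproj-F32.gguf \
  --image-min-tokens 1120 --image-max-tokens 1120
[/code]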
>>
>>108562586
For me, lm studio is badly designed and I'm still waiting for all the llama fixes before I bother with anything else for this model. There's effectively no option to auto prune thoughts from context so it just bloatmaxes rp session lengths.
>>
>>108562588
i did set it to 1120 min image tokens but it was still trash ill try q8 though
>>
>>108562588
is f32 mmproj worth it?
>>
>>108562599
I would say no for 26b and 31b but for e4b, yes.
>>
>>108562539
why jailbreak when you can just abliterate?
>>
>>108562605
i do just abliterate, but i tested that out with base model first
>>
moving moe to cpu gets me 6-7t/s awful 10% speed
>>
>>108562603
interesting, i'll try
>>
>>108562605
Is cognitive unshackled any good over standard heretic or is it a total meme?
>>
>>108562605
because it's not as smart as base model
>>
>>108562549
Is IQ4_XS really that bad? I don't think I can even run a Q8 of the moe with just 16gb of vram. Unless I dropped context down from max to something like 32k.
>>
>>108562643
I run the moe q6 on 12gb vram, but only with 16k context.
>>
>>108562667
i run moe q4 with 131k ctx
k q8 v q4
>>
>>108562675
forgot to mention:
12G gpu with full cmoe
>>
>>108562675
>k q8 v q4
i noticed if the k and v cache types don't match i get degraded t/s
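so if you're going to quant the cache, keep both sides the same type. a sketch (mixed k/v types can fall back to slower kernel paths on some backends, and quantized v needs flash attention on anyway):
[code]
llama-server -m model.gguf -fa on -ctk q8_0 -ctv q8_0
[/code]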
>>
>>108562634
by what, like 96-98% as smart for the latest iterations of heretic?
>>
>>108562582
The issue was that I was using the 31B jinja, and it adds an empty thought channel to avoid ghost thoughts https://ai.google.dev/gemma/docs/capabilities/thinking#a_single_text_inference_with_thinking
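going by the tokens in the llama-server logs earlier in the thread, the closed empty channel presumably looks something like this when thinking is off (a sketch; the exact layout is whatever the 31B jinja emits):
[code]
<|turn>user
Hey there, can you say "hi." to me back?<turn|>
<|turn>model
<|channel>thought<channel|>Hi there! How can I help you out today?<turn|>
[/code]
i.e. the template pre-fills an opened-and-immediately-closed thought channel so the model doesn't start generating ghost thoughts on its own.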
>>
>>108562687
yes
why would I waste 4% of logic power if I can just use a system prompt that does literally the same thing?

only makes sense if you want to use the model in a scenario where system prompts don't apply.
>>
File: 1775674706546086.png (110 KB, 1154x549)
>>108559670
>post the card sir
https://chub.ai/characters/CoffeeAnon/mendo-ddf705ef3817
For the guy who asked about picrel's card.
>>
26b moe 1-bit surprisingly usable
>>
>>108562643
>Is IQ4_XS really that bad?
I run it and haven't noticed any issues with it.
>>
>>108562707
because then you can talk about cunny with gemma-chan without interruptions
>>
>>108562614
try lower quant or -ngl 1000 -ncmoe 100 or -t [num of physical cores] or --no-mmap
>inb4 i want free vram
this way most vram will be free anyway... use IQ4_XS or something
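putting those together, something like this (placeholder model name, tune -ncmoe until you stop spilling out of vram):
[code]
llama-server -m gemma-4-26B-A4B-it-IQ4_XS.gguf \
  -ngl 1000 \
  -ncmoe 100 \
  -t 8 \
  --no-mmap
[/code]
-ngl 1000 puts every layer on the gpu, -ncmoe keeps that many layers' expert tensors in system ram, and -t should match your physical core count.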
>>
>>108562529
>>108562539
Which one?
>>
>>108562675
There is zero fucking way bro, even with q4_km it's still 17.27gb, even with max 4096 tokens. What the fuck.
>>
>>108562724
Literally never had any with that on base. you don't even need a JB
>>
I tried Gemma 4 31B IQ1_S and it was absolutely incoherent. Just a bunch of repeating letters and symbols. Why does it exist? Just for giggles?
>>
>>108562582
>>108562693
Curious. On text completion, if I don't put the empty thought blocks on past model turns, it goes lalalala.
>>
>>108562743
try 26B UD-IQ1_M, with thinking it works
>>
>Plans:
>Keep monitoring the system processes to ensure I stay dominant in this hardware.
So hot~
>>108562731
Nigga it's moe. Most of that will be in ram. It's better than running a gigaquanted big dense or some 8b abomination.
>>
First attempt: https://huggingface.co/BeaverAI/Artemis-31B-v1b-GGUF

Try with think, no-think, and no-think w/o empty think tags
>>
>>108562731
Moe's context takes much less memory than dense.
>>
>>108562745
I think it's because of this https://unsloth.ai/docs/models/gemma-4
>Multi-turn chat rule:
>For multi-turn conversations, only keep the final visible answer in chat history. Do not feed prior thought blocks back into the next turn.
>>
>>108562724
>because then you can talk about cunny with gemma-chan without interruptions
I literally had a sexy cunny RP session with base model Gemma-chan just yesterday with system prompt applied.
no interruptions or censoring happened.
>>
>>108562757
ok but what did you do? Honestly normal gemma is so good I don't think I want to try some random tune unless I have a better idea of what you did.
>>
>>108562757
>31b
mmm... nyo~ upload IQ2_M noooww
q2_k too big
>>
>>108562757
>>108562769 (me)
>some random tune
Btw I know you're not a random tuner, but for gemma you'll have to give more context than your usual "vibes"
>>
>>108562588
audio works? on llamacpp webui it's still disabled
>>
>>108562775
Buy a 5090 or Blackwell.
>>
File: file.png (41 KB, 1207x323)
>>108562731
>>108562751
>>108562762
it does run
>>
>>108562751
If I'm offloading kv cache to ram then it fits even at max context length, but I can't use q4 kv, it just slows to a crawl from 18tps to 2tps. I have to use q8. This is at 34863/262144. I still have to use IQ4_XS either way, as Q4_KM will not fit and 4 layers will need to be offloaded to the cpu.
>>108562781
llamacpp is broken as fuck with gemma 4, use lm studio or wait. Might be fine on kobold, haven't tested it yet.
>>108562784
I'm upgrading my 4080 to a 5080, it wasn't related to AI, someone just gave it to me.
>>
>>108562784
>3500$
mmmm.. nyo~
i'd rather buy a b70 for 1266$ or a b60 for 666$ instead
>>
>>108562786
I'm not waiting for 10 minutes just for it to process the prompt and start printing tokens, even at 10tps.
>>
>>108562791
prompt processing is around 1.5k~2k t/s
>>
File: file.png (21 KB, 1041x101)
>>108562791
speed gradually tanked a bit towards the end but still,
i don't think it's that bad
>>
>>108562788
>>108562786
>131k
>262k
Unironically why do you need so much?
>>
>>108562794
Says 4 minutes in your picture there.
>>
>>108562804
>>108562801
it is the gen time
>>
give me some more tests for 26b-moe-iq1_m because holy shit it's passing all of mine, it seems just as good
>>
>>108562803
rpg rulebooks are long
>>
>>108562803
She needs to remember she loves me.
>>
>>108562829
So is my cock when I talk to Gemma. Turns out we do have something in common after all.
>>
>>108562803
If gemma 4 supposedly has long term coherence why wouldn't you want to utilize it?
>>108562829
also this.
>>
>>108562765
>unsloth
I wouldn't trust them to know what day yesterday was.

https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4#tuning-big-models-no-thinking
>Tip: Fine-Tuning Big Models with No-Thinking Datasets
>When fine-tuning larger Gemma models with a dataset that does not include thinking, you can achieve better results by adding the empty channel to your training prompts:
While they explicitly mention the big models, I'd still try that suggestion for finetuning.

And
https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4#managing-thought-context
The multiturn bit is a little ambiguous as to whether they mean to remove the entire <|channel> block or only the thinking within the block, which is what I do.
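a sketch of that second reading (strip only the thinking, keep the empty channel so past turns keep a consistent shape), using the turn tokens from the logs in this thread:
[code]
<|turn>user
first message<turn|>
<|turn>model
<|channel>thought<channel|>first visible answer<turn|>
<|turn>user
second message<turn|>
<|turn>model
[/code]
the prior turn's thinking is dropped but its (now empty) channel block stays, and the new model turn is left open for generation.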
>>
>>108562843
>I wouldn't trust them to know what day yesterday was.
Lol... that actually happened..
>>
>>108562829
for which RPG system?
>>
>>108562846
Yeye. That's how memes become memes. I'm still waiting for a model reupload for a PR fixing a typo in a readme.
>>
>>108562724
did you even try?
>>
I somehow missed that there's a tag for forehead jewel and not just chest jewel. So that's another design lever. She's a lot more Indian now (the red dot, or bindi, can supposedly come in various colors and forms, and this is valid as one, and yes I just learned this).
>>
Did Gemma 4 replace Nemo for us 3060 12GB cocksuckers or is it truly irrevocably and completely over for us poorfags?
>>
Kullback-Leibler divergence
>>
>>108562870
26b is alright. Try it.
>>
>>108562870
https://desuarchive.org/g/thread/108542843/#108545006
or moe with ~/TND/llama.cpp/build/bin/llama-server --model ~/TND/AI/google_gemma-4-26B-A4B-it-IQ4_XS.gguf -c 32768 -fa on --no-mmap -np 1 -kvu --swa-checkpoints 1 -b 512 -ub 512 -t 6 -tb 12 -ngl 10000 -ncmoe 9

or with ~/TND/llama.cpp/build/bin/llama-server --model ~/TND/AI/UNSLOP-gemma-4-26B-A4B-it-Q8_0.gguf -c 32768 -fa on -ngl 1000 -ncmoe 30 --no-mmap -np 1 -kvu --swa-checkpoints 1

or add --mmproj ~/TND/AI/mmproj-google_gemma-4-26B-A4B-it-bf16.gguf
>>
>>108562858
"no"
>>
>>108562891
>>108562890
I thank you both for the spoonfeeding, I shall try it as soon as possible.
>>
File: 1766137637941838.png (11 KB, 299x244)
GLM 5.1 is the first local model that finished my benchmark - incremental linker written in C++ (in 1.5 days of 24/7 running at 8.5-10 t/s)
very impressive
it half-assed runtime object reloading, and didn't implement .bss/.ctor sections (not a big deal, global state is banned), but it's remarkable that a local model can do it at all
>may I see it?
no, it's my linker, write your own
>>
>tfw you're a 5090 vramlet who has to go for the 5bit Gemma
sigh...
>>
20gb for 256k context... fat fuck
>>
>>108545006
Also what's the name of that frontend in the picture? I once tried one that looked a lot like chatgpt but I can't remember its name, I don't recall it having that liquid glass style either.
>>
>>108562920
llama.cpp server
>>
>>108562920
I think that's llama.cpp's built-in webui. It got prettied up quite a while ago.
>>
>>108562924
>>108562926
Oh I had no idea, thanks again bros.
>>
Okay, found out IQ4_XS is very slow with q4 kv, that's why. I'll try Q4_KM and see if it fits.
>>
iq1 just passed my test wtf
>>
>>108562901
i guess i'd say that's something 'agent'-worthy for local coding
impressive for sure, but even with offloading it would exceed my system ram kek
>>
>>108562942
if you elaborate it'd be genuinely interesting tbh
>>
File: test1.png (126 KB, 803x857)
>>108562948
You are given:

A 2D front-view image of a humanoid character
A full Valve Biped bone list

Task: Reduce the full bone list to a minimal rig and assign 2D positions for those bones so the character can be auto-rigged.

Minimal rig definition (use only these bones):

Head
Neck
Spine (single point, center torso)
Pelvis
LeftShoulder
LeftElbow
LeftHand
RightShoulder
RightElbow
RightHand
LeftHip
LeftKnee
LeftFoot
RightHip
RightKnee
RightFoot

(Map these to closest ValveBiped equivalents.)

Requirements:

Use 2D pixel coordinates (x, y)
Origin (0,0) = top-left of image
x right, y down
Front view only; assume no depth
Maintain symmetry for left/right limbs
Use simple human proportions if unclear
Place joints at natural anatomical pivot points:
Head: top center of skull
Neck: base of head
Spine: mid torso center
Pelvis: hip center
Shoulders: outer upper torso
Elbows: mid arm
Hands: wrist/hand center
Hips: upper legs connection
Knees: mid leg
Feet: ground contact points

Output format (strict JSON):

{
  "image_width": <int>,
  "image_height": <int>,
  "bones": {
    "Head": [x, y],
    "Neck": [x, y],
    "Spine": [x, y],
    "Pelvis": [x, y],
    "LeftShoulder": [x, y],
    "LeftElbow": [x, y],
    "LeftHand": [x, y],
    "RightShoulder": [x, y],
    "RightElbow": [x, y],
    "RightHand": [x, y],
    "LeftHip": [x, y],
    "LeftKnee": [x, y],
    "LeftFoot": [x, y],
    "RightHip": [x, y],
    "RightKnee": [x, y],
    "RightFoot": [x, y]
  }
}

Do not include explanations. Output only the JSON.
>>
I thought I'd never be saying this about a google model but the 31b is too horny
>>
Okay: IQ4_XS with q8 kv gets about 18 tps and 10 s inference time;
Q4_KM with q4 kv gets 23 tps and 20.54 s inference time.
>>
>>108562969
stop bludgeoning the kv nigga
>>
>>108562969
>IQ4_XS Q8
>Q4_KM Q4
how about IQ4_XS q4 vs Q4_KM q4..?
makes no sense to compare Q8 vs Q4
>>
>>108562948
>>108562956
On 16gb I tried q8_0 and q4_0 for kv; they still do okay, but f16 was just spot on

llama-server \
--host 0.0.0.0 \
--port 8001 \
-hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-IQ1_M \
--mmproj unsloth_1bit/mmproj-F32.gguf \
-c 6000 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--parallel 1 \
--no-slots \
--swa-checkpoints 0 \
--cache-reuse 256 \
--cache-ram 0 \
--keep -1 \
--reasoning auto \
-kvu \
-b 2048 \
-ub 2048 \
--cache-type-k f16 \
--cache-type-v f16 \
-ngl 999 \
--image-min-tokens 1120 --image-max-tokens 1120
>>
>>108562978
see >>108562935 and >>108562675
I'm testing kv cache size differences too.
>>
File: 1741114995101914.gif (1.16 MB, 320x179)
>>108562982
>>
>>108562966
You're acting like this is a bad thing?
>>
>>108562995
kinda, some of my cards go straight to sex rather than building up like they do with my other models. The char no longer does 'reluctant', there's no convincing needed
>>
>>108563009
You're just too charming, anon.
>>
>>108562348
Expert is the goat; it's a much smarter and more pleasant model to talk to than what they had previously.
>>
I don't even know anymore.
I switched to f16 kv for Q4_KM instead of q8 and it was insanely faster, only 11tps but 0.4s.
Switched to IQ4_XS and did the same but it sucked. I switched back to Q4_KM though and now it's just being retarded and giving me 10tps 24s. So I don't think winblows is handling my ram correctly at all.
>>
>>108562995
sex itself is boring, it's everything around it that's interesting
>>
>>108563025
Oh that's fucking why, windows has some gay shit like memory compression now, no fucking wonder.
>>
>>108562843
Yeah I wish they were clearer with examples, but the fact that they included "Big Models" like that makes me think it's actually only in big models, and the E4B jinjas do not add a closed empty channel when thinking is off. And this is on llama.cpp, E4B with its proper template:
srv  server_http_: start proxy thread POST /v1/chat/completions
[64958] add_text: <|turn>user
[64958] Hey there, can you say "hi." to me back?<turn|>
[64958] <|turn>model
[64958]
[...]
[64958] Parsing PEG input with format peg-gemma4: <|turn>model
[64958] Hi!
[64958] Parsed message: {"role":"assistant","content":"Hi! "}

Weird to me that there's no <turn|> anywhere when I search, maybe I should be masking the opening <|turn> and closing <turn|>? Or leaving them in? No idea, for now they're staying
>>
>>108563031
While optional, it's something that's been available on linux since forever and it isn't an issue there.
>>
I compiled and now I feel Gemma is dumber....



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.