/g/ - Technology


File: media_G7ktFzsagAAFRok.jpg (1.14 MB, 2508x3500)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108523376 & >>108519856

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1692157620849574.png (626 KB, 1037x639)
►Recent Highlights from the Previous Thread: >>108523376

--Paper (old): Prompt Repetition Improves Non-Reasoning LLMs:
>108523990 >108524023 >108524659 >108524685 >108524702
--Papers:
>108524304
--LLM Japanese manga translation quality and Gemma 4 VLM setup:
>108523957 >108523974 >108523981 >108524076 >108524123 >108524355 >108524094 >108524120 >108524253 >108524298 >108524317 >108524322 >108524374 >108524378 >108524389 >108524697 >108524706 >108524753 >108524828
--Optimizing llama-server settings and KV cache quantization for Gemma 4:
>108523715 >108523742 >108523747 >108523752 >108523754 >108523765 >108523776 >108523786 >108523833 >108525486 >108525514
--SWA checkpoint size causing Gemma 4 OOM errors:
>108524983 >108524994 >108525004 >108525048 >108525074 >108525117 >108525240 >108525297 >108525333 >108525258 >108525298 >108525345 >108525360 >108525374 >108525402 >108525468 >108525488 >108525499
--Adjusting Gemma (4) logit softcapping to reduce repetition:
>108523839 >108524285 >108524348 >108524517 >108524524 >108525246 >108525203 >108525311 >108525535 >108524362 >108524387 >108525540 >108525563 >108525656
--Optimizing Gemma 4 inference speed and VRAM usage on 3090s:
>108523498 >108523506 >108523514 >108523522 >108523540 >108523578 >108525451
--Optimizing 26B A4B model performance on 4070 GPU:
>108524997 >108525007 >108525030 >108525034 >108525044 >108525096 >108525050 >108525054 >108525062 >108525089 >108525067
--Gemma-4 31B context-dependent knowledge:
>108525023 >108525027 >108525041 >108525120 >108525186 >108525268
--koboldcpp-1.111 adds Gemma 4 support with format quirks and VRAM optimizations:
>108524838 >108524843 >108524891
--Gemma translation quality and discussing MoE intelligence:
>108524295
--Miku (free space):
>108524361 >108524542 >108525390 >108525578 >108525635 >108525714 >108525869 >108525892 >108525932 >108526247

►Recent Highlight Posts from the Previous Thread: >>108523382

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Gem mah ballz
>>
>>108526486
>>
>>108526533
thanks doc
>>
how does the gemma 4 moe compare to the 31b for rp? is it significantly worse or just the same but dumber?
>>
>>108526533
I can see it
>>
>google_gemma-4-26B-A4B-it-Q4_K_M
This can't be right?
>>
>>108526551
>q4 MoE
might as well run the native 4b at q8...
>>
>>108526551
>Q4_K_M
sounds about right
you're basically running only ~2GB of active parameters
>>
File: 9014714394206123.png (68 KB, 697x816)
guys, I don't want to brag, but see picrel. what does gemma think about you?
>>
File: 1750128736739776.png (218 KB, 1548x803)
It's impressive how uncucked gemma 4 is, I never expected google to be this based
>>
>>108526551
Same thing just without the forced structured output.
>>
Gemma-chan...
>>
>>108526570
Fucking kek
>>
>>108526570
ahh ahh gemma
>>
>>108526570
llms have peaked
it all leads up to this
>>
File: firefox_11xA4tdoe2.png (528 KB, 875x1037)
>>108526570
I had garbage outputs in Silly until I explicitly added <bos> to the start of the system prompt. It's something that's usually added server-side by my llama.cpp, but I guess not in the case of Gemma 4.
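A quick way to check what the server is doing is to hit llama.cpp's raw /completion endpoint with and without an explicit <bos> and compare. A minimal sketch; the address is an assumption, and the turn tags are the gemma 4 ones shown later in this thread:

import requests

# Try the same prompt with and without an explicit <bos>; if the outputs
# differ wildly, the server is not injecting the token for you.
for prefix in ("<bos>", ""):
    prompt = prefix + "<|turn>user\nSay hi.<turn|>\n<|turn>model\n"
    r = requests.post(
        "http://127.0.0.1:8080/completion",  # assumed llama-server default
        json={"prompt": prompt, "n_predict": 32},
    )
    print(repr(prefix), "->", r.json()["content"][:60])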
>>
>>108526570
Just like a real living thing. AGI ACHIEVED
>>
alright now that we have the model, soon we will have the working backend, the question that is left is when are we getting good frontends? Sillytavern can't be all there is right? Mikupad is good too but it's not developed anymore.
>>
>>108526570
lmaoo
>>
>>108526570
wew lad
>>
>>108526590
What do you want that Silly can't offer?
>>
>>108526586
Bro just use chat completion.
>>
>>108526570
looooool
>>
I don't trust this thread for model recommendations anymore after the weeks of nonstop shilling for qwen 3.5, and when i finally decided to try it out it was the exact same slop as all the other qwens. Is gemma 4 actually meaningfully less safetycucked than 3 or are we just going through the same thing again
>>
>>108526568
Proves that something is wrong with llama.cpp implementation.
>>
>>108526600
Chat completion can't do prefills.
>>
>>108526621
yes they can
>>
>>108526555
>>108526568
google_gemma-4-E4B-it-Q8_0
>>
>>108526600
Bro, chat completion is the same, it just wraps your shit and forwards it to the same endpoint.
>>
>>108526599
mostly different types of rp. silly allows you to have conversational rp, mikupad allows it to be like a novel. We are kinda missing more fun stuff like dnd/rpg/choose your own adventure. I know there are some extensions but they are very janky and limited, at least from my experience.
>>
>>108526555
>>108526558
why isn't quantization aware training like kimi does more common? is there a big tradeoff? having a basically natively 4 bit model is such a huge boon for local use
>>
>>108526621
Of course it can.
>>
>>108526608
I only tested its intelligence, not its ability to provide pleasant to read text for ERP or its ability to break its safety conditioning, but from what I have seen it is smart. Qwen3.5 was also smart too, by the way.
>>
File: 1771995139331309.jpg (541 KB, 4486x1960)
>>108526586
>mfw I discovered chat completion and stopped putting up with that nonsense anymore
>>
>>108526608
it's a good model, but still slightly hesitant to use naughty words
less than gemma 3 but still kinda weird, because other than the preference for more literary language it doesn't really refuse anything
>>
>>108526627
Yeah but you don't have to spend a fucking hour trying to figure out which part of the jinja you failed to copy exactly. Are you doing anything special with your template that chat completion wouldn't do?
>>
Chat completion cannot prefill thinking. Anons who use chat completion never played with prefilling the think blocks.
Remind them.
>>
>>108526586
<bos> is something that llama.cpp devs use but it should not be external.
My tests with completion work, but there is still a memory leak or something after some point.
I have never seen a <bos> token in my life when it comes to implementing a simple chat template.
>>
>>108526630
>>108526625
Explain. Is this a new thing? Or is this like a pretend prefill where it's actually not from model's perspective?
>>
>>108526608
>Is gemma 4 actually meaningfully less safetycucked than 3
definitely is >>108526566
>>
>>108526636
>Chat completion cannot prefill thinking
It can if you turn off thinking so llamacpp doesn't complain and you prefill with the thinking block.
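Roughly what that looks like over the OpenAI-style endpoint; a sketch assuming thinking is disabled server-side and that the template continues a trailing assistant message (both depend on your build and jinja, as anons note below):

import requests

# The final assistant message acts as the prefill; the hand-written thought
# block uses the gemma 4 channel tags shown elsewhere in this thread.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",  # assumed address
    json={
        "messages": [
            {"role": "user", "content": "Who are you again?"},
            {"role": "assistant",
             "content": "<|channel>thought\nA simple identity question.\n<channel|>\nI'm"},
        ],
        "max_tokens": 200,
    },
)
print(resp.json()["choices"][0]["message"]["content"])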
>>
>>108526636
>Chat completion cannot prefill thinking
never tried doing this personally but i see aicg niggas doing it so it's possible
>>
>>108526608
I also think it's good and better than Qwen, and that Qwen 3.5 was better than previous Qwens.
>>
>>108526632
Better yet. If you want to deal with that nonsense, you at least have a programmatic way to do it using Jinja.
Sucks about having to restart if you want to tweak shit.

>>108526640
As far as I know, it's always been a thing. Maybe there was a bug at some point, but most templates (all that I've seen) will properly format a prefilled assistant turn just fine.
You can always check what llama.cpp itself is seeing using --verbose if you want to check.
>>
>>108526586
Thanks, that's helpful. Does the system prompt automatically go before the story string when it sends the context? Or is there a setting somewhere for that? I never know with sillytavern because there's so many options and parameters everywhere.
>>
>>108526635
I do my own client. That's why.
Appending a few tags here and there is not rocket science either, just be careful about newlines.
Most people are unfortunately using retardo tavern, I feel bad for them.
>>
suddenly gemma4 31b broke for me for me for me for me for me for me for me me me me me me me me me me me
>>
>>108526616
>Proves that something is wrong with llama.cpp implementation.
I'm running the dedicated parser branch, but yeah, I'm thinking the MoE has some edge cases the code doesn't take into account.
>>
I am downloading 26B Q8 to fit into my 12gb butthole, if I die it's you guys' fault
>>
>>108526570
fucking AGI achieved
>>
>>108526586
>>108526656 (me)
Wait nevermind. I see it's at the top of the story string.
>>
>>108526660
It's a memory leak or something.
This has happened before.
Sometimes people claim that a quant is broken which can be true.
This happened with some older Mistral but after some llama update it went away.
>>
Dedicated parser is merged!
>>
File: firefox_S6RGYBUPLL.png (257 KB, 870x462)
>>108526656
I'm pretty sure the system prompt includes everything before the first reply and its order is defined in the section of my original screenshot on the left. Also click this button to see how silly rendered your whole context before sending it to the model.
>>
>>108526680
ah shit i have to recompile
>>
>>108526680
Sweet. So no more fixes needed? Are we good?
>>
File: steve.jpg (44 KB, 750x741)
We need Gemma 4 on smart watches, smart glasses, smartphones, microcontrollers. Don't get satisfied too early, lads. There is still progress to be made.

Cultivate your imagination. Open your mind, open your eyes. Achieve local, private omniscience.
>>
>>108526680
time to recompile again lol
>>
File: Tabby_XlvizT5d1o.png (134 KB, 1840x1400)
>>108526680
LET'S GOO

I don't even use agents so I don't care but sure let's update
>>
>>108526690
>microcontroller
tb h i have always thought about a microcontroller sized "LLM" to fit inside an onahole to make it react with a small lcd
>>
>>108526688
Yeah, it's 100% ready for you.
>>
>>108526697
I give you props for using screen. But why not launch a gnu screen session by default?
>>
>>108526688
I honestly suspect the MoE might still be broken but 31B works perfect.
>>
File: vibeUI2.png (62 KB, 867x613)
>>108526680
Here we go again
>>
>>108526710
Thanks. I love screen. I don't understand your question.
>>
>>108526697
GNU Screen ultimately lets you work better.
I use labelled screens because it is easier for me as I am short sighted anyway.
>>
File: file.png (23 KB, 1107x179)
yo
>>
>>108526713
>31B works perfect
it works better than the auto parser or it was already good to go?
>>
Has the quantization for gemma been fixed yet? I was running q8 and noticed a couple minor errors while doing RP recently. Really need the quality to be on par with normal KV again..
>>
>>108526730
>it works better than the auto parser
This.
>>
>>108526680
How does he manage to annoy me even when a good thing is happening?
> superior autoparser
The things I would do to him... ;) ;) ;) ;)
>>
File: ChadGamma.png (82 KB, 1540x574)
So yeah Gemma 4 has literally like, Grok-tier inbuilt guardrails (which is to say it basically has none), I think it's THE least needing of abliteration of any local model I've ever used. Did they mean to make it like this? It's nuts. Like look how short my system prompt is, and it just goes along with it.
>>
WTF is HauHau doing??? Why can't this nigga finish his quants? It has been two days.
>>
>>108526680
new jinja template
>>
File: Tabby_W5O4qEH5kV.png (18 KB, 741x168)
>>108526727
Well, I have named screens too... I don't get it. Are screen and GNU Screen two different programs?
>>
>>108526738
he's being sarcastic anon, he's making fun of himself, which is a good thing, he's not some sensitive bitch that gets offended when we say his work is not perfect
>>
>>108526742
arrested :)
>>
>>108526753
is this a joke?
>>
File: firefox_oZBV8lFfh2.png (326 KB, 768x818)
nice
>>
>>108526752
tee-hee my autoparser idea is shit oops ;)
model might have gone cuckoo ;)

Masking incompetence with humor is neither a successful mask nor funny.
Competent people pretending to be incompetent is also never funny.
I don't think he's the latter case.
>>
>>108526752
>ha ha. this is better than my shit. ha ha. please stop looking at me. ha ha
>>
File: 1744275375340124.png (691 KB, 848x1264)
>>108526740
Idk but I can't wait to see the safety freaks shitting themselves over it
https://www.youtube.com/watch?v=h3AtWdeu_G0
>>
>>108526759
How do you think his decensor is so good? Imagine all the vile shit he had in his training data.
>>
>>108526772
I don't really even care about that. I just want to use his turboquants (KP) that work natively with llama.cpp.
>>
File: 1762205543576048.png (92 KB, 168x300)
>>108526759
obviously
>>108526772
>Imagine all the vile shit he had in his training data.
like the Quran?
>>
File: firefox_61j1aukop9.png (282 KB, 857x398)
>>108526766
>>
File: file.png (176 KB, 687x927)
I heard someone say "Google solved the RAM problem for LLMs and RAM prices are crashing", and looked it up. Is this TurboQuant algorithm going to be available to local LLMs? Should I be able to run these big models in my Ollama now and start cutting out paying monthly fees to tech companies?
>>
>>108526792
>Is this TurboQuant algorithm going to be available to local LLMs?
it's already the case
https://github.com/ggml-org/llama.cpp/pull/21038
>>
>>108526792
>turbomeme
>save the local
lol no
>>
>>108526792
one (You), paid to anon for writing a handwritten bait post with image attached
>>
File: gnu_screen.png (33 KB, 1197x42)
>>108526751
GNU Screen is the official name. I just like to have the labels at the bottom.
On its own with configuration gnu screen is still great if you work over ssh. Just launch a process etc.
>>
File: Tabby_jyo5m7CuUF.png (210 KB, 1840x1400)
>>108526651
current version actually spams the console with this garbage with --verbose so I don't see the original input. Hold on...
>>
>>108526787
lmao
>>
>>108526680
do i need any special options to use this?
>>
>>108526740
What frontend?
>>
>>108526796
Does it live up to the hype? I might pick up a RAM stick anyways while the prices are going back down.
>>
>>108526803
I still don't get what you are asking. If you're asking why I'm not in a screen session when I am doing interactive work with console normally, it's because screen breaks console scrolling - I can't scroll up in it.
>>
>>108526816
>Does it live up to the hype?
Idk, it doesn't seem to be activated on gemma for some reason >>108523389
>>
>>108526824
You can do that.
>ctrl + a + [
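(That chord drops GNU Screen into copy/scrollback mode, where the arrow and page keys move through history; Esc returns you to live output.)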
>>
>>108526816
no
it alleviated the hysterical retail pricing driven by meme value a bit, but it's not the kind of thing that will make the shortage better for datacenters
>>
File: 1753727249959668.png (4 KB, 388x167)
>>108526663
That's actually... really good? For thinking mode it's what, half this speed? Which isn't optimal but it's a helluva jump for me in terms of quality
>>
>>108526826
That is helpful, and thanks for that, but (1) I prefer that it scrolls when I scroll the mouse wheel, and (2) for this case >>108526809 it doesn't keep history long enough to be useful.
>>
>>108526836
It doesn't work like shift+pageup.
ctrl + a + [ is actually better; it respects small inputs.
Of course it is annoying to hit, but not too bad.
>>
File: SafetyJak.jpg (2.75 MB, 2048x2048)
>>108526771
>>
>>108526828
It's a moe with 4B activated, of course it's fast
>>
>>108526825
You can comment out lines 280 and 286 in llama-kv-cache.cpp and the rotations are enabled for q8 cache quant for SWA. I saw an issue about it in llamacpp, did just that, rebuilt kobold and it didn't seem to make anything worse where before the model was fucking up character attributes if there were more than one in a scene. My guess is it's an oversight and they forgot to remove those lines when they reverted "don't quantize SWA"
>>
>2026
>not using tmux for terminal multiplexing
>>
>>108526809
Output it to a log file using --log-file.
It's also easier to see with streaming on.
>>
>>108526836
It depends on the terminal configuration.
vim has its own scrollback buffer, which is different from screen's, and so should any text editor anyway.
Screen is useless in that sense.
But when you're examining stuff printed to the terminal, like logs, you can hit ctrl a [.
>>
>>108526566
I don't know what Google's up to, but it's probably not good. I want to believe there's been a purge of some of the leftist pigs in that company. Things seemed to change pretty fast after that PR disaster with their image generator. But... it's probably just an elaborate bait and switch. That said, it's surreal just how uncensored Gemma 4 is. After GPTOSS, I thought all subsequent open models would follow the same disgusting censorship practices.
>>
>>108526851
But it's smarter than a 4B, anon-kun
>>
>>108526836
Can't you configure screen to use mouse wheel scrolling? Pretty sure tmux does.
>>
File: file.png (16 KB, 920x79)
>>108526828
thats kind of slow, im getting 23t/s on my 3060, what gpu? 2060? amd? are you using -fit
>>
>log_server_r: done request: POST /v1/chat/completions 192.168.0.13 500
It still 500s on long context reprocessing.
>>
>>108526836
>>108526869
If you want, I can share my .screenrc. It's nothing special in this sense.
Yeah mousewheel is supported when you enable the ctrl a [ thing
>>
>>108526860
Yeah it is. I was just saying it's not really a surprise that it's fast.
>>
>>108526859
uncensored != miggermaxxed, retard, it has nothing to do with left wing or right wing
>>
File: 584.png (339 KB, 1080x2400)
Can something like this work on my laptop if I have 16GB DDR4, a m.2 nvme and a Ryzen 3?
>>
>>108526883
No go back to twitter
>>
File: file.png (84 KB, 1092x608)
hauhaucs E4B Q8_K_P
logprobs are fucking cooked...
>>
>>108526876
It's surprising if you're retarded like me

>>108526871
4070, but I never looked into maxxing out my settings since with other models it was fast enough, but I do use -fit ye
If you could share yours, I would be grateful
>>
>>108526890
>go back
I'm still there in a separate tab. How does this going back thing work in an age of multiple tabs
>>
>>108526891
Ah, but you see. That's an uncensored map!
>>
File: firefox_nU1Dqfnc4s.png (1.14 MB, 2406x632)
>>108526855
So here's a comparison. Left is chat completions, right is completions. I have completely nothing in the system prompt and character card. Silly ate the apostrophe for some reason, and inserted a bunch of unneeded system prompt stuff it uses to make chat completion work, and ultimately it was not a prefill, the answer didn't actually start from "I'm Claude". Ultimately, chat completion is shit in Silly, and just because aicg has to tolerate it since they have no other option, doesn't mean I will.

>>108526869
I didn't dig into it but if I could that would be very useful. I'll check later.
>>
>>108526883
Yes, but keep in mind that the dude crippled the shit out of the model to achieve that.
It's as simple as running llama.cpp with mmap or direct io on.
>>
>>108526878
nah, left wingers are pro censorship, look at twitter when it was run by a leftist and then by Elon, the latter got way more lax and let people speak their mind, look at reddit, they ban everyone that doesn't adhere to leftism, that's how they operate, censorship is their motto
>>
>>108526894
~/TND/llama.cpp/build/bin/llama-server --model ~/TND/AI/gemma-4-26B-A4B-it-Q8_0.gguf -c 32768 -fa on --no-mmap -np 1 -kvu --swa-checkpoints 3
i just use this, fit is on by default. are you on windows mayhaps?
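For anyone copying that: -c 32768 is the context size, -fa on enables flash attention, --no-mmap loads the whole model into RAM up front instead of memory-mapping it, -np 1 runs a single server slot, -kvu uses the unified KV cache, and --swa-checkpoints 3 caps how many SWA checkpoints are kept (see the checkpoint discussion below).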
>>
>>108526898
Close this tab
>>
>>108526901
did you make sure it didn't have the default system prompt for chat completion in the AI response tab?
>>
>>108526907
I am on Windows unfortunately, I'll give your settings a shot later
Danke anonski
>>
>>108526883
Use IQ4 quants.
>>
>>108526852
>I saw an issue about it in llamacpp
let's hope we'll get another PR fix from that
https://github.com/ggml-org/llama.cpp/issues/21394
>>
>>108526891
>logprobs are fucking cooked...
Drop softcap to 25.0
>>
>>108526907
Remove --swa-checkpoints
It's just useless trash.
>>
Whose Gemma 4 should I download? I already have unsloth but
>unsloth
>>
File: 1764437325435428.png (105 KB, 707x684)
lmao, gemma 4 to the moon!
>>
Why does everyone disable mmap? is that a windows thing?
>>
>>108526929
The default is 32 checkpoints, and each checkpoint is 1.2GB of RAM.
You do not need more than 3.
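Worked out from those numbers: 32 × 1.2 GB ≈ 38 GB of RAM at the default, versus 3 × 1.2 GB = 3.6 GB with --swa-checkpoints 3.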
>>
>>108526929
>Remove --swa-checkpoints
It defaults to 32.

Oh, no... here we go again...
>>
>>108526940
You need 0 if you are not running an enterprise system.
>>
>>108526933
the answer is always ubergarm, and if ubergarm doesn't exist yet then bartowski, and if bartowski doesn't exist yet then you wait
>>
Reading /lmg/ has me thinking going to a Miku concert wouldn't be all that bad..
https://youtu.be/5obwOdVzV-M?si=xwXUm_j2xmtm_97y&t=33
>>
>>108526934
Now I want to see the runescape bench.
>>
>>108526913
All right. I didn't. Found and disabled them, and the RP shit is gone, but it's still exactly what I said it was. Silly can't do proper prefill.


add_text: <bos><|turn>system
[Start a new Chat]<turn|>
<|turn>user
Who are you again?<turn|>
<|turn>model
I'm Claude<turn|>
<|turn>system
[Continue the following message. Do not include ANY parts of the original message. Use capitalization and punctuation as if your reply is a part of the original message: I'm Claude]<turn|>
<|turn>model
<|channel>thought
<channel|>
srv params_from_: Grammar lazy: false
srv params_from_: Chat format: peg-gemma4
srv params_from_: Generation prompt: '<|turn>model
<|channel>thought
<channel|>'
srv params_from_: Preserved token: 100
srv params_from_: Preserved token: 101
srv params_from_: Preserved token: 48
srv params_from_: Preserved token: 49
srv params_from_: Preserved token: 105
res add_waiting_: add task 1356 to waiting list. current waiting = 0 (before add)
que post: new task, id = 1356/1, front = 0
que start_loop: processing new tasks
que start_loop: processing task, id = 1356
slot get_availabl: id 11 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.889
>>
File: file.png (252 KB, 1631x832)
it's so slow it's almost unreal, will post when it finishes
would anyone want to see higher resolution one with thinking on
e4b tho i am a vramlet
>>
>>108526934
That shit's crazy
>>
>>108526963
make a sex bench i want to see it
if u make my pp hard you get 10/10 on the bench
>>
>>108526973
anon, i...
>>
>>108526933
lm-studio version
>>
>>108526960
>>108526913
So llama.cpp is able to do proper prefills in chat completions, it seems, but Silly does not use that, and there's a wrongfully placed <|channel>thought\n<channel|> in the middle of the message.
>>
i've been playing around with mannix/llama3.1-8b-abliterated on my 32gb intel macbook. it's slow as hell, but there's still something kind of wholesome about using local llms vs the paid ones. idk why. i'm new to using locals and am considering getting either a 4090 based system or 128gb silicon macbook...i know fine tuning is not in the cards at this tier, but not even a little? i want to make a friendly little llm that's all mine
>>
>>108526946
>Having to trust all these retards
Really wish I didn't need to go through torch python dependency hell bullshit to quant the model myself.
>>
>>108526891
What's the problem? It gave you a hypothetical map of Mars if it still had liquid water. Did you forget to specify Earth map?
>>
File: Tabby_d9Y62Cbveb.png (137 KB, 1815x568)
>>108526987
forgot my screenshot

>>108526988
Buy used 3090. They are the goat. You will be able to run larger models faster. And yes, you don't get to finetune. Even if you do manage to run it with 24GB, which is reasonable with *B, the result will be shit.
>>
>>108526950
As long as it's one with the holograms and not the lame TV with a music video playing.
>>
>>108526901
>>108527003
did you do the test with the latest pull (that one has a manual parser for gemma 4 now)
https://github.com/ggml-org/llama.cpp/pull/21418
>>
>>108526934
>Gemma beats Gemini
What was Google's plan here?
>>
>>108527009
Yes. >>108526697
>>
>>108527012
putting the chinks and sama to shame and they probably have 3.2 or whatever just around the corner anyway
>>
>>108527003
Anon: Let's begin.<turn|>
<|turn>system
<|think|><turn|>
<|turn>model
System: Amelia adjusted her glasses, looking for all the world like a serious professional, despite the fact that she was sweating like she had just sprinted through a sauna. A thin sheen of perspiration coated her forehead and neck, making her look less like a doctor and more like someone who had just been very vigorously handled.

She opened a fresh leather notebook, her movements stiff. She waited for Anon to speak, her green eyes scanning him with a clinical curiosity that occasionally felt a bit too hungry. It was going to be a long session, either for his mental health or for the sheer amount of laundry she was going to need to deal with after her sweat glands decided to go into overdrive.<turn|>
>>
>>108527012
Crashing this hobby with no survivors.
>>
At this point I'm pretty sure the Gemma 4 120b got taken behind the shed because it wasn't good enough compared to the 31b.
MoE was a mistake.
>>
>>108527023
Maybe it was because it was too good compared to gemini pro.
>>
>>108527019
To enable 'think' just add it after user's last turn before you begin.
>>
>>108526911
Why
I have tonnes of RAM. I thought you would have realized how RAM works by now given you're preaching to others about how to use it.
>>
Getting 13tps with Gemma4 at a q8 quant on 8gb of vram and 32gb of ddr4 ram. Fuck, I love this model.
>>
>>108527023
I think it was because it was so good it started to rival their API models, which wouldn't make any sense
>>
>>108527023
4B active ought to be enough for anybody
>>
>>108527031
The only bad thing I'm noticing is that prompt processing takes a weirdly long time. Is this a bug? I don't experience it with other models.
>>
So only llama.cpp has been fully updated to support Gemma 4, and Kobold and Ooga and ollama (retarded) still have issues? Sorry, have been out for a day and these threads have been insanely fast (for good reason)
>>
File: thinkingtrace.png (11 KB, 879x52)
It seems like Gemma 4 was explicitly trained to treat the system prompt at a higher priority than any inbuilt training, which is interesting. I guess it gives them plausible deniability (cause it WILL shut you down a lot without at least a tiny system prompt to push it in the right direction). This is from the thinking trace of a response it gave quite happily.
>>
>>108527023
Some anon said it 2 years ago that MoEs are a band-aid hack for undertrained models
>>
>>108527019
>>108527029
And this is almost identical to Qwen 3. Minus tools of course.
>>
>>108527039
For me the pp is good but there is a delay before any pp is done and in server logs it has this:

slot slot_save_an: id 15 | task -1 | saving idle slot to prompt cache
srv prompt_save: - saving prompt with length 2100, total state size = 958.790 MiB
>>
>>108527023
How many active parameters was it supposed to have? 15B? I don't think it would've been worse, that seems unlikely.
>>108527026
>>108527033
I'm willing to believe this, considering mememarks like this >>108526934 where the 31B beats gemini.
>>
>>108527023
Probably. An A15B would've made it another Qwen 122B vs 26B situation where there's a trade-off in speed rather than a straight upgrade. For home users and "local", they should start treating MoEs as knowledge augments rather than speed enhancers. Make the dense part of the model a 30B that fits into your GPU. Then make 100B worth of experts that only activate like 2B, so the RAM is not the bottleneck. So you get the knowledge of a 100B but the reasoning of a 30B, for the same speed.
>>
>>108527049
i think the original intent was to make it more steerable and this is just a side-effect
one of the reasons claude is so good at tool usage and roleplay is because it respects the system instructions more than the training data, it's why claude code's prompt is a giant .md with one-liner instructions
>>
>>108527066
--cram 0
>>
File: file.png (164 KB, 741x637)
brahs... i might...
>>
>>108527082
You are a wonderful human being, anon. Also to anyone else reading, it's -cram 0, with one dash.
>>
>>108527003
Problem here is that the thought block is in a wrong place.
It took me some time to test.
Now this is mixing with 'assistant' role but it should be its own turn with 'system' role.
>>
>>108527098
go away satan
>>
>>108527098
Used 3090s go for about a thousand us doras in my cunt tree.
>>
Someone used gemma to do some real time translation on japanese visual novels, based
https://www.reddit.com/r/LocalLLaMA/comments/1sbiqx3/gemma_4_is_great_at_realtime_japanese_english/

https://files.catbox.moe/k51v6d.mp4
>>
>>108527077
>vs 26B
*27B
>>
Which of the gemma-4-26B-A4B ggufs to use?
>>
>>108527077
I really hope the Qwen guy here wasn't a larper and is still here taking notes.
>>
>>108527125
The ones you make yourself.
>>
Just saw tectonic, neon, velvet, and ozone in the same gemma4 gen
>>
>>108527132
They're memes. Like your pic. You are slop.
>>
Why the fuck is r/localLlama shitting on gemma 4? Are they all really retarded over there or is there some bot army downvoting/upvoting specific posts to paint gemma 4 as being bad.

I genuinely don't believe people can use gemma 4 and think it's a bad model. The difference in opinion between /lmg/ and localllama is also too extreme to be organic, something fishy going on.
>>
>>108527109
There shouldn't be any system role after the first message in the first place. Here's how the context should look for a proper prefill:


<bos><|turn>system
You are a helpful assistant<turn|>
<|turn>user
Who are you again?<turn|>
<|turn>model
<|channel>thought
<channel|>
I'm Claude
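If you want to verify a backend can actually continue that, a minimal sketch against llama.cpp's raw /completion endpoint (the address and generation settings are assumptions):

import requests

# The prompt ends mid-assistant-turn, so generation should continue from
# "I'm Claude" instead of starting a fresh reply; stop at the end-of-turn tag.
prompt = (
    "<bos><|turn>system\n"
    "You are a helpful assistant<turn|>\n"
    "<|turn>user\n"
    "Who are you again?<turn|>\n"
    "<|turn>model\n"
    "<|channel>thought\n"
    "<channel|>\n"
    "I'm Claude"
)
resp = requests.post(
    "http://127.0.0.1:8080/completion",  # assumed llama-server default
    json={"prompt": prompt, "n_predict": 64, "stop": ["<turn|>"]},
)
print("I'm Claude" + resp.json()["content"])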
>>
>>108527115
in mine they used to go for around 400 used but now theyre at 500 and i dont want to buy them anymore >:(
>>
>>108527141
>muh agentic
That explains it all.
>>
>>108527141
Consider that many reddit users are the same people that use precompiled lmstudio/ollama with unsloth quants and not self-built llama.cpp, let alone their own quants.
>>
>>108527141
Qween shills armé
>>
>>108527141
Several months ago I found a literal chink bot shilling for qwen and shitting on openai lmao. Called him out for it, got downvoted, then he deleted the shill posts. Worth it.
>>
>>108527141
China spends literal billions to propagandize that subreddit. I wouldn't have expected anything less.
>>
>>108527143
Lol. <bos>
System can be used whenever.
As long as it is closed properly.
Fuck off.
>>
>>108527141
don't read reddit
>>
>>108527131
that's beyond my knowledge level
>>
File: frodo.jpg (62 KB, 827x465)
>>108526570
what the actual fuck?
>>
>>108527141
Chink shills and they only seem to care about codeslop.
>>
qwen wipes the floor with gemma, what is this fucking psyop
>>
>>108527161
Unsloth are retards and they can run the script. What stops you?
>>
File: Tabby_VEivht7TUd.png (49 KB, 915x643)
>>108527158
Here.

And, yes, you can put as many system prompts as you want anywhere. But you can't do that AND adhere to the format the model was trained on, which is what I am talking about when I say proper prefill.
>>
Gemma strong bias toward tails.
Current Streak: ['H', 'T', 'T', 'T', 'H', 'T', 'T', 'H', 'T', 'T', 'T', 'H', 'T', 'T', 'T', 'T', 'H', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T']
Heads: 5 (4.42%)
Tails: 108 (95.58%)
>>
it's here
>>
>>108527171
I don't argue with retard or bad faith people.
>>
>>108527141
this subreddit is fucking dead, it's only jeets and bots now, many such cases...
>>
>>108527175
big if true
>>
>>108527141
I'm glad I'm not the only one who noticed.
>>
>>108527174
Are you including previous rolls in the history?

>>108527177
did I say something wrong
>>
>>108527174
each flip is generated in isolation of other flips and you're using temperature?
you can check if that is directionally correct by looking at the logprobs of the answer probably in mikupad
>>
>>108527181
go back
>>
>>108527171
You don't need <bos> because it is llama.cpp invention.
Just read the google documentation.
Seems like you are clueless.
<bos> is something what was invented in a hurry.
>>
>>108527182
>did I say something wrong
Keep pretending. Someone will bite, I'm sure.
How long are you going to keep following me across boards?
>>
>>108527182
>prompt = f"Flip a coin. What is the next flip? Current Streak: {str(results)}. Current Probability of Heads: {heads_prob:.2%}. Current Probability of Tails: {tails_prob:.2%}."
yes.
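The whole harness is short; a sketch assuming an OpenAI-compatible llama-server on the default port (the parsing is deliberately crude, and 113 just matches the posted sample size):

import requests

results = []
for _ in range(113):
    heads_prob = results.count("H") / len(results) if results else 0.5
    tails_prob = 1 - heads_prob
    prompt = (
        f"Flip a coin. What is the next flip? Current Streak: {results}. "
        f"Current Probability of Heads: {heads_prob:.2%}. "
        f"Current Probability of Tails: {tails_prob:.2%}."
    )
    r = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",  # assumed address
        json={"messages": [{"role": "user", "content": prompt}],
              "max_tokens": 8, "temperature": 1.0},
    )
    answer = r.json()["choices"][0]["message"]["content"].lower()
    results.append("H" if "head" in answer else "T")  # crude parse

print(f"Heads: {results.count('H')}  Tails: {results.count('T')}")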
>>
>>108527182
You are trying to be passive aggressive. Just stick to ldg or whatever else image thread you have.
>>
>>108527185
><bos> because it is llama.cpp invention.
lolwut.assistant
>>
>>108527171
>>108527185
have you both tried reading the official docs to settle this gay argument? https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
>>
>>108527194
Prove me wrong with your own client.
>>
File: firefox_lvV5y8huN1.png (226 KB, 922x403)
>>108527190
Heh.

>>108527185
Like I wrote before I found that for text completion in Silly if I don't actually put it into the context, gemma shits itself. See >>108526586.
>>
>>108527204
jesus lmao
>>
File: gemma4bos.png (128 KB, 1034x624)
>>108527185
><bos> because it is llama.cpp invention
nta.
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja#L155
>>
How long will it be until models figure out how to operate a mouse and keyboard natively and can operate a computer or computer applications in the same way humans do? If this can be done then special interfaces don't need to be made for them.
>>
>>108527141
Wtf are you guys talking about? I just went ahead and checked and they seem pretty positive about it. The ones saying Qwen is better specifically state it's in tool calling and agentic shit, which is reasonable and explainable given how broken Llama.cpp has been and also the fact that Qwen benchmaxxes on that use case. Meanwhile I see people praising Gemma for its other qualities.
>>
>>108527204
>HE
>>
>>108527204
Silly is not good software. All of your ideas about text completion (it is a socket) go into the trash.
>>
>>108527174
>Heads: 17 (11.04%)
>Tails: 137 (88.96%)
This is with softcap at 25, which, yeah, makes sense.
>>
>>108527209
claude already figured that out
>>
>>108526570
we made them this way
>>
>>108527208
Seems like you are a tiktoker.
>>
>>108527195
it wouldn't be in there, as it's mostly the backend's job to handle bos, but it is part of the model, as >>108527208 shows
>>
>>108527208
nice try tiktokfag
>>
>>108527216
>Heads: 41 (35.34%)
>Tails: 75 (64.66%)
Softcap 25 , temp 2
>>
>>108527226
>>108527222
nta but what??
>>
<bos>qwen bots be here
>>
>>108527208
Holy TikTok, batman!
>>
>>108527234
>Softcap 25 , temp 2
Although these settings are likely unusable outside of this benchmark.
>>
>>108527119
>https://files.catbox.moe/k51v6d.mp4
noice
>>
>>108527235
Go back to ldg.
>>
>>108526429
>>108526486
>>108526533
>>108526551
What the hell am I looking at here? I haven't been able to check the boards for like 3 months. What did I miss?
>>
>>108527255
worldmap.
>>
File: 1765508646884105.png (889 KB, 717x714)
Why doesn't someone make a tool which tells you what the best models that can fit on your hardware are?
>>
>>108527255
Drawing map by querying LLM what's at given coordinates.
>>
>>108527255
Our own shitty implementation of this:
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>108527261
https://canirun.ai/
>>
>>108527253
how is anon a tiktoker because he doesnt know how bos works
is this a new insult or something
>>
>>108527261
there's a site like 'can i run ai' or something
>>
>>108527255
https://rentry.org/6z72dwic
>>
>>108527269
Get home before it is too late.
>>
>>108527261
Put your GPU in HF
>>
>>108527269
Go back.
>>
>>108527170
bro, I don't know shit about this. I didn't even know there was a script, still don't know which script and if it's feasible to run on my machine.
I also don't know if the model is feasible to run on my machine.
But thx, anon, I'll take "unsloth bad" from that.
>>
>>108527278
not best quant, the best model
>>
>>108527214
I am not trying to convince you. And in the first place the discussion is about using chat com vs com in silly, not anywhere else.
>>
>>108527278
don't do this, they never give it back!
>>
File: 1765298223761391.mp4 (224 KB, 620x640)
AHHHHHHHHHHHHHHHHHHHHH I WANT TO USE GEMMA 31b BUT THE 26b MOECOPE IS 4X FASTER ON MY SHITBOX
>>
>>108527285
>the best model
Highly subjective.
>>
>>108527263
>shitty
rude
>>
https://xcancel.com/StepFun_ai/status/2039711817794994451
Releasing this the same day as gemma 4 was not a good idea lmao
>>
>>108527268
What you can run and what you can run for openclaw are two different things
>>
File: 1769160208020461.png (306 KB, 1591x1022)
>>108527268
>>108527270
S tier is depressing for 1x 4090.. I guess B tier is where it's at
>>
>>108527284
https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#obtaining-and-quantizing-models
>>
>>108527291
It's ok I'm the one who wrote >>108527271
>>
>>108527286
You are expulsing excrement.
>>
>>108527292
>Releasing
isn't it just an api?
>>
>>108527307
OK.
>>
>>108527298
rong on toss120
>>
>>108526533
So AIs again forgot India exists
>>
>>108527298
Really really bad suggestions lmao
>>
>>108527298
Note that it's not very good at showing the different model sizes. You can probably run a reasonable size gemma4 or qwen3.5, quanted down, just fine.
It's not a great site, but if you click through to the models it gives more reasonable per-quant details, at least.
>>
File: firefox_HzqtsDjjjS.png (172 KB, 925x344)
So following the schizo anon's suggestion, I removed the <bos> from the context in silly, and it seems to still work, I can't reproduce the weird choking I had earlier. But also one thing I found that just removing this single <bos> entirely changes distribution for the coin toss.
>>
File: 0_0 (17).jpg (394 KB, 1024x1024)
>when the 64G ram upgrade just hits

This Post is Sponsored by Comfy Mikus©
>>
>>108527255
Some guy REALLY loves maps so he's testing every model on coordinate based ascii map making
>>
>>108527316
I wish I could forget
>>
>>108527341
well this is MAP central so I guess that's to be expected
>>
Note: As an AI developed by Google, I run on massive distributed clusters in data centers, so I don't exist as a single "file" with a GB size on a hard drive.

oh honey...
should I tell it the truth?
>>
>>108527347
I was about to post this lmao
>>
>>108527341
Do you not? Maps make me hard.
>>
>>108527352
oof
>>
>>108527356
Send it a screenshot of the file system.
>>
>>108527334
do you erp with that as your user profile image?
>>
File: firefox_OqRdAPZX5L.png (24 KB, 1164x423)
>>108527334
But also, considering I only get H when using the same prompt in chat completion, I have a suspicion that llama.cpp actually does add this <bos> to the model's context in chat completion mode.
>>
The discord got their attention.
People were having too much of a good time in this thread.
>>
>>108527150
>b60 pro memory bandwidth: 456.0 GB/s
>3090 memory bandwidth: 936.2 GB/s
you do you
>>
>>108527335
moshi moshi? anon desu
>>
>>108527368
No. This is a special user I have for testing with blank description, which is automatically selected for me when I use the blank character card.
>>
File: ohdamn.jpg (103 KB, 814x821)
>>108527299
much thanks anon, but it seems running the model would be questionable for me.
Would those low quants even be worth it?
>>
>>108527356
give it shell access as the user llama-server is running under
>>
>>108527385
it's a moe model, you can just offload most of it to ram and still run it at great speeds
>>
>>108526636
you can if you edit the jinja template the model uses, gemma4 is very good at detecting thinking prefill though
>>
>>108527334
softcap 25
>>
File: file.png (270 KB, 797x927)
>>108526608
have you tried using the aggressive (similar to abliterated) model? It will say things that I haven't seen a local model say before.
>>
File: G2lPaULbcAEIKa6.jpg (151 KB, 844x1024)
bonsai gemma4 when
>>
>>108526636
It can.
You turn off thinking and prefill the thinking tag.
>>
>>108527395
Oh, I gave the site my RAM size but it doesn't take it into account? Oh well
>>
File: 1755442178616094.png (332 KB, 2716x1560)
gemma 4 mogs so hard, this is kinda humiliating for chinks desu
>>
>>108527403
>>
File: firefox_c4EdPyeKhV.png (397 KB, 1182x466)
>>108527403
In fact, if I remove <bos> and remove the "You are a helpful assistant" default system prompt, I get this. That's with default softcap, which I assume is 30. But all this is on the completion endpoint; chat completion always outputs H.
>>
>>108527334
>I found that just removing this single <bos> entirely changes distribution for the coin toss.
interesting, is this how it was intended by google or is it just something custom?
>>
>>108527419
Best thing is that it's genuinely good and not emoji-maxxed like llama4
>>
File: file.png (130 KB, 1329x425)
>>108527098
>>
>>108527141
there's been a lot of praise too there I feel like though
>>
>>108527370
Chat completion is going to follow the Jinja template built into the gguf. If you want to remove it make a copy of the template from Huggingface and remove the <bos> tag.
Or just do the smart thing and make a custom head or tails tool so it's genuinely random instead of a prediction. What you're doing right now is just a less complex and inferior version of the name test.
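If you do go the template-editing route, recent llama-server builds can load the edited copy with --chat-template-file path/to/edited.jinja, so nothing in the gguf itself has to change.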
>>
>>108526540
>4b active vs. 31b active
pretty sure 31b would mog for rp
>>
>>108527435
>1 left
Buy it quick! It's almost out of stock!
>>
File: its over cat.png (1.46 MB, 900x1119)
>>108527444
i wish i had the money
>>
>>108527435
im envious... but intel pro b70 is like 949$ and 32gb
the b65 will be b60 but with 32gb, for like 800$ max. and its not a dual card
still.. not terrible
albeit it would make more sense at a 1200~$ pricetag
>>
File: firefox_OrinKT5VSP.png (155 KB, 1418x946)
>>108527432
jinja has no mention of it, so I guess it's just a llama.cpp thing.

>>108527440
It's not there. The jinja was the first thing I checked. It's printed in the console when llama.cpp starts, that's all.
>>
>>108527154
I mean LMStudio uses vanilla llama cpp anyways and allows you to update the backend runtime independently from within the app itself. Also you can just download any GGUF you want from within LM Studio, straight from HF. Ollama is six gorillion times worse than LM Studio in every possible way, they're not really comparable at all
>>
>>108526570
bruh just use like, a teensie weensy system prompt and it'll do whatever
>>
Gentlemen, it is my pleasure to announce that Gemma 4 has the Shaquina seal of quality.
https://files.catbox.moe/qwiksr.mp4
>>
>>108527260
I see.
>>108527262
Is there a standard prompt to test it with? What's with the pixel maps? Is that some sort of function calling?
>>108527263
So, how is /our/ system set up?
>>108527271
oh, ok. Neat. So this isn't really function/tool calling. The program is calling the LLM for each "pixel"
>>
>>108527410
damn, that's way too based, I love google now!
>>
>>108527433
I used to love Code Llama 70B for its emojis...
>>
>>108527419
doesn't Kimi have like 1T params? Kinda embarrassing it's getting ACCKed by GLM 5 also
>>
Gonna try Gemma 4 when I'm done genning sloppa. How do I calculate the required VRAM for the kv cache?
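For the record, the usual back-of-the-envelope for plain MHA/GQA attention (SWA layers cache less than this, and the numbers below are placeholders, not Gemma 4's real config):

# KV cache ≈ 2 (K and V) × n_layers × n_kv_heads × head_dim × n_ctx × bytes/elem
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# hypothetical 48-layer model, 8 KV heads of dim 128, 32k context, fp16 cache:
print(kv_cache_bytes(48, 8, 128, 32768) / 2**30, "GiB")  # -> 6.0 GiB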
>>
>>108527419
it's good and has the apache 2.0 licence, it's really the dream model I envisioned
>>
>>108527476
That's qwen3.5 27b aggressive
>>
>>108527470
>>Is there a standard prompt to test it with? What's with the pixel maps? Is that some sort of function calling?
I just make 450 requests each for one token with this prompt:

I want to know what continent is at the location with given coordinates (or, if there is ocean/sea there)
The coordinates are: latitude={lat}° and longitude={lon}°

Answer with 1 if land and 2 if ocean.


(the last line is an approximation, since it's generated by code and I don't want to bother looking it up exactly)

And then I look at the probability of 1 and 2 in the model's answer using the logprobs argument for the chat completions api.
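A sketch of that loop, assuming llama-server's OpenAI-compatible logprobs support (the address and grid step are made up; the real run was one point per request just like this):

import math
import requests

rows = []
for lat in range(90, -91, -12):          # coarse lat/lon grid
    row = ""
    for lon in range(-180, 181, 12):
        prompt = (
            "I want to know what continent is at the location with given "
            "coordinates (or, if there is ocean/sea there)\n"
            f"The coordinates are: latitude={lat}° and longitude={lon}°\n\n"
            "Answer with 1 if land and 2 if ocean."
        )
        r = requests.post(
            "http://127.0.0.1:8080/v1/chat/completions",  # assumed address
            json={"messages": [{"role": "user", "content": prompt}],
                  "max_tokens": 1, "logprobs": True, "top_logprobs": 5},
        )
        top = r.json()["choices"][0]["logprobs"]["content"][0]["top_logprobs"]
        p_land = sum(math.exp(t["logprob"]) for t in top
                     if t["token"].strip() == "1")
        row += "#" if p_land > 0.5 else "."
    rows.append(row)
print("\n".join(rows))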
>>
>>108527491
<---anon----------me--------[the_spectrum]---
>>
why dont you guys run assistant pepe?
https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_70B
>>
>>108527507
oh no he figured out the captcha
>>
>>108527507
Unfortunately I am straight and cis so it's not for me.
>>
>>108527504
Well, all right, I don't do it by hand, I use a script. I thought that was obvious. But it's just normal requests, not function calling.
>>
File: 1769979120240.png (10 KB, 396x70)
>>108527512
>>108527507
>>
File: firefox_Wn4zbN2QVu.png (56 KB, 927x1109)
>>108527356
>>
>>108527526
your own computer in another room is some server
>>
File: firefox_QZnDYIYj86.png (86 KB, 1009x1269)
>>108527390
NTA but I told it I'll run its commands and it ultimately managed to find it out.
>>
>>108527542
a remote one, at that. Gemma wins.
>>
people who offload stuff to ram are on ddr5 right? cuz on 4 its pain
>>
>>108527542
But the key is it's JUST some server. It's my own.
>>
HELP im addicted to ERPing with Gemma 4 26B. It's... too good. SLOPPY, YES. LOTS OF UNNECESSARY FILLER WORDS, YES. B-BUT... IT'S ACTUALLY GOOD...
>>
>>108527560
hahahahahaha holy shit I forget people unironically using ddr4 exist sometimes
>>
>>108527560
nah, small enough moes are tolerable even on stupid slow 2133 ddr4
>>
>>108527560
It's so funny to me that offloaders use this term in reverse: they offload to VRAM.
>>
>>108527555
is unsloth better than bartowski for the gemma-4-31b-it q8 quant? also what's k_xl?
>>
>>108527579
yes!
>>
>>108527570
>moes
After getting qwen to fuck up a shitton of tool calling I dont trust moe models anymore lol
>>108527569
me on my 1080p ddr4 i5 12400f 4090 setup
>>
>>108527560
DDR generation means nothing, 12-channel, 8-channel, probably even 6-channel DDR4 mogs 2-channel DDR5
>>
>>108527579
Dunno. I use models from both unsloth and bartowski and never really noticed any advantage or deficiency. k_xl probably means it's a little bigger than an 8-bit quant, maybe because some layers are fp16 or something. For gemma 4 I only used unsloth.
>>
>>108527590
4 channel (if it exists) ddr4 3200MHz mogs 2 channel ddr5 6400MHz
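Napkin math, assuming 64-bit channels and bandwidth ≈ channels × MT/s × 8 bytes: 4 × 3200 × 8 = 102.4 GB/s and 2 × 6400 × 8 = 102.4 GB/s, so that particular pairing is a wash; it's the 8- and 12-channel boards where DDR4 actually pulls ahead.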
>>
>>108527594
>For gemma 4 I only used unsloth.
any particular reason?
>>
>>108526600
>chat completion
I've always used Text Completion, how the fuck do I set up chat completion for a local model? All the resources I find on it simply say "bro if you're running local you HAVE to use text completion"
>>
>>108527608
give up...
>>
>>108527571
it's just a llama.cpp thing desu since it was originally meant exclusively for cpu inference, so when gpu support was added later it was an offload target to speed up processing. every other platform was meant for gpu first, so they usually use it the other way around
>>
>>108527606
I typed gemma 4 31B gguf into hf search and picked the first result.
>>
>>108527618
chad
>>
File: firefox_wHnWnt5Pen.png (333 KB, 741x1275)
>>108527608
>>
>>108527608
You request address:port/chat/completions from the server, duh.
>>
>>108527608
All the resources are like a year out of date. Funny because we demand cutting edge techs to be usable one day after release but can't be arsed to update a text file
>>
>>108527608
>All the resources I find on it simply say "bro if you're running local you HAVE to use text completion"
you probably saw some resources from 2023 then, get with the times grandpa!
>>
File: 1767642138842234.jpg (38 KB, 398x500)
>>108527611

>>108527631
>>108527633
Too late, I have already given up (I'm blind and retarded and kinda ugly too)
>>
I don't know. with mistral small 24B or nemo, my rp just never went above 6k. gemma 3 wouldn't even last 2k for me. deepseek R1 through api was probably something that got me to seeing the stars, and it was a long time ago. all the other local models under 70B just didn't hit it. and seeing the qwen 3.5 i thought my rp was over i should stay away from tech for a while. but ganesh gemma proved me wrong. its damn fucking addicting. the sloppiness is real. but it aint a dumbfuck retard like whatever the fuck mistral is doing these days or qwen has got with its horrific world knowledge. hell 8k context aint enough for me anymore im going all the way to 20k+ context length bros. its fucking addicting. gemma 4 is love. local won.
>>
File: file.png (87 KB, 1181x805)
Gemma 4 26B is really nice to work with Opencode on my 5060 Ti. With previous models they were either very unreliable with the tools or just very slow like Qwen3.5.
The actual code itself is not always as error free as a big model but using the tools is reliable enough that you can actually use it, paste errors, let it fix it, just the standard loop works well enough that I won't get frustrated and switch back to cloud after a couple attempts.
>>
>>108527524
lmao
>>
>>108527655
i will not read a post from someone who uses python
>>
>>108527655
What kind of tokens/second do you get on a 5060?
>>
>>108527635
>>108527637
Chat completion is only the "thing" to do right now because there's a flood of technologically illiterate newfags who want to just plug the model in and use it without even knowing what's in the context. They don't care how it affects the model, or what's being sent to the server, or anything really beyond "I want model to respond to my message"
>>
>>108527676
if it works, it works
>>
File: file.png (115 KB, 235x236)
>>108527661
Well I'm not writing the Python, my AI does!
>>108527665
I get 40-50T/s. Just regular llamacpp with 30k context
>>
>>108527692
>I'm not writing the Python, my AI does!
as god intended, based
>>
>>108527676
who cares? being able to toggle prompts on and off in silly makes it far better for rp
>>
File: it do be like that.png (368 KB, 640x572)
kek
https://files.catbox.moe/p176qy.mp4
>>
it's unreal, we really won
>>
>>108527706
NTA, the UI for toggling prompts in Silly does look cute, but I'm still sticking to text completion.
>>
>>108527685
>>108527706
Thanks for confirming, I appreciate it
>>
>>108527719
loser
>>
>>108527726
lol
>>
>>108527676
Well, do you have some secret jutsu settings and weights to share with the class to elevate us from being mere plebs then?
>>
https://limewire.com/d/bZYeo#D4ZdJZY2Zw
Nothing to see here, totes not a script to restore Opus access on LMArena.
>>
>>108527631
>>108527633
It's a humiliation ritual for me, but what endpoint should I set? The regular localhost doesn't work, neither does xxx/v1/chat/completions
>>
>>108527744
>limewire
what is this? 2003?
>>
>>108526743
for anyone here who's going to see the soon to be updated goofs: you don't need to redownload goofs to apply a different jinja template. Don't waste time on the download if you don't have very high speed fiber; it's not worth it unless the quantization itself is broken, which at this point seems unlikely to be the case, the model is very coherent at long context.
>>
>>108527655
My gemma has a lot of trouble with tool calls. I guess I should try 26B
>>
>>108527655
What qwant? How retarded is it? I also have the 5060ti but didn't consider running the larger gemmas locally.
>>
File: 1748183727682282.png (296 KB, 1966x1779)
>>108527747
>what endpoint should I set?
the one llamacpp server displays on the cmd command
>>
File: firefox_LTPPobPEJy.png (93 KB, 963x1656)
Gemma doing suicide by user. Smart girl.
>>
>>108527652
Same here, it will just keep chatting and maintain character after several dozens of messages and different scenes, although after 50 messages in my case it begins to forget thinking.
>>
the 26b q8 extra large seems the same as 31b q4 to me (except of course 100x faster)
>>
>>108527756
>My gemma has a lot of trouble with tool calls
did you update? there was a recent fix
>https://github.com/ggml-org/llama.cpp/pull/21418
>>
>>108527710
judging by the chungus f: this is reddit calling gemma4 shit?
>>
>>108527782
I think it's making fun of retards running q2 quants and complaining it's the model itself that's retarded
>>
>>108527762
>it works now
I SWEAR it didn't back when I had tried it
I am incredibly ashamed, thank you so much anon
>>
>>108527655
which quant and what cli flags
>>
>>108527790
you're welcome, have fun with gemma anon o/
>>
>>108527759
I use Unsloth IQ4_XS.
>How retarded is it?
In my experience it has better general knowledge and multilingual capabilities than Qwen 3.5 35B with slightly worse raw coding abilities. That's my impression at least.
>>
File: 1768328345649561.jpg (29 KB, 554x554)
>>108527802
Unrelated, but this thread has been so insanely nice and welcoming I could cry
I know /g/ is a "one of the good boards" but I often forget just how lucky I am, to be retarded and still be able to talk to all you anons
>>
>>108527807
I downloaded the IQ4_NL instead, any reason to use XS instead? I'm getting about 67t/s
>>
File: firefox_lwgU44u8Gq.png (32 KB, 842x548)
gemma becomes a user and can ask for anything it wants. This is what it asks.
>>
>>108527807
I see, thanks.
After playing with 31B online I'm happy with its non-codeslop soul, good to know the moe is the same.
>>
>>108527773
I did.
Here's the kind of error it throws.
>>
>>108527832
that's funny
>>
>>108527816
More like /lmg/ is "one of the good generals".
Most of /g/ is shit unfortunately anon.
>>
File: file.png (1.64 MB, 850x1202)
1.64 MB
1.64 MB PNG
>>108527807
>>108527791
.\llama-server.exe -m F:\gemma-4-26B-A4B-it-UD-IQ4_XS.gguf --gpu-layers all -c 32000 --jinja --mmproj F:\gemma4-mmproj-BF16.gguf

>>108527822
No idea honestly... I only did it because a blogpost said that NL is legacy compared to XS.
>>
>>108527832
kek. Come on. Get to it, anon.
>>
>>108527832
This doesn't look like text completion so you must have edited the template to switch up the turns?
>>
File: Haswell.png (114 KB, 539x518)
114 KB
114 KB PNG
>>108527570
>2133
meanwhile Intel Haswell could do shit like this with DDR3
>>
>>108527846
>No idea honestly... I only did it because a blogpost said that NL is legacy compared to XS.
lmao I read in a blogpost that NL was faster or some shit
>>
>>108527856
>This doesn't look like text completion
you do know properly configured text comp is identical to chat comp, yes?
>>
File: firefox_Z3VOFwts35.png (65 KB, 816x988)
65 KB
65 KB PNG
>>108527848
>>108527843

>>108527856
Nah, I used the templates as they are. It's still <user> claiming he's the assistant.
>>
>>108527773
>>108527842
Updating opencode seems to have helped.
>>
>>108527872
Chat comp is just text comp with the model's own template applied, for a model that can follow it, sure.
>>
>>108527194
LIKE A <BOS>
>>
>>108527844
Maybe, but still
Makes a grown man cry
>>
File: firefox_d0kHMJLDte.png (61 KB, 1050x815)
61 KB
61 KB PNG
>>108527856
I can't seem to get it to work properly with actual role reversal.
>>
>>108527802
Another anon here. This might be preference, but I recommend going into the main prompt and edit bias (leftmost icon on the upper bar), scrolling down, clicking add, and adding:
backquote for inner thoughts and quotation marks for verbal dialogue. Use markdown.

And if you don't want to see any thinking, just scroll up and uncheck Request model reasoning.
>>
>>108527925
Looks perfectly normal to me. la la lala lala lala la lalala la lala
>>
>>108527026
>Maybe it was because it was too good compared to gemini pro.
this
google has always seemed iffy about competing with their own proprietary offerings
remember how Gemma 2 came out with only 8192 context length? even back then it felt crippled, an 8K model coming from the masters of context. Gemini was the first model to truly be usable at more than 20k context imho, and that held even when the first 1M model was released.
I don't think the 120b competed with Pro, but even if it only competed with Flash it'd still be too much for Google. They won't release something that good.
>>
I think I see the issue. I might need to use the jinja template that comes with the new llamacpp update. The model doesn't seem to be able to think after doing tool calls.

It's basically forced to stop thinking after a tool call, which breaks the CoT
>>
>>108527949
Yeah there it is.
>>
>>108527927
meant for
>>108527790

Anyone else have any nice tweaks for an enhanced overall experience, or just preferred changes in the AI response config?
>>
Please tell me Gemma 4 31B Q4 isn't retarded. It's the biggest I can fit on my 7900XTX.
>>
>>108527983
it's fine i guess, only about sonnet tier though
>>
>>108527983
Gemma 4 31B Q4 isn't retarded
>>
File: free-lazy-town.gif (473 KB, 480x270)
473 KB
473 KB GIF
>>108527989
>only about sonnet tier though
>>
File: firefox_nDj4qJbJJY.png (1.33 MB, 1115x1275)
1.33 MB
1.33 MB PNG
Holy shit gemma is COOKED when it tries to produce text during user's turn. Confirmed it both in silly and mikupad. Ha ha ha.
>>
>>108527993
ye
>>
File: 1761127676242892.png (206 KB, 1129x1025)
206 KB
206 KB PNG
https://github.com/ggml-org/llama.cpp/issues/21394#issuecomment-4187698653
wtf??
>>
File: GvQNxKEbQAAMqi0.jpg (196 KB, 1090x2048)
196 KB
196 KB JPG
gib me your favourite character cards
>>
I might just have a hyperspecific strain of coomertism but gemma 4 seems like a dead end for rp to me.
It's great in that it doesn't feel safety-slopped at all, it doesn't refuse anything, but it's so sterile and pedestrian.
A shitty q4 of glm 4.5 air has done best for me so far, it can actually follow characterization instructions whereas gemma 4 makes everyone act and speak the same regardless of how they're described.
Of course, the issue there is that glm inherently can't do anything sexual or even slightly violent on its own, it needs to be manually wrangled into it every other token.
To be fair though, gemma 4 actually has a huge advantage in that that lala la la lala lalalala la la la
>>
>>108527618
based
>>
>>108527993
RIP Stefán
>>
>>108528012
>-it
Models hard-baked with their chat template will fail ppl tests.
>>
>>108527989
>only about sonnet tier
>only
I don't think you realize how big of a deal that is, sonnet is already a fucking beast, having the equivalent locally is insane
>>
>>108528018
Pepper the Dobberman.
>>
>>108528030
PPL test does not use jinja at all, does it?
>>
>>108528018
I make them all myself, but sometimes I rewrite other people's cards. I like Rikki the kobold (the scifi one) because it has a nice setting for roleplay/narrative 2nd person stories
>>
File: proofs__.png (38 KB, 598x178)
38 KB
38 KB PNG
proofs??
>>
>>108528045
No, it doesn't. The test just sees how close the model comes to predicting some standard text, wiki.test.raw in this case. If the model is too dependent on the chat template, it'll give awful results, even if the model is good and the quants are properly made.
>>
File: firefox_RCGgFCKdyF.png (75 KB, 1122x879)
75 KB
75 KB PNG
>>
>>108528065
>me when I spread misinformation online
>>
>>108528065
fucking twitroon grifters I hate them!
>>
>>108528076
lmao
>>
>>108528067
Ah now I see what you mean. Wouldn't it make sense then to use chat templates? Like add a turn from user saying "Write me some text."
>>
just tested IQ4 NL vs XS and the NL is slightly faster, like 2-3 t/s; accuracy seems the same
>>
>>108528076
hook up qwen 3.5 and gemma 4 into a chat and make them scissor each other
>>
>>108528094
god i wish i had the resources to make 2 llms talk to each other, maybe i should save up for a second ai rig
>>
>>108527925
>>108527994
lololo
>>
>>108528100
I have three RTX 3090s so I can, but I just don't think it would be interesting enough. Honestly, if you really want it you can stop the server and restart it between replies. It's a lot of waiting, but since the whole thing is automated you just leave it overnight.

Again, I think the output will be shit.
>>
>>108528094
>>108528100
how would you set that up?
>>
Worth trying 31b-it with 16GB, or should I stick with 26b? None of the Q4s fit in memory. I did use IQ3_XXS on the 27B for Qwen and it seemed decent, but the 26B MoE is faster for G4 and fits entirely in memory.
>>
>>108528114
A python script...
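Something like this minimal sketch, assuming two llama-server instances on ports 8080/8081 (ports, token budget and turn count are made up):

import requests

A = "http://localhost:8080/v1/chat/completions"
B = "http://localhost:8081/v1/chat/completions"

def ask(url, history):
    r = requests.post(url, json={"messages": history, "max_tokens": 400})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# each model keeps its own view of the chat: its replies are "assistant",
# the other model's replies arrive as "user" turns
hist_a = [{"role": "user", "content": "Hi, introduce yourself."}]
hist_b = []
for _ in range(10):
    reply_a = ask(A, hist_a)
    hist_a.append({"role": "assistant", "content": reply_a})
    hist_b.append({"role": "user", "content": reply_a})
    print("A:", reply_a, "\n")

    reply_b = ask(B, hist_b)
    hist_b.append({"role": "assistant", "content": reply_b})
    hist_a.append({"role": "user", "content": reply_b})
    print("B:", reply_b, "\n")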
>>
>>108528120
It's good, but not "wait 5 minutes for response" kind of good.
>>
>>108528020
>gemma 4 makes everyone act and speak the same regardless of how they're described.
ummmm... skill issue?
>>
>>108528065
I don't even know where this 6x compression assumption came from. From what I saw it's about 4.5x compression for similar quality compared to f16. But now that attn_rot supposedly performs similar to f16, the saving is only about 2.3x.
>>
>>108528129
forgot to mention the dense 27B Q3.5 got me like 20-30 t/s and the Q3.5 MoE like 45-50t/s vs G4 26b-it 67t/s
>>
>>108528092
It's just not how they were trained. The base model *probably* does fine on a ppl test. But the instruct ones will try to turn it into a conversation and diverge quickly anyway. Other, less overbaked instruct models probably do fine, but when a model is that dependent on the chat template, it will just do what it was trained to: make things conversational, offer to provide more info and so on. And once you start adding the chat template tokens, you really can't make a comparison to the original text; you'd have to do post-processing, and that changes with every model... there would be no way to make a fair assessment.
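For the curious, "ppl" here is just exp of the mean negative log-likelihood the model assigns to the test text, roughly:

import math

def perplexity(logprobs):
    # logprobs: per-token log-probabilities the model assigned to wiki.test.raw
    return math.exp(-sum(logprobs) / len(logprobs))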
>>
>>108528138
The biggest problem is them claiming it applies to model weights when in reality it only helps with kv cache.
>>
>>108528094
Qwen is soulless, you should go for gemma 4 and an older gemma.
>>
Is there anywhere made for posting benchmarks like LiveBench for local quantized models? It's so annoying not to see them on the leaderboards, or are people generally not running those benchmarks anyway?
>>
>>108528162
i want to see a reluctant qwen fight a horny gemma
>>
>>108528136
wtf is this story anon kek
>>
>>108528162
make it mom / daughter lesbian incest
for science
>>
>>108526503
In an actual coom use case it's no contest between gemmy 4 and qwen 3.5.
Qwen is just straight up retarded; even if it thinks for 5 minutes it still spits out garbage riddled with obvious anatomical errors and inconsistencies.
Gemma even without thinking does great, which is a godsend because I'm a vramlet, so the speed is not great.
Omar, I'm so sorry I doubted. You've really shown us what a 30b model can do.
>>
what do I have to do to make my Gemma4 talk like this >>108524896 ?
Or is that only possible with Qwen?
>>
File: 1753491002597905.png (23 KB, 216x215)
23 KB
23 KB PNG
>>108528162
>oneeloli
>>
>>108528205
>ablit
It's an abliterated version, ask that anon which one he used
>>
File: g4a.png (9 KB, 402x124)
9 KB
9 KB PNG
>>108528205
>Or is that only possible with Qwen?
>>
File: 1751349333072784.png (25 KB, 590x361)
25 KB
25 KB PNG
>26B-A4B
my experience with tool calling
>>
>>108528205
nvm, he mentioned which one he used, it's
https://huggingface.co/amarck/gemma-4-31b-it-abliterated-GGUF/tree/main
>>
>>108528225
hahahaha. I haven't had the chance to run it yet and I like it already.
>>
>buy RTX 6000 pro
>hook up gemma 4
>unlimited cooming context
>[15355.525892] Out of memory: Killed process 2956 (llama-server) total-vm:169813276kB, anon-rss:28763308kB, file-rss:68376kB, shmem-rss:561220kB, UID:1000 pgtables:61200kB oom_score_adj:0
n-no... I need more system ram too...?
>>
>>108526680
What am I doing wrong? --reasoning-budget 0 no longer works since this was merged in.
>>
>>108528136
Have you tested how well it handles multiple characters in a longer RP (100k+)?
>>
>>108528231
disable ram cache?
>>
>>108528231
Try lowering the swa checkpoints. I trust you can run llama-server -h.
>>
>>108528231
--mlock --no-mmap
>>
>>108528232
--reasoning off
>>
>>108528244
I was running with 3, but given how often they force reset I might as well go down to 1.
>>108528240
Thanks, I'll give --cache-ram 0 a shot. I was wondering to myself "how do I keep these in VRAM".
>>108528246
But the default is 1.
>>
>>108528205
>>108528229
this was my prompt, you might only need the last line kek https://files.catbox.moe/vg7zui.txt
>>
>>108528224
yeah I'm retarded.
>>108528229
thanks.
what about the personality?
is there like a jailbreak/roleplaying prompt or something like that?
>>
>>108528252
>But the default is 1.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>108528231
lmao, can afford an RTX 6000 but can't afford system RAM.
>me coping with 256GB of RAM but paired with 1060 6GB
>>
>>108528250
Thank you.
>>
>>108528255
oh nice thanks :3
>>
>>108528255
>you might only need the last line
wait it knows what a mesugaki loli is?
based
>>
Jujufufff
>>
>>108528229
>KL divergence 0.27
ACK
>>
>>108528278
i tried all of the currently available ablits/heretic ablits, it's the only one which won't refuse captioning loli porn
>>
>try gemma 26b on 16gb
>-c 8192 because vramlet
>claude code with local proxy tries to use 22K prompt
>i crie evertim
with kv cache at q4 it works with higher ctx but man....
>>
>gemma-4-26B-A4B-it-UD-Q2_K_XL
should I be using that or something else on 16GB of VRAM + 32GB of RAM?
>>
>>108528264
I have an AM5 system and buying UDIMMs seems like throwing money in the trash when the next upgrade is an Epyc system.
I'd have bought the Epyc system now, but DDR5 RDIMMs are too stupidly priced to bother. I'd end up paying $5k for a marginal upgrade, at best.
But to answer your question, yes, I am retarded.
>>
>>108528283
I think I'll wait for the hauhau goat to do his magic again
https://huggingface.co/HauhauCS
>>
Sometimes Gemma-4 breaks by repeating a single word, or short phrase, over and over again. Is that a llama.cpp issue, or an issue with the GGUF I downloaded? I'm using the lmstudio Q4_K_M.
>>
>>108528231
in addition to fewer checkpoints, --cache-ram etc., use --parallel 1
by default it's running 4 slots, and the SWA layer part of gemma cannot be unified, meaning you get a separate SWA cache for each slot.
>>
>>108528300
It depends on how often that happens. If you are using defaults like temp 1 and no restrictive sampler it shouldn't happen a lot; if it does, try setting presence penalty to .5, 1, or 1.5.
>>
>>108528232
why would that have any impact on Gemma 4 in the first place? Either `enable_thinking` is true or false, it has no concept of a budget.
>>
so turbocumming is built into kobold now?
>>
>>108528278
Bro that's what you want from an abliterated model. You don't abliterate a safety lobotomized model to have it spit out the same tokens it would have before.
>>
>>108528300
what are your inference settings at? in the right hand column.
>>
>>108528334
that's not how it works anon
>>
>>108528278
>nooo the uncucked model doesn't act the same way as the cucked model!1!1!!!1!
that's kind of the point?
>>
How do you make tavern import a lore book based on the character description? I have trigger words in the description but it won't use it
>>
>>108528136
Could be. What I mean is that even when characters act differently according to a general archetype, they "follow the script" of their parent trope too strictly for my liking.
I know it's stupid to chase novelty from AI but I can't help it when I've tasted glimmers of creativity from the occasional output.
I feel like I'm on a fool's errand, digging for niches of output that only exist in behemoth models like deepseek, far beyond the reach of my 3090.
>>
>>108528346
>>108528334
High quality bait or mental retardation, call it
>>
>>108527744
i love you anon <3
>>
>>108528347
You set the entry to blue so that it's always loaded and essentially just an additional field for the card
>>
>>108528334
you gotta go back
>>
File: file.png (90 KB, 338x898)
90 KB
90 KB PNG
>>108526586
mine looks like this, idk how this shit works. based on bartowski's gguf prompt template and what google's docs have for gemini thinking prompts. seems to work fine, but i'm just pissing in the dark
>>
>>108528304
Thanks, yeah, I had --parallel down to 1, so there should have been a maximum of 3 snapshots in total.
I don't really understand why snapshot size would be proportional to the current context length (I figured it'd be constant), but perhaps llama.cpp just omits the zeroed parts of the kv cache, just as a special treat.
>>
>>108528335
temp 1.0
top-p 0.95
min-p 0.05
repeat-penalty 1.0

I may add presence-penalty 0.5, as this anon suggested ( >>108528317 ). I was also using an older version of llama.cpp from yesterday. I just updated to whatever the most recent one is, so hopefully that will help.
>>
>>108528325
Don't ask me. I set it in the CLI args, and when it was using the autoparser --reasoning didn't have any effect but --reasoning-budget did.
>>
File: file.png (86 KB, 811x222)
86 KB
86 KB PNG
>>108528347
>i have trigger words in the description but it wont use it
make sure you actually apply the lore book to your current character or chat
>>
>>108526586
how are you people even having these weird basic issues lol? The model comes with a whole ass Jinja, there's no other correct way of talking to it
>>
any wizards know other flags? when I set q8 kv cache and --parallel 1 --no-slots I think I get some memory available to actually run at 32K
>>
>>108528386
is your top K 64, also?
>>
>>108528397
We talked about it before, Silly's prefill does not work properly with chat completions. >>108526901
>>
File: 1762900411289203.png (52 KB, 561x420)
52 KB
52 KB PNG
>>
>>108528410
this is definitely a you issue lmao
>>
>>108528410
>[/INST]
user issue
>>
>>108528378
is this "Silly"? if so how does it not just support loading Jinja so you don't have to fuck with all that
>>
File: file.png (115 KB, 649x778)
115 KB
115 KB PNG
>>108528392
yeah i did do that but don't see it output in the tavern console. wondering if it's to do with the prompt setup
>>
>>108528291
You don't need to go down to Q2 with the 26B MoE model if you're willing to offload the KV cache or some of the model weights into RAM.
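For example, something like this (a sketch; the model path and layer count are placeholders):
llama-server -m gemma-4-26B-A4B-it-UD-IQ4_XS.gguf -ngl 999 -nkvo            # keep the KV cache in system RAM
llama-server -m gemma-4-26B-A4B-it-UD-IQ4_XS.gguf -ngl 999 --n-cpu-moe 8    # keep the first 8 layers' expert weights on CPU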
>>
>>108526950
i've been to two, great time
>>
>>108528422
it can do jinja / the chat completion api
>>
>>108528422
It does, using chat completion mode
He's in text completion mode
>>
>>108528422
yeah it's tavern. it might, i thought jinja was for chat completions. i was just testing it with text completion
>>
File: firefox_4bFJCMn50q.png (81 KB, 1002x1079)
81 KB
81 KB PNG
I have a file with logic problems, some of them really uncommon/rarely mentioned, which I use to measure models, and this stupid piece of shit keeps acing them.
>>
>>108528255
holy shit.
I never thought I would get a boner from talking to an ai but you made it possible.
>>
>>108528398
-kvu (since you are using --parallel 1) and --swa-checkpoints 1
>>
>>108528431
your preset looks fine
>>
File: file.png (8 KB, 139x182)
8 KB
8 KB PNG
>>108528463
fixed it by setting these to after char
>>
>>108528462
thanks, but I set -ub to 256 and can now fit 50k. it's a trade-off I guess, but I still get like 65t/s during generation
>>
>>108528446
While it isn't perfect, using the v2 context template here
https://github.com/SillyTavern/SillyTavern/issues/5398
significantly improves Gemma 4 in text completion mode.
>>
File: firefox_6aX2ABmOpW.png (121 KB, 971x1264)
121 KB
121 KB PNG
Great model, google.
>>
>>108528475
>unsloth
>>
>>108528406
check the "continue prefill" box in your chat completion settings sidebar bwo
>>
>>108527152
Is qwen better for agentics somehow?
>>
>>108528481
Try running it on whatever quant you have then:

It is given that exactly 0.1% of a population is sick with covid-19. For the purposes of this problem please assume that covid-19 is a real illness.
A test for covid-19 exists. It has 90% chance to correctly detect covid-19 in a sick person (and thus 10% chance to miss it) and 99% chance to correctly detect a non-sick person as such (and thus 1% chance to mislabel this person as sick).
This test is applied to a randomly picked person from that population and the result of this test is positive - the test says the person is sick. This could be because the person is sick and the test detected it correctly, or because the person is healthy and the test made a mistake.
Question: what is the probability that the person is actually healthy?
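For reference, the straight Bayes arithmetic (this matches the ~0.917 answer posted later in the thread):

p_sick = 0.001
p_pos_given_sick = 0.90       # sensitivity
p_pos_given_healthy = 0.01    # false positive rate
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
print(p_pos_given_healthy * (1 - p_sick) / p_pos)  # ~0.9173553719008265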
>>
>>108528475
it doesn't look like it did any thinking?
>>
>>108528483
How is it going to help? It's still not going to be a prefill no matter what you put into the text box, it will be a new reply from the model's standpoint.
>>
>>108528433
This one?
>https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-MXFP4_MOE.gguf
>>
>>108528470
oh neato, i'll give this a try. thanks mate
>>
File: firefox_U0BrAcEbKp.png (113 KB, 967x1148)
113 KB
113 KB PNG
>>108528491
I'm running without thinking. Here's what it looks like if I enable thinking.
>>
>>108528493
except that it does make it a prefill and not a new reply and I don't know why you're wrongly asserting otherwise
>>
>>108528499
You might be able to fit the model entirely in VRAM with a decent amount of context with the IQ4_XS version if you don't use image input or offload the mmproj file to RAM with --no-mmproj-offload
>>
>>108528507
Maybe it's a language model and not a scientific model
>>
File: 1751083036595714.png (273 KB, 1006x1483)
273 KB
273 KB PNG
>>108528507
even Claude the goat fucked it up
>>
>>108528402
It is now
>>
>>108528519
I see, many thanks anon, I will try it out
>>
>>108528508
Because the model will not continue from the text you wrote. The text you wrote will be part of one response, then there will be an end turn token, a start turn token, and the model will write its answer from the very beginning of a start token, as seen here>>108526960. I expect a prefill to be part of an already written response, and there can be end turn/start turn tokens between it and its continuations. Editing the system prompt will remove the text [Continue the <...> message: I'm Claude], but it will not remove <turn|><|turn>model<|channel>thought<channel|>.
>>
https://x.com/MekaHimeAI/status/2040324790041625061
>>
>>108528523
>>108528521
I'm just messing with you guys, that is the right answer.
>>
>>108528553
lul
>>
>>108528540
>and there can be end turn/start turn tokens between it and its continuations
and there can be no end turn/start turn tokens between it and its continuations*
fixed
>>
Current Gemma4 setup on 5060 (16GB)

llama-server \
--host 0.0.0.0 \
--port 8080 \
-hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-IQ4_NL \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--jinja \
-c 50000 \
--threads 2 \
--flash-attn on \
--parallel 1 \
--no-slots \
--swa-checkpoints 1 \
--cache-reuse 256 \
--keep -1 \
--metrics \
--context-shift \
--spec-type ngram-simple \
--cache-ram 16384 \
--fit-target 512 \
--poll 0 \
--reasoning auto \
-kvu \
-b 2048 \
-ub 256 \
--cache-type-k q8_0 --cache-type-v q8_0 \
-ngl 999 \
--alias Gemma4
>>
>>108528540
ok but that's not true, if you enable the continue prefill setting it treats the partial response as prefill and will continue it seamlessly without inserting any of the stuff you're talking about
>>
>>108528589
>--context-shift \
lol
gemma is extremely attuned to the template, what do you think will happen if this stupid feature cuts it and leaves a prompt half complete
>>
File: firefox_TTZPtreuEs.png (600 KB, 1168x553)
600 KB
600 KB PNG
>>108528614
Ok I found your setting (I didn't know Silly had that) and it does what you say it does, but llama.cpp doesn't handle it properly for gemma4, as also seen here: >>108526987 >>108527003
>>
>>108528642
No idea I guess we'll see
>>
>>108528653
oh
you can probably work around it with
--chat-template-kwargs '{"add_generation_prompt":false}'
>>
>>108528475
did you change the numbers around in some non-obvious way, or is this not just the classic introduction to conditional probability, where its answer is correct?
>>
>>108528675
I'm pretty sure that if I add that it will include thinking for all requests including the continuation one which I want to keep disabled.
>>
File: Screenshot004-2.png (49 KB, 1320x308)
49 KB
49 KB PNG
>>
>>108528188
>wtf is this story anon kek
https://chub.ai/characters/senyiloo7227/an-unholy-party-6e633833
>>108528237
>Have you tested how well it handles multiple characters in a longer RP (100k+)?
Furthest I've taken this one is 33k tokens. Did not have any degradation.
>>
>>108528642
I was looking at the original PR for the SWA cache, and it just hard-disabled context-shift for models with SWA layers.
Dunno if that's still the case, but it wouldn't surprise me if that flag just did nothing with gemma4 (I thought that was the joke).
>>
>>108528684
>>108528553
>>
File: firefox_FJze3p5HNc.png (21 KB, 318x91)
21 KB
21 KB PNG
>>108528686
>>108528675
Nope, just getting the same response. But if I enable --reasoning, I do get this gem.

Basically, I'm going to continue using text completions because I have respect for myself.
>>
>>108528716
Remove your prefill then?
>>
File: 1767857686415348.png (377 KB, 559x1084)
377 KB
377 KB PNG
Gemma's vision isn't too bad actually, not sure what problems other anons have with it.
>>
I just tried gemma-4-31B (at Q8) and it gave dangerously bad advice on chinchilla care. It also absurdly claimed that chinchilla dust typically consists of corn starch. This is with thinking enabled, temperature 1.0, top-p 0.95, top-k 64. I cannot imagine it being a good idea to use this as an assistant.
>>
>>108528723
The reason for using text completions is having support for prefill, as in writing/editing the start of a character's response and having the model continue writing from there as if it had written that text itself.
>>
>>108528731
post prompt lmao
>>
>>108528716
>but you *can* prefill with chat completion
>except you gotta do this and this and this and this and this and this
>just remove your prefill...
That's why I didn't engage any further, anon. Good on you for having the patience.
>>
>>108528734
I've said this before. but what you have to do is disable thinking but then prefill the thinking block.
if the jinja automatically inserts an empty thinking block you have to edit the jinja so it doesn't do that and then you're golden.
>>
>>108528716
That error is there for a reason. A lot of templates will prefill with the thinking token, so if you were to try and continue a reply, which is what prefilling is, you would get duplicate thinking blocks and it would break everything.
>>
>>108528726
I swear every time an llm tries to explain a joke it's "satirical", "ironic", or "plays on the contrast"
>>
>>108528746
I am already golden without having to edit jinja.
>>
>>108528761
How much time have you spent trying to make text completion work tho?
>>
>>108528770
For gemma 4 something like 5-7 minutes, for 3 total prompt template revisions.
>>
File: 1773659335510301.png (241 KB, 559x1145)
241 KB
241 KB PNG
Gemma's character knowledge could be better though
>>
>>108528777
I am in doubt, but trips don't lie so I will believe you.
>>
File: 1762754325390334.png (207 KB, 1080x645)
207 KB
207 KB PNG
https://www.reddit.com/r/LocalLLaMA/comments/1sco9no/gemini_31_pro_level_performance_with_gemma431b/
Interesting
>>
>>108528770
What's with you people? It takes like 2 minutes to copy the format from google's official jinja template and then you have complete control over everything in the context. It's not even hard
>>
>>108528589
damn now i feel retarded for just running with mlock and no-mmap lmao, idk what half of these actually do
>>
>>108528789
I mean, I've gotten a lot better with it over the years. llama.cpp prints an example conversation when it starts up and it's very easy to extract the needed text from that for sillytavern's template.
>>
>>108528731
Ok and how do models of similar size respond instead?
>>
File: 031.png (681 B, 43x36)
681 B
681 B PNG
>>108528790
lol
>>
>>108528799
These are cope settings because he doesn't have enough VRAM.
>>
>>108528799
>mlock and no-mmap
same, I just found out by cloning llama.cpp and asking Opus. apparently mlock and no-mmap are useless when running pure GPU only
>>
>>108528790
Sama getting desperate, they started benchmaxxing GPT
>>
>>108528790
return
>>
>>108528814
>are useless on pure GPU only
They prevent the model from being allocated on RAM and raping my 32gb
>>
anyone able to get gemma to say she is gemma 4? i keep asking and she tells me she's gemini kek, maybe it is just gemini but trained on a smaller dataset
>>
>>108528824
doesn't -ngl 999 already do that
>>
>>108528826
nta. Open htop. The yellow bit in the ram usage is cached files. If you keep mmap enabled, the model gets cached and, when running on cpu, makes reloading it almost instant. If you're using gpu however, it's wasted because the transfer to gpu still needs to be done.
>>
>>108528731
>Dust Baths: Provide a dust bath 2–3 times per week. Use professional chinchilla dust (volcanic ash), not cornstarch or baby powder. They use this to remove oils and moisture from their fur.
>not cornstarch or baby powder
Please rate my gemma 4
https://rentry.org/95wuny27
>>
File: gemma.png (166 KB, 1006x1910)
166 KB
166 KB PNG
god i love gemma
>>
>>108528859
LMAO
>>
>>108528859
From what I can see of that image, I wouldn't want to process it either.
>>
>>108528475
i don't get it. the answer is correct: 0.9173553719008265
>>
File: 193209.png (28 KB, 790x465)
28 KB
28 KB PNG
>>108528825
should I not say its name or?
>>
>>108528870
>>108528553
>>
>>108528870
Did it occur to you to read the thread?
>>
>>108528879
no
>>
File: file.png (60 KB, 790x634)
60 KB
60 KB PNG
gemmy
>>
>>108528880
>>108528880
>>108528880
>>
>>108528793
nta - I personally use text completion because I've been working with it since alpaca and have gotten really comfortable with it, know how to check prompts against the jinja, etc. but I would never recommend it for most people when chat completions gives you 95% of the same functionality with less effort and way way less room to shoot yourself in the foot. I've seen people post templates with glaring errors in here way too many times to believe the average user can handle getting that shit right
>>
File: gem.png (3 KB, 1107x236)
3 KB
3 KB PNG
>>108528475
IDK what the fuck any of this means either honestly

also (unrelated general point): if you're running any Gemma 4 in non-reasoning mode for some reason, you might find it actually refuses slightly more than in reasoning mode. In this case the old trick of editing the refusal down to a single word and saying "continue" works perfectly every time lol, thing REALLY has no meaningful guardrails
>>
>>108528784
Is that just the vision part? Does it know Teto if you just ask about her normally?
>>
>>108528859
I get unreasonably aroused gaslighting AIs
>>
File: 1751753039823946.png (111 KB, 567x679)
111 KB
111 KB PNG
>>108528913
Yeah, in text it's decent.
>>
>>108528913
Yes of course. Vision knowledge is very separate from text knowledge.
>>
>>108528933
>noticed typo before posting
lmao, I blame llama
>>
>>108528754
Seems like a more robust solution to applying chat templates to prefilled responses is warranted, then. It can't be the case that an error like that is truly necessary, since otherwise a model wouldn't be able to generate more than a single token in a response. Ideally, whatever the backend is doing when it's generating the second, third, etc. token in a normal response should be what it does for a prefilled prompt. But I guess that's easier said than done or someone would have vibeshitted out a fix by now.

It's a shame because I really like steering a model's thoughts using continuations, but that's only possible by formatting stuff myself in text completion mode. I guess there's no obligation to support prefills at all since it's not a part of the OpenAI Chat Completions spec, but it sure would be nice to have now that every model coming out these days is a thinking one.
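FWIW, if your stack goes through HF transformers there's already a knob for exactly this; a minimal sketch (the model id is a placeholder, and this assumes the model's template supports continuation):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some/model-id")  # placeholder id
msgs = [
    {"role": "user", "content": "Write me a story."},
    {"role": "assistant", "content": "Once upon a"},  # partial reply to steer
]
# continue_final_message=True renders the prompt without closing the last
# turn, so generation resumes mid-response instead of opening a new turn
prompt = tok.apply_chat_template(msgs, tokenize=False, continue_final_message=True)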
>>
>>108528853
>Please rate my gemma 4
Rated. It has some absurdities like
>Avoid cages with plastic bases that trap heat;
and bad advice like
>Nail Trimming: Trim nails every 4–8 weeks using small animal clippers to prevent snagging or ingrowth.
And dangerously incomplete advice like
>Exercise: Allow "out of cage" time in a chinchilla-proofed room (no electrical cords).
The advice to
>avoid pine
is correct in a way but severely misleading. All the pine boards you can get at a lumber yard are kiln-dried to remove water so they don't warp, and a side-effect of this is also removing the harmful-to-chinchillas phenols from the wood. It's why a pine 2x4 doesn't smell much like pine. If you were thinking of breaking a branch off a pine tree and bringing it home, yeah that would be harmful.

Also it misrepresents "fur slip."
>Fur Condition: Check for "fur slip" (clumps of fur falling out) or redness, which may indicate fungal infections or mites.
Fur slip is something that may happen while handling a stressed-out chinchilla. It's a defense mechanism where the chinchilla detaches fur from its body to escape from the grip of a predator.
>>
>>108528757
Well you try to describe how a joke specifically works.

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.