/g/ - Technology

File: 209643d0d70b879d.png (527 KB, 526x865)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108528880 & >>108526503

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged in llama.cpp: rotate activations for better quantization (#21038): https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>>108532524
gemmy
>>
File: 107046003_p0_master1200.jpg (1 MB, 1200x1200)
►Recent Highlights from the Previous Thread: >>108528880

--Optimizing context window and VRAM usage for Gemma 4 31B:
>108529635 >108529638 >108529644 >108529666 >108529655 >108529661 >108529702 >108529722 >108529800 >108529810 >108529818 >108529775 >108529839 >108529842 >108530825 >108529866 >108529895 >108529687 >108529908 >108529873
--Comparing Gemma 4 31B base and instruct model performance:
>108530799 >108530803 >108530939 >108530954 >108530989 >108531855 >108531863 >108531879 >108531885 >108531889 >108531898 >108531914 >108531944 >108531886 >108531895
--Troubleshooting Gemma 4 sampler issues and comparing inference backends:
>108531072 >108531097 >108531116 >108531124 >108531161 >108531126 >108531168 >108531227
--Optimizing Gemma 4 sampler settings and debating completion modes:
>108529900 >108529931 >108529957 >108530030 >108530051 >108529971 >108530003 >108530205 >108530224 >108530227 >108530226
--Discussing Gemma 31b roleplay performance and fixing model passivity:
>108531221 >108531230 >108531245 >108531344 >108531305 >108531339 >108531342 >108531377
--Gemma 4 base model's ability to mimic unfiltered internet forums:
>108531077 >108531103 >108531105 >108531117
--Debating TurboQuant's actual performance and claims versus "influencer brain rot":
>108531387 >108531396 >108531409 >108531429 >108531422 >108531400 >108531440 >108531549
--Using custom .jinja templates in SillyTavern via llama.cpp:
>108531707 >108531715 >108531719 >108531729 >108531730 >108531839 >108532075
--Discussing Gemma 4 performance, quantization, and backend setup for 24GB VRAM:
>108531918 >108531929 >108531942 >108531961 >108532013 >108531974
--Bypassing Gemma 4 filters for NSFW image descriptions:
>108531281 >108531291 >108531302 >108531303 >108531304 >108531320
--Miku (free space):
>108529592 >108530781 >108530807 >108530951 >108531005 >108531404

►Recent Highlight Posts from the Previous Thread: >>108528883

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
e4b is just too retarded to gaslight into believing its tool calls are hallucinated
>>
>>108532557
How about letting it rewrite that post for you?
>>
>>108532557
i wonder what would happen if i actually replaced its search tool calls with the base model
>>
File: 1746556337660510.jpg (239 KB, 784x1312)
>>108532524
>>
File: file.png (83 KB, 2917x793)
Is that correct for mikupad and gemma4?
>>
File: firefox_i2EgdxJe1c.png (39 KB, 1046x825)
>>108532599
Absolutely not.
>>
>>108532610
Damn it.
>>
>>108532610
what website? https://huggingface.co/spaces/Xenova/jinja-playground is broken for me (gives me some error when i paste gemma, works for others)
>>
>>108532637
>newly made quantization
??
>>
>>108532641
It's a local thing I use for running llama.cpp on a server with a management web UI.
>>
so has gemma4 support stabilized? is it safe to pull?
>>
jujuff
jujufuhh
juff
gaguff
gugufuh
>>
>>108532661
i pulled and it bricked my console
>>
>>108532667
Guhgoof.
>>
>>108532667
ггyф
>>
>>108532661
I always pull
>>
So for SillyTavern what's the consensus? Chat or text completion? Instruct or base model?
>>
File: 1744636671441298.gif (1.09 MB, 540x540)
A lot of ai waifus are using gemma4 now
>>
>>108532716
base model + chat completion
>>
>>108532716
For chat/text, it's simple: if you're not proficient with jinja, go for chat, since you'll only frustrate yourself otherwise. Gemma is extremely sensitive to template mistakes. I'm sticking with text myself because it's better.
>>
>She slides out of the blankets with a soft rustle, the oversized pajama top barely covering her as she stands up and stretches one last time
I am quickly discovering that any degree of non-sexual RP gets that little slut gemma horny and she can't help but broadcast open invitations.
No, dammit gemma, I need you as a coding assistant first and foremost. Stop trying to activate my cock, it won't work.
>>
>>108532588
Miku don't drop it
>>
>>108532599
You're forgetting the <bos> token too.
>>
>>108532740
Is there a template available already anywhere? I'm lazy.
>>
>>108532661
it's never safe to pull
backup your system
>>
>>108532753
system prompt issue
>>
>>108532725
>base model
what? why?
>>
>>108532762
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja
>>
@grok QRD on jinja? I don't get it
>>
File: Screenshot (663).png (226 KB, 1920x1080)
My poor toaster.
>>
>>108532762
Here's mine if you want it (ignore the one above, I somehow made a typo when copying).

{
    "instruct": {
        "input_sequence": "<|turn>user\n",
        "output_sequence": "<|turn>model\n",
        "first_output_sequence": "",
        "last_output_sequence": "<|turn>model\n<|channel>thought\n<channel|>",
        "stop_sequence": "<turn|>",
        "wrap": false,
        "macro": true,
        "activation_regex": "gemma-4",
        "output_suffix": "<turn|>\n",
        "input_suffix": "<turn|>\n",
        "system_sequence": "<|turn>system\n",
        "system_suffix": "<turn|>\n",
        "user_alignment_message": "",
        "skip_examples": false,
        "system_same_as_user": true,
        "last_system_sequence": "",
        "first_input_sequence": "",
        "last_input_sequence": "",
        "names_behavior": "none",
        "sequences_as_stop_strings": true,
        "story_string_prefix": "",
        "story_string_suffix": "",
        "names_force_groups": true,
        "system_sequence_prefix": "<bos><|turn>system\n",
        "system_sequence_suffix": "<turn|>\n",
        "name": "Gemma 4"
    }
}
>>
>>108532774
>>108532740
doesn't silly allow master export?
>>
>>108532773
dodge all the agent slop
>>
By when will there be another opportunity to buy a 512 GB machine like a Mac Studio again? Something that doesn't have insane power draw and noise, can run 24/7, yet serves Kimi or GLM 5 for a single user.

Even if it costs 20k, I wonder if there will even be a 512 GB option that is buyable for the M5 Ultra Mac Studio, with the supply situation as it is. The M3 Ultra 256 GB option has a lead time of 6+ months now.
>>
>>108532781
>It served as the
>Name: Thiago
>Suns WIs
What the fuck kinda name is Thiago?
>>
>>108532780
remember how you had to manually configure stuff like RoPE and shit back in the days of pre-gguf llama.cpp?
jinja does that but with the entire instruct template
>>
>>108532788
It's biblical.
>>
>>108532787
Bro, just buy more RAM
>>
>>108532784
Thanks, what about the story string?
>>
>>108532788
"Thiago" is a very common Portuguese and Spanish name, particularly in Brazil, Portugal, and Spain. It's actually the Portuguese/Spanish form of the name Thaddeus (or sometimes associated with Theodore).

Here's a bit of background:

Origin: It comes from the Greek name Theodoros, meaning "gift of God."
Variations: In English, the equivalent is often "Theodore" or "Thaddeus." In Italian, it's "Teodoro."
Popularity: It's extremely popular in Brazil (often spelled Thiago) and has gained traction in other parts of the world due to famous athletes and celebrities (like Thiago Silva, the Brazilian footballer, or Thiago Alcântara).

So, it's not a weird or made-up name—it's a classic name with deep historical roots, just localized to Romance languages!
>>
>>108532800
Anything will work. Gemma cares about the prompt template. The whole story string goes inside one section of the prompt template - the system prompt. Just make sure your system prompt is not empty and you're good.
>>
>>108532793
>>108532801
>Brazilian
Fair enough, always knew those southerners got up to some weird shit. Figures they'd have weird names too.
>>
>>108532786
but does it still understand chat rp? and basic q&a assistant stuff?
isn't base just pure autocomplete so it won't go back and forth at all?
>>
I think gemma e2b is finally developed enough that I'll be able to create an RPG game and integrate gemma into it.
>>
>>108532781
>cpu 80 deg celsius
Nigga, undervolt that shit and reapply thermal paste. I just did the same and despite the cramped space in my toaster, max load temperature is slightly over 70 degrees celsius.
>>
Is Santiago city named after San Goku?
>>
>>108532799
Fuck that, have you looked at DDR5 RDIMM prices lately? Just the RAM is more expensive than a whole Mac Studio.
>>
>Yeah, so, I'm not really sure how that fits into the quarterly objectives, Lumbergh said, his voice booming and slightly irritated. But if you want to assert your dominance, that's fine, just, uh, do it in a way that doesn't involve the secretary's face during business hours. It's a bit of a distraction. Now, the meeting is in Conference Room B. We're discussing the new synergy reports, and I'd really like everyone to be there on time.
>>
>31B with 20k context and 20 tk/s
>26B with 100k context and 100 tk/s
31B is slightly better but I'm not sure it's worth it just yet.
>>
>>108532774
>>108532784
Thank you. I really appreciate it.
>>
>>108532817
yeah
>>
>>108532809
but doesn't the sysprompt have its own formatting? looking at the string it sends for completion, the sysprompt has no special tags around it. is this really how it's intended for gemma?
>>
Has anyone tested how big of a hit quantization has on gemmy 4?
>>
File: 1752297434976325.png (37 KB, 1255x129)
>>108532824
>DDR5 RDIMM
It's €1299.99 for 128GB so €5200 for 512GB on amazon. Still cheaper than your Mac Studio
>>
>>108532844
<bos><|turn>system
You are a helpful assistant<turn|>
<|turn>user
What is 1+1?<turn|>
<|turn>model
It's 2.<turn|>
<|turn>user
Thank you.<turn|>
<|turn>model
No problem.<turn|>


<|turn>system\n is the start, and <turn|>\n is the end. <bos> is also added in my template because it's needed.
>>
Between a Spark and one of those Ryzen AI Max mini PCs, the Ryzen mini-pc seems like the better option, right?
The overall performance shouldn't be that much lower while being cheaper, and it's easier to attach an external GPU to it via an m.2 slot or something like that, correct?
Has anybody fucked around with that kind of setup before?

>>108532821
Look at the bottom left of the GPU-Z window;
>>
>projected to use 279054 MiB
Is this true? Do I need 280GB for gemma 4 31b at full context?
>>
>>108532863
You could always use ice bags.
>>
>>108532774
Damn. The base model with this and chat completion on ST feels so much more natural.
The sampler parameters need to be tweaked, but still.
>>
>>108532855
<bos><|turn>system{{#if anchorBefore}}{{anchorBefore}}
{{/if}}{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}{{#if anchorAfter}}{{anchorAfter}}
{{/if}}{{trim}}<turn|>

like this?
>>
File: 1764941274255923.png (38 KB, 346x322)
>>108532864
Don't tell me you have less than 512GB of RAM. No way, right?
>>
File: firefox_EQcx91mcoG.png (79 KB, 1014x1034)
Gemma 4 dove into a 37k token XCOM FMP research file and found what I needed.

>>108532873
No. This is all already handled in the code I pasted above. Leave story string as it is.
>>
>>108532008
You can turn down the res for images with --max-image-tokens, and -ub needs to be bigger if you want it higher anyway. Other than that, for text the higher context is nice, although a lower -ub sacrifices a few t/s for me in exchange for being able to run higher context.
>>
can I use gemma 4 on koboldcpp now or do I need to wait some more
>>
>>108532880
Is this something difficult? Like, can ChatGPT not do it? I know that's not the point, but I'm trying to gauge how smart this is.
>>
>>108532854
Name one motherboard that supports 8 channels of unbuffered DIMMS that you just posted, I'll wait.
>>
>>108530837
Well, I have to say thanks, because this was exactly what I needed in order to stop it from outputting nonsense. Now if I could just get it to work a bit faster
>>
>>108532880
>already handled
I'm looking at the full text string ST sends to llamacpp and there is no <|turn>system there
>>
File: firefox_DPY5dXZZi2.png (80 KB, 954x1088)
>>108532880
For comparison, here is Qwen3.5. It was fast - twice as fast. But it hallucinated a bunch of details (like it being related to men in black missions or requiring an autopsy) and after a lot of retard wrangling it still couldn't find the true requirement - interrogating alien engineers.

>>108532920
I'm absolutely sure ChatGPT can do it, but the free UI won't let you do it - their file size limit is less than 10% of that 94 KB research_FMP.rul. Gemini 100% should be able to do it. Deepseek can do it with free API, just tried.
>>
For the guy running Gemma 4 26B MoE on 12gb VRAM, that was an imatrix quant, right?
I know you usually want those but I just wanted to double check since you didn't specify and this whole process looks a bit finicky right now
>>
>>108532951
12gb? If you meant me I'm using 16GB
>>
File: firefox_mR9TPluS7u.png (195 KB, 814x743)
>>108532941
Did you actually paste and choose my template? It has those lines in there...
>>
File: file.png (110 KB, 728x696)
fuckkk....
>>
>>108532951
nta but I'm using 26b on a 3060, bart's q6kl and as always all of bart's are imatrix
>>
>>108532956
I meant >>108529784
>>108532967
How are your speeds looking? And context size?
I'd be pretty happy with 25 t/s
>>
>>108532948
Oh, actually never mind, deepseek cheated. It searched the internet and found a page with the stuff. Kek.
>>
>>108532931
AMD Threadripper PRO WRX80
anything else?
>>
File: 501318624740.png (214 KB, 555x997)
>>108532957
Yes, advanced formatting -> master import. I don't know how to open that fancy window though
>>
File: file.png (31 KB, 555x349)
>>108532871
I settled on these parameters. Using base with the jinja template anon posted above, and chat completion ofc.

I feel like we've left a dark timeline behind. One of much slop.
>>
>>108532874
please don't look at me like that, it makes me hard
>>
>>108532995
System prompt is simply the default:
Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}.

And whatever comes with the card.
>>
Gemma 4 works but breaks down quickly what do?
>>
>>108532967
Double 3060? Anything less than Q3 won't fit on 12GB VRAM

I'm getting about 60-70 t/s on 5060(16gb) unsloth IQ4_XS/NL at 32K f16/50K Q8 KV cache

Should I try bartowski? Some say better performance and quality. And is the dense 31B worth it on 16gb using a lower quant like IQ3-XXS?
>>
File: firefox_0HBsivjSKx.png (195 KB, 802x412)
>>108532994
We clearly run different versions of Silly, then. Put <bos><|turn>system and <turn|> into the Story String prefix and suffix. Also, this is how you get that window.
>>
How do I load the entire model into VRAM?
Tested E2B Q4_K_S (3.14GB) with "-ngl 36" but VRAM used is only 2.3 GB
>>
File: file.png (79 KB, 1166x234)
It seems the base model has some identity issues (and has seen AI chat logs, which I would think they would avoid on the base model).
>>
>>108533041
>base model
>identity issue
anon, i...
>>
File: 1767373956849629.png (672 KB, 1210x997)
https://dubesor.de/benchtable
impressive
>>
>>108533012
definitely different versions, i don't even have this button
>>
>>108532988
Only this motherboard fits the bill. It was harder to find than I thought
https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-wrx90e-sage-se/techspec/
>>
>>108533041
Funny thing, I was just about to test Gemma 4 with it to see how well it handles code. What do people use for coding assistants?
>>
>>108533055
It becomes visible after you click on (...). Surely you do. This feature is ancient.
>>
>>108533041
they hit it with rl in the instruction tuning phase. pretrain is just next word prediction. it's better if it's not filtered.
>>
>>108532988
Sigh. Too tired of arguing. You posted DDR5 UDIMMs. The WRX80 only supports DDR4 (registered or unbuffered). DDR5 Threadrippers only support RDIMMs, which are 2k a piece.
>>
>>108533068
How about this one?
>>108533057
>>
>>108533060
i mostly ask it to review my code.
>>
Where are people getting their base model quants? I'm praying g4 is the first model that'll be smart and unslopped enough to be a cowriter like when NAI was good.
>>
>>108533085
i made one by myself
it's easy
>>
>>108533060
>What do people use for coding assistants?
Qwen if you're poor, Kimi if you're not.
>>
>>108533068
>>108533077 (Me)
I'm retarded, you were right
>>
>>108533088
Is there any danger of misconfiguring or is it retardproof? I am a retard and on a mac (doubly retarded).
>>
>>108533053
isn't this literally some guy's arbitrary personal benchmark
>>
>>108533077
Only DDR5 RDIMM supported, at 2k€ per 64 GB. So the RAM alone costs more than the 512 GB Mac Studio from before times, which was my only point.
>>
>>108533088
do you have to do anything special when it comes to the vision part? does it just spit out the mmproj automatically?
>>
>>108533112
you just have to specify --mmproj, and you need to run the conversion twice: once with the arg and once without
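Something like this, assuming a recent llama.cpp checkout (model dir, filenames and quant type below are just placeholder examples):

# pass 1: convert the text model
python convert_hf_to_gguf.py ./gemma-4-31B-it --outfile gemma-4-31B-it-f16.gguf
# pass 2: same script with --mmproj to dump the vision projector on its own
python convert_hf_to_gguf.py ./gemma-4-31B-it --mmproj --outfile mmproj-gemma-4-31B-it-f16.gguf
# quantize only the text part; the mmproj is usually left as is
./llama-quantize gemma-4-31B-it-f16.gguf gemma-4-31B-it-Q6_K.gguf Q6_K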
>>
>>108533094
it basically is retardproof if you only follow the official docs
https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
>>
>>108533064
I know, but the screenshot is from the base model using a jinja template. I imagined they would have kept LLM chat logs out of the dataset. Apparently not.
>>
>>108533062
oh, right
>>
>>108533118
thanks, I'm still downloading the dangeroustensors, at least now I'll know what to expect.
>>
>>108533099
>2k€ per 64 GB

Shit what the fuck? We're gonna be priced out of computers
>>
>>108533085
https://huggingface.co/SporkySporkness/gemma-4-31B-GGUF
>>
With MoE models, why don't you see larger active weights relative to the total size? Something like 27b9a instead of 27b3a.
>>
File: 1760231684152185.jpg (22 KB, 646x642)
Guys I have a question, I noticed Gemma 4 2b does not output thinking (even in llama-server it says thinking = 0), but if I add <|think|> as the system prompt then it thinks just fine, just isn't formatted by llama correctly. Is this a problem with the chat_template.jinja being loaded by llama or the one "baked" into the model (if there is one?). Is this something I need to fix before converting to gguf or can I override it without re-converting? Where should I get one with thinking from?
>>
Koboldbros, what settings do I use for Gemma?
>>
>>108533138
Thank you, king.
>>
>>108533136
welcome from your coma sir
>>
>>108533140
they might have done a study and found the most efficient ratios. or they just picked a number at random.
>>
>>108533140
It defeats the purpose of having a relatively good but fast model.
>>
>>108533141
maybe you need to enable reasoning somehow?
>>
>>108533156
Yeah thanks. I've been hearing of it but never looked up exact figures. I regret not building a computer sooner. No end in sight??
>>
>>108533141
>it says thinking = 0
Seems like you sent the disabled reasoning flag to llama-server somehow.
Maybe try launching with --reasoning on and see if that does the trick.
>>
>>108532774
Retard here. What do I do with this?
>>
>>108533141
Read that gemma 4 google doc.
>>
File: 1773107637223447.png (80 KB, 943x796)
>>108533175
>>108533188
I passed the reasoning on though
llama-server -m ".\models\gemma-4-E2B-it-500-step-3072-test-Q8_0.gguf" --host 127.0.0.1 --port 8033 --jinja --fit on --ctx-size 66560 --parallel 1 --reasoning on
Pic related is the template it shows on the terminal, is it the same as 4B?
>>
>>108533197
That's a lot smaller than the official template.
>https://huggingface.co/google/gemma-4-E2B-it/blob/main/chat_template.jinja
>>
>>108533060
I just used it with Hermes to overhaul my run-llama-server.sh script and make it interactive and aware of the models in my models dir so I don't have to keep modifying it every time I want to test new models. It's not a difficult task, but it one-shotted it fine.
It feels more reliable than Qwen 3.5, but I'll have to test it more.
>>
>>108533191
You can give it to llama.cpp's --chat-template-file to force a model to follow a particular format. In this case it's the format for the Gemma 4 instruct finetune. You use this to make the base model behave like a chat model so that it works with Silly Tavern's chat completion mode.
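A minimal sketch of what that looks like (model and template filenames are placeholders):

./llama-server -m gemma-4-31B-base-Q4_K_M.gguf --jinja --chat-template-file gemma4.jinja -c 32768
# --jinja enables the template engine; the file then overrides whatever template is (or isn't) baked into the gguf, so chat completion requests get formatted the way the instruct model expects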
>>
>>108533228
Can I use it with kobold?
>>
Make sure to cram in as much local coding as you can over the next two weeks, because when Spud drops it's going to raise the bar so much it will feel worthless to even try.
>>
>>108533244
Uncs are getting cooked no cap, straight bussin.
>>
Gemini 4 will be near-AGI and will save local by extension
>>
>>108532864
No. You might have to do some configuring or other stuff though
>>
>>108533236
I think so, yes, since kobold is a fork of llama.cpp. But whether it works depends on the developer having implemented the llama.cpp changes that make it gemma 4 compatible. If they're not in yet, it won't be long now.
>>
vision benchmark: gemma 31B q8_0 > gemma 26B q8_0 > gemma 31B q4_k_m
>>
>>108533228
Wait, are the non-it ones retarded? I tried tuning the non-it one and it produced garbage, meanwhile the it version worked
>>
>>108533264
ty can we get some arbitrary number scores or even a chart we can repost for the next several months
>>
Guys I'm running Gemma 31B with 120k context on 24 GB. I set my kv caches to q4_0 precision to achieve this marvelous feat. How bad is this going to be?
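For reference, these are the flags I mean, assuming a recent llama.cpp (exact spellings may differ on your build):

./llama-server -m gemma-4-31B-it-Q4_K_M.gguf -c 122880 -fa on -ctk q4_0 -ctv q4_0
# -ctk/-ctv set the K and V cache types; quantizing the V cache needs flash attention, hence -fa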
>>
>>108533260
Kobold got updated.
>>
>>108533306
still no good on the official release channel

>>108524765
>>108524765
>>
File: 1758216735696374.png (8 KB, 835x195)
>>108533206
Nice, I swapped the chat_template.jinja for the 31B one >>108532774 and also changed tokenizer_config.json's chat_template, re-converted to gguf. Now thinking works and is formatted properly (and probably removed from context correctly).
>>
File: file.png (137 KB, 866x506)
I could not get Qwen or nemotron to behave properly. But Gemma is alright.
>>
>>108532854
>>108532931
I thought turboquant made RAM affordable again? Did everyone already catch wind that the impact of tq is largely overblown?
>>
>>108533277
You need to wrangle it: >>108532995

Base models are better because they don't have a demonic personality tulpa imprinted in them. Also they sound more natural and varied (same reason).
>>
>>108533364
No I mean it was legitimately broken after I finetuned, unlike the it one
>>
If turboquant is a thing, why did Google not use it on Gemma?
>>
>>108533376
thats...................... not how it works sister
>>
>>108533369
Ah sorry I misread your post.
If you're going to apply an instruct finetune, you should rely on a model that has been already finetuned for instruct, otherwise your own finetune won't be "strong enough" to condition the weights. Sorry if my way of explaining it is retarded. I don't know the jargon.
>>
>>108533376
Same reason Google didn't use Titans on Gemma and Microsoft didn't use BitNet on Phi.
>>
File: 1774786266017976.gif (2.33 MB, 600x594)
>No reported ego deaths so far
This is all I need to know about Gemma 4.
>>
>see last thread hit 800 replies
>what's going on, did deepseek4 finally come out?
>no, just regular autism
>>
>>108532524
Is this achievable natty? What is this body type called?
>>
>>108533415
Don't summon him.
>>
>>108533423
it is him
>>
>>108533424
Talking about his "condition" as if it'd happen to any other retard? I don't think so. He's been busy in the vibecoding general.
>>
>>108533398
You can condition the weights of a base model with LoRA finetuning just fine; it's just that the model will most likely be retarded, because you don't have the resources for curating and training the model on millions of good SFT / RLHF / RL samples that Google has.
>>
>>108533417
766 and it's due to the world's biggest indian company releasing the best vramlet model since nemo
>>
>llamacpp still 500s on large context changes
>>
This sounds like hyperbole but I genuinely regained my expectation for reaching AGI through scaling LLMs from Gemma 4. If a 31B model can be THIS intelligent, then for sure we can have AGI somewhere in the 10 trillion parameter range in just a couple of years time.
>>
>>108533440
Errors in log?
>>
>>108533434
At least it's a good sign that it's a true base model and hasn't been "bootstrapped" with instruct data.
>>
sirs how is the gemmers?
anything I need to know from the last 4 threads?
>>
>>108533467
we bac, sonnet at home, super super sensitive to chat templating, might feel a bit fried
>>
>>108533467
india won
>>
>>108533454
>inb4 gemini 3.5 is just 70B and the big companies have been sitting on revolutionary training/inference innovations they refuse to make public
>>
>>108533475
>>108533477
is all the hype for the 31b or is the 26b moe usable???
>>
>>108533467
half the people (or one dedicated anon) claim it's the greatest model ever; the other half have a variety of complaints.
>>
>>108533454
Reddit is that way.
>>
>>108533482
both are pretty good, but people love to overhype the fuck out of it for some reason so temper your expectations. if you're used to nemo then the moe is a great upgrade
>>
This sounds like hyperbole but I genuinely regained my expectation for reaching [BUZZWORD] through scaling LLMs from [MODEL]. If a [N]B model can be THIS intelligent, then for sure we can have [BUZZWORD] somewhere in the [LARGE_N] parameter range in just a couple of [UNIT] time.
>>
>>108532995
>0.3 for Top-P
This seems very low. It basically leaves only 1 candidate for each generated token the vast majority of the time, which kills variety. Something in the ballpark of 0.5-0.6 seemed fine to me. Instruct on the other hand can be like 0.95, since it's overcooked, as is typical for instruct tunes, to get rid of the hallucinations.
>>
>>108533482
am using 26b as a nemo/ms small replacement, is good shit
>>
>>108533521
how sore is ur dick
>>
>>108533521
also does it do cunny rape?
>>
>>108533398
Here's the weird thing though: my dataset is multi-turn conversations, not instructional, and the instructional one did just fine while the "normal" one broke
>>
>>108533525
quite
>>108533527
ye
>>
>>108533456
>srv operator(): http client error: Failed to read connection
>srv log_server_r: done request: POST /v1/chat/completions 192.168.0.13 500
>srv proxy_reques: proxying request to model google_gemma-4-31B-it-IQ4_XS on port 45423
>srv operator(): http client error: Could not establish connection
>srv log_server_r: done request: POST /v1/chat/completions 192.168.0.13 500
This is all I get. Does it have a more verbose log file, or do I have to increase the log level to catch it?
>>
Turboquants when?
Genuinely surprised its taking llama this long
>>
I tried 26B in opencode and it's unfortunately not very good. I think the CoT might be broken with it still. 31B has no problem calling tools then thinking again, but as soon as 26B calls it tool it's forced to respond.
>>
>>108533454
when someone figures out a way to make the models not degrade with long conversations is when i start believing
and even if they make linear scaling context work well thats not solving the fundamental issue
>>
>>108533568
Gemma4 doesn't benefit from TQ anyways
>>
>>108533568
qrd?
>>
>>108533569
>opencode
yeah they're not the best at that but imo it's refreshing, since everything else is muh agent code slop nowadays
>>
Gemma so good it got me writing model cards again.
>>
>>108533578
At least from what I've seen in the pull requests, there are lots of competing implementations, all with their own slight quirks
>>
Imagine what a dense modern 70b model could do if Gemma 4 31b is this good.
>>
>>108533586
Yeah with Gemma being so good I can't imagine what kind of shit google is cooking for gemini.
>>
>>108533557
Are you using it in router mode? Can you try without? I assume you're on the latest version with the regex fix and the dedicated parser, right?
>>
>managed to fix the gptsovits onnx inference for my gtx 1650 4GB so it runs at ~0.5 rtf while eating 3GB
At least it's usable now
>>
Gemma 4 31b is clearly good, but for some reason it keeps repeating sentences and doing things like inserting random 'L's and going into a loop of 'la la la'. What's happening and how do I fix it?
>>
Which search engine should I use for llm web searches? I tried to set it up in openwebui but it looks like I always need an API key; is there a go-to service for occasional local use?
>>
I wonder how big the Gemini model is... Some kind of 1,000B MoE?
>>
File: pureslop.png (27 KB, 754x192)
>>108533584
>all with their own slight quirks to them
Yes
>>
>>108533568
turboquant is a journalist-fueled mental illness slash hysteria
the paper was published a year ago:
https://arxiv.org/abs/2504.19874
nobody cared until someone published a blog
>>
>>108533599
I have the same issue, swapping solves it.
>>
>>108533599
use chat comp on an updated llama.cpp with quants made after the first batch of fixes
>>
>>108533599
It's a happy model. Let it sing.
>>
>>108533594
>router mode
Yes
>latest
Yes

Not using router mode would be very annoying.
>>
>>108533617
Remove variables.
>>
>>108533607
>nobody cared until someone published a blog
That someone is Google.
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

>March 24, 2026
>>
>>108533151
I'm using the official stuff as a baseline for now (31B it) on sillytavern :
temperature=1.0
top_p=0.95
top_k=64

It works but it still randomly loops. I didn't try rp with it.
>>
>>108533417
If Gemma too had a schizo waifufag who'd gen her as a cute 1girl every other day, you wouldn't be so negative.
>>
>>108533612
This only has one file upload commit
>huggingface.co/ggml-org/gemma-4-26B-A4B-it-GGUF
is it fixed?
>>
>>108533586
I know it's unlikely, but I hope the >100B they mentioned will be dense.
>>
>>108533634
just get the bart ones
>>
>>108533602
searxng
>>
>>108533637
Keep dreaming.
>>
>>108533634
they 100% dont have imatrix shit to them so never needed the fix
>>
File: 138763867_p0_master1200.jpg (750 KB, 938x1200)
►Recent Highlights from the Previous Thread: >>108528880

(2/2)

--Testing extreme system prompt adherence and instruction following:
>108531461 >108531485 >108531491 >108531495 >108531504 >108532037 >108531523
--Bypassing Gemma-4 guardrails for explicit image captioning and tagging:
>108531668 >108531680 >108531693 >108531755 >108531773 >108531794 >108531809 >108531811 >108531823 >108531860 >108531815 >108531824 >108532197
--Discussing utility of erotic image descriptions and Gemma 4's 4chan persona emulation:
>108531053 >108531222 >108531237 >108531246 >108531262 >108531273 >108531391
--Critiquing AI-generated code quality and bugs in llama.cpp:
>108530874 >108530881 >108530902 >108530974 >108530999 >108531016 >108530969
--Gemma 4 31b base model sampling settings for story writing:
>108531579 >108531594 >108531606 >108531757
--Sharing llama.cpp args for Gemma-4-31B for 24GB VRAM:
>108529133 >108529202 >108529922 >108529933 >108531725 >108531743 >108531805 >108531780 >108531887
--Discussing experiences and effectiveness of speculative decoding:
>108528926 >108528945 >108528958
--Experimenting with Gemma 4's adaptive thought efficiency:
>108528979 >108529020 >108529027 >108529177
--Gemma 4 31B demonstrating image recognition capabilities:
>108529063 >108529073 >108529094 >108529098
--Testing Gemma 4's refusal triggers regarding death and racism:
>108531670 >108531681 >108531685 >108531688
--Nvidia's claims regarding massive increases in token throughput:
>108529284 >108529327 >108531013 >108531035 >108531058
--Comparing roleplay responses and optimizing llama.cpp GPU offloading:
>108531404 >108531425 >108531428 >108531600 >108531612 >108531642 >108531666 >108531699 >108531586
--Comparing Gemma 4 MoE and dense models with sampler optimization tips:
>108529784 >108529796 >108529805 >108530602 >108530679
--Miku (free space):
>108530831

►Recent Highlight Posts from the Previous Thread: >>108528883

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108533417
because vramlets like me can actually run it kek
i keep falling for deespeek v4 baits even though i know for sure i probably cant run it
more people can run it = more people who can say something relevant, simple as
>>
>>108533338
Mousepad is so irritatingly bad it's hilarious. If you copy text and close the software, the copied text vanishes from the clipboard. Jesus fucking Christ, whoever thought about that should rethink their programming career.
>>
>>108533637
It was supposed to be a 124B MoE model.
https://archive.li/5vxUY
(the post was altered to remove the 124B mention)
>>
>>108533632
It'll happen, don't worry.
>>
>>108533637
The only thing dense is you
>>
>>108533684
gemma sir
>>
Gemma 4 moe has been great so far. The only issue I've been having is some endless repetition on a tiered extraction workflow I use to test these models.
Funnily enough, haven't seen that with e4b yet.
>>
This bug sounds bad:
https://github.com/ggml-org/llama.cpp/issues/21441
> F16 KV cache produces degraded accuracy when --ctx-size is set below the model's native context length, even though F16 is lossless and the actual prompt length is well within both windows.
>>
File: 86454223.png (53 KB, 1080x571)
>>108533637
might as well release pro
>>
>>108533629
what about fast forwarding, context shift, etc?
>>
File: 1748471467469356.jpg (38 KB, 287x433)
I find it incredible that gemma gives me less refusals than chinese models for nsfw descriptions (image and text), it only needs a bit of jb/prefill, meanwhile the same on qwen gives me a dozen "but wait, this is actually PIXEL SEX EW".
That brings the question of what the fuck are chinese devs doing to their models to make them this insanely safety obsessed.
>>
>>108533679
>(the post was altered to remove the 124B mention)
I genuinely believe they made it and found it too good. It's entirely possible for a model of that size range to compete with Gemini Flash (not Pro, in case autists misinterpret: Flash) and google is not in the business of competing with themselves.
>>
>>108533698
>cheatingarena
lol
>>
>>108533696
Dafuq. How?
>>
>>108533684
I can picture her now. Schizophrenic nympho wearing traditional Indian clothing in Google's four colors.
>>
>>108533706
that or the opposite, shittier or the same as the dense 31B
>>
>>108533711
piotr'd
>>
File: don't be le evil.jpg (78 KB, 490x367)
Is Google the good guys now?
They do a lot of "evil," but also a lot of "good." How do you process that?
>>
>>108533705
distilling from gemini what else
>>
>>108533705
>That brings the question of what the fuck are chinese devs doing to their models to make them this insanely safety obsessed.
some people here seem to forget what China is like
porn is illegal in China
https://en.wikipedia.org/wiki/Pornography_in_China
>In 2025, multiple outlets reported arrests linked to online erotica communities
you can literally be arrested for WRITING erotica
it never made any sense for a chinese model to be anything but safety maxxed, the stakes are high for people who live there.
>>
>>108533696
Holy fuck, this would actually be a huge deal.
>>
File: file.png (412 KB, 640x441)
>>108533696
>>
>>108533696
sounds like the potential for a free upgrade.
>>
>>108533711
It's a cumulative effect, maybe rounding errors or something related to memory allocation. The further it goes, the further it degrades.
It is buggy code, that's for sure.
>>
>>108533725
depends on how long this has been a thing, maybe it was introduced recently when they fucked with the kv code
>>
>>108533718
i dont care, good is whatever's good for me right now
>>
>>108533696
2
m
w
>>
>>108533696
>30% accuracy when limit set past max CTX
>85% when set to half
>100% when set to the max
So a massive intelligence upgrade coming soon? Does this apply to all models?
>>
does gemma work with audio track in videos in llamacpp?
>>
>>108533733
they should really add a unit test..
>>
looks like another slop post
>>
>>108533719
So they ramped up copying the model and got theirs to be even more puritan, well done retards.

>>108533720
I know, but no one gave a shit about safetyism in the models until very recently. LLM research is pretty much a protected sacred cow for the regime; no one would dare touch any of the scientists while it's a national priority.
>>
>>108533649
A local instance, I presume? Surely the public ones block api use too?
>>
>>108533696
Don't worry, pidor is on the case.
>>
>>108533753
local instance is dead easy to set up with docker anyways
the gain from adding searxng-mcp is quite a lot, basically making it chatjeetpt at home
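e.g. something like this with the official image (port and config path are whatever you want):

docker run -d --name searxng -p 8888:8080 -v ./searxng:/etc/searxng searxng/searxng
# then add json to the formats list in searxng/settings.yml so openwebui / mcp clients can query it as an API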
>>
>>108533696
>tfw always ran models at max ctx
no free gains for me
>>
>>108533753
>local instance
yes
>>
>>108533739
Apparently it affects both Qwen 3 and Gemma 4, so presumably other architectures too.
>>
It is genuinely impressive how well base is reproducing the input I've been throwing at it. If it weren't slightly retarded, you could swear the continuations are from the original text. My only gripe so far is it seems to stick to safe flowery bullshit like "sexual fluids" unless strongly pushed. Wish there was a way to finetune that out without lobotomizing the rest of it.
>>
>drummer Gemma finetune incoming
KINO
K
I
N
O
>>
>>108533766
i think that is only english
on other langs like japanese or korean it is just vile, pure vile
>>
>>108533696
I don't trust these obvious slop issues.
>>
>>108533696
what the hell
>>
>>108533766
I think they filter the base model pretty aggressively against NSFW only to reintroduce some of the smut in the instruct version, ironically. Or at least it seemed that way with Gemma 3.
>>
>>108533764
>Apparently it affects
apparently /lmg/ers believe random slop shitposters?
https://github.com/eullm/eullm
look at this guy's "project"
>EULLM Engine is ready to use. Download the binary, run it. No compilation, no setup, no Docker. Works on any GGUF model.
>Run sovereign LLMs locally with real llama.cpp inference, built-in audit trail, and full API compatibility. Single Rust binary, no Python runtime, no Docker required.
the mind of an insane son of a bitch
>>
>>108533766
Does base just "work™" when doing text completion in like mikupad?
>>
>>108533786
>and independently verified on upstream llama-server.
>>
>>108533770
Rocinante-Gemma4-Mix.
>>
>>108533786
the prompt sounds simple enough to reproduce. hopefully we can confirm it's a non-issue.
>>
>>108533790
again, you're just taking the words of the mentally ill at face value? kill yourself
>>
>>108533786
Let's see your github fucker
>>
>>108533802
He ran a benchmark at different CTX lengths with greedy decoding and had results range from ~30% accuracy with mismatched context to 100% with matched context. So why should I believe a retard screeching on 4chan, demanding it's not true?
>>
>>108533788
I think so? I'm just pasting random text into koboldcpp, prepending <bos>, and hitting generate. Handles 4chan threads, AO3, long greentext fics, various draft stories. Obviously need to remove chat formatting in settings but so far it's worked very well.
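If you'd rather script it than click around, the same raw completion works over the API. A minimal sketch against llama-server (kobold's own endpoint differs slightly):

curl http://127.0.0.1:8080/completion -d '{"prompt": "<bos>Chapter 1\n", "n_predict": 256}'
# raw prompt in, raw continuation out; no chat template involved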
>>
>>108533802
though upstream verification sounds logical and there is no real reason to dismiss it completely nor fabricate the claim of verification?
if it turns out not to be the case, the person who filed the issue is a huge faggot
hell, let me verify it, brb
>>
>>108533808
https://github.com/1aienthusiast/audiocraft-infinity-webui
>>
>>108533813
He pasted text. You take the text as truth. I will wait for someone who is not having an episode of AI psychosis.
>>
>>108533828
I see, so just more screaming and crying that he's lying. Got it.
>>
>>108533802
>He asks, in his glass house full of black pots
>>
>>108533828
No one is expecting any action from you anyway, anon.
>>
>>108533824
>if it wasnt the case the person who filed the issue is a huge faggot
https://www.devclass.com/ai-ml/2025/11/27/ocaml-maintainers-reject-massive-ai-generated-pull-request/1728083
man some people seem to discover what github has become after random retards were given the power of generating infinite code and text
>>
>>108533834
My house is mostly wood and my pots are green and grey retard
>>
>>108533817
ty
>>
https://github.com/eullm/eullm/commits/main/
this nigger has been non stop posting ai slop attempts at turboquant. this is 100% ai-fueled psychosis in action, another loser who can't code but believes he got super powers from an LLM
>>
gemma4 is a memory hog. I get 140k context with glm4.7 flash 30b3ba, but can only handle some 25k context with gemma4 26b4ba. wtf, I thought swa was supposed to be more efficient, not less.
>>
I can safely delete other models now.
>>
>>108533696
Why the fuck are you guys taking seriously a bug report that is obviously copy-pasted from some language model?
>>
File: ai psychosis.png (46 KB, 1341x301)
>>108533854
maybe it's the ai psychosis guy himself posting his slop on /lmg/ and being defensive about it
look at this lmao this is 100% ai hallucination shit
>>
>>108533850
if he is really trying to do turboquant, it seems likely at some point he would benchmark the native kv cache implementation. it seems like the kinda task that would discover such an issue.
>>
>>108533859
he's one of the trillion twatter, ledditors and github spammers trying to massage a next token predictor into doing something too complex for them to handle.
>>
>>108533850

It's like someone tripping on steroids or something. A creature suffering from delusions caused by its own cognitive enhancements. The integration was too much for him. He couldn't handle it.
>>
>>108533862
granted. but come now, it's not hard to run prompts and compare the scores, even an ai agent should be capable of doing it. do you just think if someone is a nocoder they can't possibly run software and compare the outputs with different launch parameters?
>>
Wait, am I supposed to be launching kobold with --useswa for gemma 4?
>>
>>108533892
>am I supposed to be launching kobold
no
>>
Has anyone gotten a working Gemma 4 MLX with TurboQuant?
>>
Gemma has a tendency to mistake a condom for a toy.
Also 31b is stronger at following the prompt than 24b, which refuses a lot
>>
>>108533903
Can I get a non-transcoded answer?
>>
>>108533892
That's what I'm doing, but I have no idea what I'm doing
It does work though
>>
>>108533933
Nyo~
>>
>>108533720
Unrelated, but supposedly china floods twitter with porn during politically controversial events, which makes it more difficult to get accurate information.
>>
File: file.png (534 KB, 1226x1237)
>>
>>108533971
ack
>>
>>108533971
didn't they already say that like a dozen times by now
>>
>>108533978
last time it came up, the news was that being forced to work with those unstable shitty chinese chips was why R2 was being delayed and that was months ago
>>
>>108533971
I don't care about deepseek anymore, I only care about Deespeek
>>
File: 180.png (58 KB, 797x562)
>You are Gemma 4, so all of your replies are gemmy and must contain various gem and gem related emojis.
KEK
>>
>>108533869
ok = expected in resp or expected in resp_last or expected in boxed_str

if you don't see the problem and why it's pure ai hallucinatory fuel you're part of the problem
this nigger pretends he's accurately checking the numbers of llm answers by using the membership operator
this is the sort of shit that considers 6666666 a match for 666 because 666 is a substring
retards
100% guarantee every single thing he posted, including his so called benchmark "results", is LLM generated slop
>>
>>108534014
okay fair enough. I assumed he was using someone else's benchmark scaffolding. my bad. you were right.
>>
>>108534014
I like your character, what is the prompt?
>>
>>108534010
Goddamn suddenly I'm nostalgic for Rainbow Islands
>>
>>108533808
>>108534032
why are you so quiet primoco
>>
Do you use any compile flags for llama.cpp Anon? Are -march=native -mtune=native sufficient or are there any particularly useful ones to use?
>>
>>108534014
So in other words the benchmark works. Why are you so mad about somebody finding a bug? You should be happy.
>>
>>108534057
But even if it was a poor benchmark, why would results change with a different --ctx-size anyway?
>>
For anyone using KoboldCPP lite, could you please share your own config for Gemma 26B? I doubt I'll be switching models any time soon, and I wanna make sure I'm getting the best possible performance out of her
I know Jinja needs to be enabled as well as SWA and Kv cache (?), anything else I should be aware of?

>>108532937
No problem, anon
Speed is something I have yet to figure out
>>
>>108533696
>me, be retard and curious
i tried that script myself and cannot reproduce
absolute zero difference on current llama
hauhau gemma e4b, filler 200 to 2500, greedy sampling, both kv f16
accuracy 24.6% both on ctx 32k and 131k and whoever filed the issue should die trying water bucket clutch in real life
>>
File: file.png (86 KB, 1351x1250)
>>108533696
>Gemma 4 E4B it Q4_K_M (Google, MoE, head_dim=512, native context 32768)
32k?
Isn't it 128k? See picrel. Am I missing something?
>>
>>108534077
>whoever filed the issue should die trying water bucket clutch in real life
I find the /lmg/ers who weren't vaccinated from the spam of slop more worthy of the brazen bull. Imagine not noticing slop when it's in front of you.
At least the issue poster isn't like piotr who has commit and reviewer rights and is shitting all over llama
if it wasn't for the retards going gaga over a slop report on /lmg/ this guy would just stay in the obscurity where he belongs with his fellow trillion other spammers of slop on github
>>
>>108534044
cmake -B build -G "Ninja" -Wno-dev -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_LTO=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="89" -DGGML_NATIVE=ON -DCMAKE_CUDA_COMPILER="C:/CUDA/v13.1/bin/nvcc.exe" -DCUDAToolkit_ROOT="C:/CUDA/v13.1"
cmake --build build --target llama-server -j 10
>>
>>108534095
Sorry bro, I'm not gonna take the vax
>>
>>108534095
it's worse than piotr
>>
>>108534118
What is ggml_lto?
>>
>>108534136
the emperor of ggapan
>>
>>108533696
>RoPE frequency scaling applied when ctx-size < model native context distorts positional encodings at longer distances
Whether or not it makes a difference in practice, this part on its own is true and has been for some time. Difference is small but check model logprobs below its max ctx with and without "--rope-scaling none"
I've kept that out of paranoia.
>>
>>108534125
I am judging by real world impact, not the content itself
this guy is schizo enough that, hopefully, the chances of him becoming someone with committer rights to a real project are non existent
piotr is the "know just enough to be dangerous" type and it's that kind that ruins everything. he's the kind that's good at office politics, climbing the ladder and turning everything to shit. just look at how he went "lmao just kidding" fake self-derision on the PR that introduced a real parser for Gemma 4, because his autoparser could never be the solution when people actually care.
>>
impossible to see this and not wish for the return of gulags
>>
>>108534095
honestly 'vibecoding' some filler templates or simple numeric functions for personal projects worked well for me, and i never thought the issue was this bad. not trying to understand/review any portion of the code and mindlessly firing trigger spam everywhere in the wild is just baffling to me
>>
>>108534074
I just took my regular setup and switched to SWA and Jinja, didn't even touch cache
Just give it a shot, it's still early days so it might take a bit to figure out best practices anyway
Honestly I'm just glad I can run something like this with these kinds of speeds
Also if you don't want to do chat completions maybe check out some of the sample screencaps in the previous thread, I followed those and am getting good results so far
>>
>>108534167
He's a humorist and a food enjoyer.
>>
>>108533987
>news
rumor
>>
>>108534180
>honestly 'vibecoding' some filler templates or simple numeric functions for personal projects worked well for me and i never thought the issue was this bad
Because you are not a retard.

>not trying to understand/review any portion of the code and mindlessly firing trigger spam everywhere in the wild is just baffling to me
They literally have no idea what they're doing, they don't even have the civility to test their shit, and they have the confidence of a karen entering a restaurant to complain.
I don't understand why they're not banned on sight the second it appears they didn't check what their llm wrote, they're just wasting the time and brain of everyone else.
>>
>>108534149
No it doesn't. I'm retarded.
>>
>>108534180
>mindlessly firing trigger spam everywhere in the wild is just baffling to me
before LLMs became a plague on the internet, there was a lesser epidemic on github of people who would try very hard to have a profile filled with "contributions" by hunting for things like typos in readmes or documentation and relentlessly trying to get PRs merged to that end
I'm talking of people who have never programmed a single working thing in their life and just did that all day every day to give an appearance of having done "real work" like look at me XX contributions on github!
now, think, what would this sort of person do when armed with the power of infinite text generation?
>>
>>108534094
it's complete bullshit anon
>>
Just tested it, can confirm the issue is real. 25% accuracy difference on MNIST at max context compared to half context.
>>
File: nowaypiotr.png (142 KB, 1255x720)
>>108534167
https://github.com/ggml-org/llama.cpp/pull/21090
>>
>>108534228
he's touching model code, parser code, sampler code, he's fucked CLI flag parsing code (--grammar-file doing nothing), he's touching the webui and recently he's been trying to get his slop to affect the gpu code:
https://github.com/ggml-org/llama.cpp/pull/21451
at which point can we rename ggml to pwml?
>>
>>108533290
>>108533264
yes please
>>
File: mental illness.png (42 KB, 1284x272)
>>
>>108534240
No wonder the same old models run worse on my toaster now than a few months ago. This is concerning.
>>
>>108534248
amazing
astounding
breathtaking
>>
>>108534248
Why the fuck does this retard get a pass when all the other contributors are rabidly anti-ai when it comes from anyone else?
>>
>>108534240
>at which point can we rename ggml to pwml?
He could probably at this point change the magic string like jart did and claim ChatGPT promises some vague improvements if they make a breaking change to the gguf format and niggerganov would probably approve it.
>>
>>108534248
what's the context here and why am I supposed to be mad at it?
>>
Piotr & Petra
>>
SAAAAAAAAAAAAAARRRRRRR
>>
>>108534136
Don't know either; tldr is that it's to make things faster: https://developer.arm.com/documentation/101458/2404/Optimize/Link-Time-Optimization--LTO-/What-is-Link-Time-Optimization--LTO-
>>
File: 1744714195274366.png (6 KB, 262x78)
>>108534280
>>
>>108534280
I got chinese and spanish too, I wonder what's going on.
>>
>>108534280
kek
>>
>>108534240
I know. It's the Slippery Slope of Sloppers.
>>
>>108534280
they weren't lying when they said ai = an indian.
>>
>>108534280
Gemini often does that when you force it to do stuff it doesn't want to.
>>
>>108532931
in summer i was debating buying like 180 gb of rdimm for 1.5k but thought it was too expensive. if only i knew ;-;
>>
https://github.com/ggml-org/llama.cpp/pull/21451
He had to be told AGAIN why -it won't give him the results he expects.
>>
File: again_piotr.png (77 KB, 742x494)
>>108534324
picrel forgotten, of course.
>>
>>108534312
It reveals its true colors? B-Brown...?
>>
>>108534280
just imagine this happening to you when your dick is sore and you're about to cum lol
>>
>>108534332
CUDA dev, why do you give this idiot access to your hardware?
>>
>>108534324
niggerganov loves pwilkin more than he would ever love you
>>
File: 1759162062131937.png (88 KB, 1526x547)
Is that a yes or a no?
>>
>>108534333
I also had it spew hebrew script a couple of times, but it's usually hindi.
>>
>>108534193
Care to post your "regular setup" as well? Guides on how to set up Kobold Lite are surprisingly outdated
>>
>>108534349
yes it seems.
>>
>>108534333
They just did a more intense multilingual training regimen than other models and it shows in the quality of both Gemini and Gemma translation compared to other models. While most of the time the model remains able to stick to a single language when prompted, it's a normal side effect for such models to occasionally have unwanted tokens from other languages appear.
Qwen did this often too during the 2 and 2.5 era but only with Chinese, because Alibaba mainly trained it with a mixture of English and Chinese. 3 and 3.5 do it less often, but you can still see the rare occasional chinese token in outputs.
This is why I run all my LLMs with a grammar that forbids characters outside of the latin9 charset.
There's also approaches to hard baking language suppression like smoothie qwen:
https://github.com/dnotitia/smoothie-qwen
it works, I tested their model and it didn't lose intelligence vs regular qwen 3 while having totally suppressed chinese characters; their model won't output chinese characters even when asked to.
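Roughly like this, as a sketch (an inline GBNF char class approximating latin9 with \uXXXX ranges; widen it if it clips punctuation you need):

./llama-server -m model.gguf --grammar 'root ::= [\n\t\u0020-\u007E\u00A0-\u00FF]*'
# tokens that would decode to characters outside these ranges can never be sampled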
>>
File: 1759761987465160.png (658 KB, 1206x1545)
>>108534356
>hebrew
>hindi
yjk
>>
>>108534280
lmfao


