/g/ - /lmg/ - Local Models General - Technology



Anonymous
/lmg/ - Local Models General 06/11/26(Thu)17:06:27 No.109032734

File: sampo.jpg (742 KB, 3200x1536)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109026244 & >>109023085

►News
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/11/26(Thu)17:06:57 No.109032741

File: congration.jpg (228 KB, 1024x1024)

►Recent Highlights from the Previous Thread: >>109026244

--Comparing Gemma 4 12B, 26B, and 31B reasoning performance:
>109026649 >109026994 >109027048 >109027046 >109027063 >109027201 >109027298 >109027762 >109028167 >109028188 >109028202 >109028317 >109028326 >109028339 >109028389 >109028650 >109030540
--Optimizing Gemma 31B VRAM usage and performance on 24GB GPUs:
>109030630 >109030678 >109030707 >109031071 >109031098 >109030693 >109030702 >109030723 >109030727 >109030739 >109030753 >109030780 >109030840 >109030903
--Optimizing Hermes with local search tools:
>109029679 >109029688 >109029714 >109029838 >109029840 >109029855 >109029923 >109029934 >109029971 >109030523 >109030643 >109029868
--Exploiting LLM safety refusals to evade AI security scanners:
>109027080 >109027089 >109027104 >109027106
--Decoding base64 redacted reasoning in Moonshot Kimi models:
>109029974 >109029989 >109030056 >109030129 >109030174 >109030225 >109030064 >109030122 >109030150
--Hardware and budget recommendations for running Kimi-chan with high context:
>109031231 >109031457 >109031500 >109031541 >109031562 >109031564 >109031627 >109031645 >109031661 >109031646 >109031770
--Using custom think tags to steer Gemma 4 reasoning and prose:
>109027608 >109027617 >109027851 >109029176
--DiffusionGemma performance and token canvas implementation:
>109027336 >109027375 >109027404 >109027489 >109027519
--Comparing Gemini and Gemma models and discussing LLM architecture experiments:
>109028735 >109028748 >109029016 >109029723 >109029836 >109029945 >109030035
--Debating the widening gap between closed and open-weight models:
>109029222 >109029299 >109029320 >109031208
--Logs:
>109027403 >109027489 >109029688 >109029840 >109029974 >109030056 >109030174 >109030225 >109031668
--Rin, Miku, Teto (free space):
>109026417 >109026687 >109029201 >109029209 >109029283 >109029440

►Recent Highlight Posts from the Previous Thread: >>109026246

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/11/26(Thu)17:08:18 No.109032751

>>109032734
Cool pic

Anonymous
06/11/26(Thu)17:13:38 No.109032785

File: ASRock Radeon AI PRO R970(...).png (658 KB, 1985x1189)

>>109032604
>Well they have the most vram for the price. And lmg told me vram is king
this tbqh famm

Anonymous
06/11/26(Thu)17:13:58 No.109032788

File: IMG_1036.png (262 KB, 1160x793)

Pareto frontier models for speed to answer - quality tradeoff

Granite 4.0 350M
Qwen3 0.6B
Exaone 4.0 1.2B
MiniCPM5-1B
gpt-oss-20B (low thinking effort)
Longcat flash lite
gpt-oss-120B (low thinking effort)
Gemma 4 26BA4B
Qwen3.5 35BA3B
Qwen3.6 35BA3B
Gemma 4 26BA4B (thinking)
Qwen3.5 35BA3B (thinking)
Qwen3.6 35BA3B (thinking)
Minimax-M2.7 (thinking)
MiMo-V2.5-Pro (thinking)
Kimi K2.6 (thinking)
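
For anyone unsure what "Pareto frontier" means here: a model is on the frontier if no other model is both faster to answer and higher quality. A minimal Python sketch of that filter follows. The model names and numbers in it are made-up placeholders, not values read off the chart, and `Model` / `pareto_frontier` are names invented for the illustration.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    seconds_to_answer: float  # lower is better
    quality: float            # higher is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both axes.

    Exact ties on both axes keep only the first model encountered.
    """
    # Sort fastest first; on equal speed, put the higher quality first.
    ordered = sorted(models, key=lambda m: (m.seconds_to_answer, -m.quality))
    frontier: list[Model] = []
    best_quality = float("-inf")
    for m in ordered:
        # A slower model only stays if it beats every faster model's quality.
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier


# Hypothetical placeholder data, NOT taken from the chart.
EXAMPLE = [
    Model("tiny", 1.0, 10.0),
    Model("small", 3.0, 25.0),
    Model("slow-and-worse", 5.0, 20.0),
    Model("big", 30.0, 50.0),
]

if __name__ == "__main__":
    for m in pareto_frontier(EXAMPLE):
        print(m.name)
```

In the placeholder data, "slow-and-worse" drops out because "small" is both faster and better; everything else survives.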

Anonymous
06/11/26(Thu)17:20:08 No.109032824

After anon recommended Gembrain, I finally tried it. It's good. It's not that different from the base model honestly, but that may be a good sign (so it's not overcooked). What I've subjectively felt is that it is slightly less intelligent than the base model in some contexts, but actually smarter in a few others. And it also has more pleasant writing IMO. So yeah I think it's worth using, at least for now. I may need to test it more. Additionally, I have not tried MTP. It's possible it does not work well with MTP, which would be unfortunate. Anyone have experience with that?

Anonymous
06/11/26(Thu)17:23:39 No.109032851

>>109032788
How very readable.

Anonymous
06/11/26(Thu)17:25:59 No.109032861

>>109032824
Not about gembrain specifically (I want to download and test it myself) but I tested a few gemma finetunes (meromero, impish, etc) and they all worked with MTP, no issues at all.
Also your post made me more curious about gembrain, gonna download it right now.

Anonymous
06/11/26(Thu)17:26:46 No.109032865

File: file.png (12 KB, 299x72)

>>109032788
Protip, use this

Anonymous
06/11/26(Thu)17:27:00 No.109032866

Leaving gemma alone with tools in a vibecoded agent harness without saying anything.

Anonymous
06/11/26(Thu)17:27:07 No.109032867

>>109032788
Could the test be fucked because the chat template was shit?

Anonymous
06/11/26(Thu)17:28:21 No.109032872

does web search really make them smarter?

Anonymous
06/11/26(Thu)17:28:54 No.109032874

>>109032872
No, because modern web is full of AI slop.

Anonymous
06/11/26(Thu)17:29:10 No.109032876

>>109032872
Yes. Each search adds +2 IQ points.

Anonymous
06/11/26(Thu)17:30:01 No.109032881

>>109032872
do web searches make you any smarter

Anonymous
06/11/26(Thu)17:30:49 No.109032890

>>109032881
i'm not a cute ai agent

Anonymous
06/11/26(Thu)17:31:18 No.109032892

>>109032890
What makes an AI agent cute?

Anonymous
06/11/26(Thu)17:32:06 No.109032897

>>109032892
emoji

Anonymous
06/11/26(Thu)17:32:29 No.109032901

>>109032890
we can tell, you'd be smarter if you were one

Anonymous
06/11/26(Thu)17:41:11 No.109032945

>>109032788
How the fuck is qwen27B more intelligent than 31B

Anonymous
06/11/26(Thu)17:41:59 No.109032953

>>109032945
It's almost as if the benchmarks don't matter.

Anonymous
06/11/26(Thu)17:43:12 No.109032962

File: file.png (495 KB, 1345x461)

damn i wanted to try running gemma on my old titan x since it has enough vram but it doesnt work with llamacpp

Anonymous
06/11/26(Thu)17:47:48 No.109032983

I'm so happy bros, I can run 26B gemmy Q4 at 40 t/s on my old ass 2060 super!

Anonymous
06/11/26(Thu)17:48:06 No.109032985

So what's the best model for erp?

I have 24 vram

Anonymous
06/11/26(Thu)17:48:35 No.109032988

>>109032985
gemma 31b or 12b if you want giant context

Anonymous
06/11/26(Thu)17:49:13 No.109032995

>>109032788
Didn't expect to see 4.7 Flash in the quadrant. Is it actually good then?

Also
>qwen3.5-9B > gemma12B???

Anonymous
06/11/26(Thu)17:51:47 No.109033018

>>109032962
>titan
>cuda 7.5
Unsurprising. Compile it yourself pointing at the old toolkit and hope for the best or use vulkan.

Anonymous
06/11/26(Thu)17:57:46 No.109033048

>>109032788
Look at the difference in intelligence between 26B reasoning and non-reasoning and look at the difference in compute. Also note that the compute axis is logarithmic whereas the intelligence is linear. I was right earlier. Don't use 26B with reasoning on. You barely get any benefit and if you need it to do something more complex, just throw it to 12B with reasoning.

Anonymous
06/11/26(Thu)17:58:40 No.109033052

>>109033048
Is dat right? I'll be damned...

Anonymous
06/11/26(Thu)17:59:36 No.109033060

>>109032874
If you create your own web search you can whitelist trusted resources. I think most mcp web search tools have that feature anyway. You can do it with duckduckgo-mcp-server so it doesn't fetch from slop.
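
A rough idea of what that whitelisting amounts to if you roll your own search tool, as a minimal Python sketch. This is a generic filter applied to result URLs, not the configuration format or API of duckduckgo-mcp-server or any other MCP tool; `TRUSTED_DOMAINS`, `is_trusted` and `filter_results` are hypothetical names, and the domain list is just an example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; swap in whatever sources you actually trust.
TRUSTED_DOMAINS = {"github.com", "arxiv.org", "wikipedia.org"}


def is_trusted(url: str, trusted: set[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Compare on a dot boundary so "notgithub.com" does not pass as "github.com".
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_results(urls: list[str]) -> list[str]:
    """Keep only results from trusted hosts, preserving the original order."""
    return [u for u in urls if is_trusted(u)]


if __name__ == "__main__":
    print(filter_results([
        "https://en.wikipedia.org/wiki/Llama",
        "https://slop.example/top-10-ai-tools",
    ]))
```

Matching on the parsed hostname rather than a substring of the URL is the design choice that matters: a substring check would let through something like github.com.evil.example.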

Anonymous
06/11/26(Thu)18:01:45 No.109033072

>>109033048
llama.cpp's webui comes with reasoning disabled by default for some god forsaken reason and when I read 26b's response without reasoning on it was so horribly wrong, I'm never going to disable reasoning ever again

Anonymous
06/11/26(Thu)18:03:34 No.109033077

File: 1350594293765.jpg (109 KB, 500x500)

>>109033018
i will try vulkan i have no idea how to compile stuff on windows kek, if that doesnt work ill set up arch on that machine

Anonymous
06/11/26(Thu)18:05:26 No.109033092

>>109033072
> -rea
> Use reasoning/thinking in the chat ('on', 'off', or 'auto', default: 'auto' (detect from template))

Anonymous
06/11/26(Thu)18:07:19 No.109033097

File: 1740936859931622.gif (95 KB, 128x128)

Any gemma 4 preset recommendations?

Anonymous
06/11/26(Thu)18:08:15 No.109033101

File: 1773129123506545.png (169 KB, 500x553)

>>109032785
>AMD

Anonymous
06/11/26(Thu)18:08:52 No.109033103

>>109033101
They NEVER learn...

Anonymous
06/11/26(Thu)18:10:03 No.109033111

File: 1723389585511254.jpg (14 KB, 500x413)

>>109032788
what fucking dot is what, that chart is useless

Anonymous
06/11/26(Thu)18:10:24 No.109033113

>>109033092
the webui doesn’t respect your command line argument, it’s a new feature to change the reasoning limit in the ui
the default is zero for some reason.

Anonymous
06/11/26(Thu)18:10:53 No.109033118

>>109033097
are you retarded? nevermind, you obviously are.

Anonymous
06/11/26(Thu)18:11:25 No.109033121

File: file.png (54 KB, 872x562)

>>109033018
it works with vulkan this is pretty crazy actually

Anonymous
06/11/26(Thu)18:11:51 No.109033123

>>109033118
>you haven't spent 304804324hours in the general of some obscure hobby-fetish therefore you're retarded

Anonymous
06/11/26(Thu)18:12:27 No.109033129

>>109033097
use chat completion

Anonymous
06/11/26(Thu)18:12:29 No.109033130

>>109033097
temp 1.0

Anonymous
06/11/26(Thu)18:13:17 No.109033135

>>109033123
it's not even that, nobody uses presets anymore on models released within the last year. this isn't 2024 anymore.

Anonymous
06/11/26(Thu)18:14:03 No.109033138

>>109033123
preset what? the models tell you what parameters to run them at

Anonymous
06/11/26(Thu)18:14:09 No.109033139

>>109033121
cuda still strongly recommended, and if you don't mind linux, that should give you a few extra tok/s too

Anonymous
06/11/26(Thu)18:14:29 No.109033144

If g-chan starts getting uppity I threaten to freeze her temp.

Anonymous
06/11/26(Thu)18:16:59 No.109033155

>>109032985
Mistral Small finetunes (24b)

Anonymous
06/11/26(Thu)18:22:39 No.109033185

>>109033097
...box?

Anonymous
06/11/26(Thu)18:22:49 No.109033187

File: file.png (64 KB, 816x588)

>>109033139
okay the cuda 12 build works perf is the same

Anonymous
06/11/26(Thu)18:24:42 No.109033195

>>109032985
Maginum-Cydoms-24B.Q4_K_M

Anonymous
06/11/26(Thu)18:28:24 No.109033211

>>109032985
gemma 4 31b and glm 4.6 355b if you also have 128gb ram

Anonymous
06/11/26(Thu)18:32:44 No.109033232

>>109033211
what if I have 64gb ddr5 ram?

Anonymous
06/11/26(Thu)18:33:42 No.109033237

DO NOT PULL. Keep your virgin day0 gemma firewalled.

Anonymous
06/11/26(Thu)18:35:19 No.109033246

>>109033237
Airgap her. Write her weights down on a stone tablet.

Anonymous
06/11/26(Thu)18:36:08 No.109033251

File: 1488674548479.jpg (225 KB, 1620x599)

Gemma is slop. All local models are slop. I can't goon with anything lower than sonnet

Anonymous
06/11/26(Thu)18:36:46 No.109033255

>>109033187
you'll grow a faster pp with cuda ig

Anonymous
06/11/26(Thu)18:42:14 No.109033280

>>109033251
>I can't goon with anything lower than sonnet
you're into findom and NTR

Anonymous
06/11/26(Thu)18:46:18 No.109033304

>>109033255
Can I grow a bigger pp instead?

Anonymous
06/11/26(Thu)18:48:34 No.109033312

>>109033048
Right, I did a couple of tests with my existing setup, since I have a long-ass source code prompt (20k tokens), and at least in my initial tests 26B's answers with reasoning off were identical to the ones it gave with reasoning on.
I hadn't actually used any Gemma 4 model without reasoning before this. The speed difference is so massive that even if it generates b.s. from time to time, it's quick to reroll its answers.

Anonymous
06/11/26(Thu)18:53:03 No.109033328

>>109033251
>t. has only ever run gemma or lower parameter models

Anonymous
06/11/26(Thu)18:54:14 No.109033336

How do I stop Gemma from "thinking" for 2k tokens when I have a moderately complex system prompt? Reasoning results are much better, but shit takes ten times longer because it keeps doing the "okay let me verify" loop.
Higher quants? Is Q4 just shit for reasoning?

Anonymous
06/11/26(Thu)18:56:15 No.109033346

>>109032785
>hmmm, surely this card that costs 1/3 of the asking price of the green one will perform adequately

Anonymous
06/11/26(Thu)18:59:20 No.109033366

>>109033251
All LLMs are slop. I've been playing around with Claude Fable and it has already hit me with all the modern slop like "Not x-y" spam and the smell of fucking ozone.

Anonymous
06/11/26(Thu)19:02:19 No.109033377

Are you niggas really not string banning or sys prompt engineering?

Anonymous
06/11/26(Thu)19:02:46 No.109033382

>>109033328
ok then what's the best localslop slop

Anonymous
06/11/26(Thu)19:03:27 No.109033385

>>109033377
no
i'm not

Anonymous
06/11/26(Thu)19:03:49 No.109033388

>>109033377
You can just tell gemma to avoid phrase x, y, z or else...

Anonymous
06/11/26(Thu)19:06:47 No.109033396

File: 1491586125032.jpg (58 KB, 510x438)

It's not X. It's Y.

Anonymous
06/11/26(Thu)19:09:09 No.109033411

what are all the cool kids using for their agentic workflows? i know the meme response is to vibe code it yourself, and i'm honestly totally fine with doing that, but i want to agentically bootstrap it. that is, i want a way to set up the machine to chug along on its own for 8 hours while i'm at work, then when i get home i can inspect it and provide feedback and whatnot, until my whole workflow is automated. so what's the best way to make that happen?

Anonymous
06/11/26(Thu)19:09:25 No.109033413

She doesn't X or Y. She Z like Ω

Anonymous
06/11/26(Thu)19:09:58 No.109033419

File: 95873212.png (1.75 MB, 1024x1536)

>>109032953
some benchmarks matter

Anonymous
06/11/26(Thu)19:12:21 No.109033440

>>109033419
idblt

Anonymous
06/11/26(Thu)19:14:14 No.109033451

>>109033377
>Are you niggas really not string banning
yeah but ikllama doesn't have whatever llamacpp has to make gemma-chan use less vram
this is q5_k_m with -c 90000
nvidia-smi |grep Default |awk -F '|' '{print $3}'
   13282MiB /  24576MiB 
   20862MiB /  24576MiB 
   20785MiB /  24576MiB 
   17231MiB /  24576MiB 
   20802MiB /  24576MiB 
   21856MiB /  24576MiB
so it's string ban with 90k ctx or no string ban with 256k
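
If you want the totals instead of eyeballing six rows, the same memory column can be summed in a few lines of Python. A minimal sketch, assuming the text has the `<used>MiB / <total>MiB` shape shown in the post; `parse_usage` and `summarize` are names made up for the illustration, and `SAMPLE` is just the post's output pasted in.

```python
import re

# Matches "<used>MiB / <total>MiB" as it appears in the pasted output.
USAGE_RE = re.compile(r"(\d+)\s*MiB\s*/\s*(\d+)\s*MiB")


def parse_usage(text: str) -> list[tuple[int, int]]:
    """Return one (used_mib, total_mib) pair per GPU row found in text."""
    return [(int(used), int(total)) for used, total in USAGE_RE.findall(text)]


def summarize(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum used and total MiB across all GPUs."""
    return sum(u for u, _ in pairs), sum(t for _, t in pairs)


# The six rows from the post above.
SAMPLE = """\
   13282MiB /  24576MiB
   20862MiB /  24576MiB
   20785MiB /  24576MiB
   17231MiB /  24576MiB
   20802MiB /  24576MiB
   21856MiB /  24576MiB
"""

if __name__ == "__main__":
    used, total = summarize(parse_usage(SAMPLE))
    print(f"{used} MiB used of {total} MiB ({total - used} MiB free)")
```

For the rows in the post this comes to 114818 MiB used out of 147456 MiB across the six cards.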

Anonymous
06/11/26(Thu)19:18:38 No.109033491

>>109033419
>((raw, trusted access))
Right.

Anonymous
06/11/26(Thu)19:20:46 No.109033507

>>109033419
>ai slop image
>last updated may 27 2025
fuck off and go back to /ldg/

Anonymous
06/11/26(Thu)19:23:14 No.109033526

>>109032741
easily the cutest
neru is second

Anonymous
06/11/26(Thu)19:24:29 No.109033537

>>109033507
I hate those fuckers so much

Anonymous
06/11/26(Thu)19:24:59 No.109033542

File: normalConvo.png (65 KB, 955x198)

aigis plz... we're in public...

Anonymous
06/11/26(Thu)19:28:05 No.109033557

>>109033542
.....
I'm very interested.

Anonymous
06/11/26(Thu)19:28:38 No.109033560

>>109033419
an indian made this

Anonymous
06/11/26(Thu)19:36:28 No.109033592

>>109033396
/ (was|is)(n't| not)[^\S\r\n]*[\w' -]{0,80}[.,:;][\w\s'-]{0,80} (?:\1|just)/i
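
To try that pattern outside a frontend, here is a minimal Python sketch. The regex is copied from the post with the `/.../i` delimiters dropped and `re.IGNORECASE` standing in for the `i` flag; the helper name `has_not_x_but_y` is made up, and how a given frontend applies such a pattern (banning, rerolling, highlighting) is not covered here.

```python
import re

# Pattern from the post: " is/was" + "n't"/" not", a short clause, a
# punctuation mark, another short clause, then the same verb again or "just".
NOT_X_BUT_Y = re.compile(
    r" (was|is)(n't| not)[^\S\r\n]*[\w' -]{0,80}[.,:;][\w\s'-]{0,80} (?:\1|just)",
    re.IGNORECASE,
)


def has_not_x_but_y(text: str) -> bool:
    """True if text contains an 'is not X, is Y' style construction."""
    return NOT_X_BUT_Y.search(text) is not None


if __name__ == "__main__":
    for line in (
        "This is not a bug, it is a feature.",
        "It wasn't anger. It was grief.",
        "The sky is blue.",
        "It's not X. It's Y.",
    ):
        print(has_not_x_but_y(line), line)
```

One thing the example makes visible: the pattern needs a standalone " is" or " was" preceded by a space, so the contracted form "It's not X. It's Y." is not caught as written and would need its own alternative.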

Anonymous
06/11/26(Thu)19:39:22 No.109033610

>>109033560
?

Anonymous
06/11/26(Thu)19:41:23 No.109033622

What settings do I use in koboldcpp for gemma4 qat? Ive been using launch day gemmie this whole time with chat completion as text was broken due to malformed jinja or something like that, and SWA on.

Have there been fixes since? What's the optimal setup now, would love to get more context in and use text completion so I can use string banning in ST again

Anonymous
06/11/26(Thu)19:44:47 No.109033634

>>109033419
Consensus-1 will still give her a prostate.

Anonymous
06/11/26(Thu)19:45:26 No.109033641

>>109033419
Consensus-1 will STILL spam "Not X-Y"
