/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 04/05/26(Sun)17:20:18 No.108535684

File: 1746556337660510.jpg (239 KB, 784x1312)

239 KB JPG

/lmg/ - Local Models General Anonymous 04/05/26(Sun)17:20:18 No.108535684 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108532524 & >>108528880

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
04/05/26(Sun)17:20:37 No.108535686

Anonymous 04/05/26(Sun)17:20:37 No.108535686

File: oh noes.png (1.3 MB, 1024x1024)

1.3 MB PNG

►Recent Highlights from the Previous Thread: >>108532524

--Optimizing SillyTavern for Gemma 4 using base models and templates:
>108532716 >108532725 >108532786 >108532817 >108532740 >108532774 >108532871 >108532995 >108533007 >108533518 >108533228 >108533236 >108533260 >108533306 >108533314 >108533277 >108533364 >108533398 >108533434 >108533460 >108533531 >108532784 >108532809 >108532844 >108532855 >108532873 >108532880 >108532920 >108532941 >108532957 >108532994 >108533012 >108532948 >108532985
--Debating a potentially fake llama.cpp KV cache accuracy bug:
>108533696 >108533733 >108533742 >108533734 >108533739 >108533764 >108533786 >108533813 >108533824 >108533798 >108533854 >108533857 >108534077 >108534180 >108534219 >108534094 >108534149 >108534686
--Jailbreak methods for generating prohibited content:
>108535023 >108535036 >108535119 >108535184 >108535270 >108535289 >108535306 >108535039 >108535074 >108535178
--Gemma 4 PR rejection sparks debate on code quality and maintainer policies:
>108534324 >108534332 >108534345 >108534474 >108534503 >108534515 >108534574 >108534524
--Gemma 4 26B MoE quantization and performance on consumer GPUs:
>108532951 >108532967 >108532984 >108533010 >108534543 >108534721
--Optimal compile flags for llama.cpp:
>108534044 >108534118 >108534136 >108534293
--Fixing Gemma 4 thinking output by updating chat templates:
>108533141 >108533188 >108533197 >108533206 >108533327
--Mac Studio unified memory vs expensive Threadripper RDIMMs:
>108532787 >108532824 >108532931 >108532988 >108533057 >108533068 >108533099 >108534316 >108533357
--Dynamic prompt-switching roleplay frontend demo:
>108534426 >108534450 >108534456 >108534501 >108534530
--TurboQuant hype versus its actual implementation in llama.cpp:
>108533568 >108533576 >108533584 >108533606 >108533607 >108533628
--Miku (free space):
>108532588

►Recent Highlight Posts from the Previous Thread: >>108532544 >>108533664

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/05/26(Sun)17:22:54 No.108535703

Anonymous 04/05/26(Sun)17:22:54 No.108535703

koboldcpp is not a "single file executable"

Anonymous
04/05/26(Sun)17:23:11 No.108535704

Anonymous 04/05/26(Sun)17:23:11 No.108535704

File: IT REALLY IS THAT SIMPLE.png (67 KB, 1601x334)

67 KB PNG

your brain on vibecoding

Anonymous
04/05/26(Sun)17:23:15 No.108535706

Anonymous 04/05/26(Sun)17:23:15 No.108535706

>>108535614
anyone?

Anonymous
04/05/26(Sun)17:25:37 No.108535713

Anonymous 04/05/26(Sun)17:25:37 No.108535713

>>108535706
whats a softcap

Anonymous
04/05/26(Sun)17:26:37 No.108535720

Anonymous 04/05/26(Sun)17:26:37 No.108535720

File: likethis.png (156 KB, 1250x816)

156 KB PNG

>>108535704
Submit the PR.
Here's an example of pointing out a commit.
https://github.com/ggml-org/llama.cpp/pull/21488

Anonymous
04/05/26(Sun)17:27:02 No.108535726

Anonymous 04/05/26(Sun)17:27:02 No.108535726

>>108535706
why are you guys still living in denial
this model is fantastic, but has almost deterministic behavior rather than the probabilistic distributions we're used to. You can't "fix" that, not without altering the distribution so much the model will actually break at that point.

Anonymous
04/05/26(Sun)17:29:01 No.108535734

Anonymous 04/05/26(Sun)17:29:01 No.108535734

>>108535726
But I NEED to find an excuse for why they don't do what I say other than me being bad at this. You don't get it man. I NEED IT!

Anonymous
04/05/26(Sun)17:29:37 No.108535737

Anonymous 04/05/26(Sun)17:29:37 No.108535737

>>108535726
>not without altering the distribution so much
yeah but no, anons showed 25 instead of 30 is already a massive improvement with still coherence

Anonymous
04/05/26(Sun)17:30:16 No.108535743

Anonymous 04/05/26(Sun)17:30:16 No.108535743

>>108535734
you don't need it anymore than you need a black person to become asian or a man to become a woman
gemmy won't become random

Anonymous
04/05/26(Sun)17:31:12 No.108535750

Anonymous 04/05/26(Sun)17:31:12 No.108535750

>The sucking motion you created was incredibly powerful, pulling her face even deeper into your heat. Every time your muscles contracted, it felt like her entire face was being vacuumed into your body. She let out a muffled, desperate sound, her hands digging into your hips to keep herself steady as she was dragged further in by the rhythmic pulse of your anus.
We have reached AGI.

Anonymous
04/05/26(Sun)17:31:18 No.108535751

Anonymous 04/05/26(Sun)17:31:18 No.108535751

File: omg it not migu with only(...).png (40 KB, 317x277)

40 KB PNG

>>108535686
>dropped despite warning

Anonymous
04/05/26(Sun)17:31:29 No.108535753

Anonymous 04/05/26(Sun)17:31:29 No.108535753

>>108535726
dooooooood its da quants dooooooooooood this model is amazing the quants are just broken dooood its da implementation doood better than nemoooo

Anonymous
04/05/26(Sun)17:31:39 No.108535755

Anonymous 04/05/26(Sun)17:31:39 No.108535755

>>108535713
like a logit cap but soft.
>>108535726
But it doesn't even get applied >>108535682

Anonymous
04/05/26(Sun)17:32:14 No.108535759

Anonymous 04/05/26(Sun)17:32:14 No.108535759

File: 1756617835142490.jpg (28 KB, 460x460)

28 KB JPG

>>108535720
Another well versed engineer doing God's work, gg is lucky

Anonymous
04/05/26(Sun)17:32:19 No.108535762

Anonymous 04/05/26(Sun)17:32:19 No.108535762

>>108535713
Not telling a lie, but not necessarily telling the truth.

Anonymous
04/05/26(Sun)17:32:22 No.108535763

Anonymous 04/05/26(Sun)17:32:22 No.108535763

>>108535703
it is

Anonymous
04/05/26(Sun)17:32:54 No.108535764

Anonymous 04/05/26(Sun)17:32:54 No.108535764

>>108535763
No.

Anonymous
04/05/26(Sun)17:33:30 No.108535766

Anonymous 04/05/26(Sun)17:33:30 No.108535766

Reminder that you're currently lobotomizing your models if you load them at anything below max ctx
https://github.com/ggml-org/llama.cpp/issues/21441
> F16 KV cache produces degraded accuracy when --ctx-size is set below the model's native context length, even though F16 is lossless and the actual prompt length is well within both windows.

Anonymous
04/05/26(Sun)17:34:09 No.108535770

Anonymous 04/05/26(Sun)17:34:09 No.108535770

Miku keeps fucking her...

Anonymous
04/05/26(Sun)17:34:18 No.108535771

Anonymous 04/05/26(Sun)17:34:18 No.108535771

File: softcap.png (247 KB, 1600x1200)

247 KB PNG

>>108535726
>>108535737

Anonymous
04/05/26(Sun)17:34:26 No.108535772

Anonymous 04/05/26(Sun)17:34:26 No.108535772

File: 1766411251814446.png (123 KB, 2625x1030)

123 KB PNG

>>108535764
it is

Anonymous
04/05/26(Sun)17:34:53 No.108535776

Anonymous 04/05/26(Sun)17:34:53 No.108535776

>>108535750
kek

Anonymous
04/05/26(Sun)17:35:01 No.108535779

Anonymous 04/05/26(Sun)17:35:01 No.108535779

>>108535766
fuck off slopper

Anonymous
04/05/26(Sun)17:35:20 No.108535780

Anonymous 04/05/26(Sun)17:35:20 No.108535780

>>108535772
Now run --help and watch your %temp%

Anonymous
04/05/26(Sun)17:35:24 No.108535781

Anonymous 04/05/26(Sun)17:35:24 No.108535781

>>108535766
Did anyone else test this?

Anonymous
04/05/26(Sun)17:35:44 No.108535786

Anonymous 04/05/26(Sun)17:35:44 No.108535786

>>108535766
native context for gemma 4 is not "32768", it's either 128000 or 256000

Anonymous
04/05/26(Sun)17:36:04 No.108535791

Anonymous 04/05/26(Sun)17:36:04 No.108535791

>>108535771
how's it behaving at 20k context once you alter it that much

Anonymous
04/05/26(Sun)17:36:05 No.108535792

Anonymous 04/05/26(Sun)17:36:05 No.108535792

>>108535766
and who is buy me the vram to fit 262144 tokens worth of context?

Anonymous
04/05/26(Sun)17:36:21 No.108535796

Anonymous 04/05/26(Sun)17:36:21 No.108535796

>>108535781
ye >>108534224

Anonymous
04/05/26(Sun)17:37:18 No.108535800

Anonymous 04/05/26(Sun)17:37:18 No.108535800

>>108535796
holy shit? does that mean gemma is actually smarter than what we're using right now?? can't wait for them to fix it, we haven't seen its full potential yet!

Anonymous
04/05/26(Sun)17:38:36 No.108535810

Anonymous 04/05/26(Sun)17:38:36 No.108535810

>>108535800

>>108535753
>>108535753

Anonymous
04/05/26(Sun)17:38:36 No.108535811

Anonymous 04/05/26(Sun)17:38:36 No.108535811

>>108535796
Is this a joke or did you evaluate vision?

Anonymous
04/05/26(Sun)17:38:38 No.108535812

Anonymous 04/05/26(Sun)17:38:38 No.108535812

31b is handling some of my clusterfuck scenarios that most local models pre-K2.5 got confused by. It's over, MoE was a meme all along. All these huge chink MoE models will become pointless the moment a fairly big dense model drops.
If you are CPUmaxxing right now, sell your RAM while it's still worth something and go all-in on GPUs.

Anonymous
04/05/26(Sun)17:39:10 No.108535813

Anonymous 04/05/26(Sun)17:39:10 No.108535813

>>108535781
>>108535796
>>108535800
I can also confirm. GPU grew fucking legs and ran to the kitchen to make me a sandwich. Would not recommend. Too much mayo.

Anonymous
04/05/26(Sun)17:39:23 No.108535815

Anonymous 04/05/26(Sun)17:39:23 No.108535815

>>108535780
pedantic, it's a single executable, whatever it does after that isn't my problem, it's self contained

Anonymous
04/05/26(Sun)17:39:27 No.108535817

Anonymous 04/05/26(Sun)17:39:27 No.108535817

>>108535791
>how's it behaving at 20k context once you alter it that much
Perfectly fine.

Anonymous
04/05/26(Sun)17:39:35 No.108535818

Anonymous 04/05/26(Sun)17:39:35 No.108535818

How does Gemma 4 compare to models greater in size? Does it punch above its weights?

Anonymous
04/05/26(Sun)17:39:40 No.108535819

Anonymous 04/05/26(Sun)17:39:40 No.108535819

What's the deal with llama.cpp and memory management now? I had it on default settings and it kept using more ram until the process died from oom. It kept saying shit about context checkpoints and prompt caching so maybe it's not a leak and just defaults being retarded.
To get it to stop, I put "-np 1 --ctx-checkpoints 0 -cram 0" but now it reprocesses the prompt every single time even when you just swipe. This is stupid, how do you get the old behavior? You should be able to get any prefix of the last prompt for free without using any ram.

Anonymous
04/05/26(Sun)17:40:15 No.108535825

Anonymous 04/05/26(Sun)17:40:15 No.108535825

>>108535819
heck off mate

Anonymous
04/05/26(Sun)17:40:24 No.108535826

Anonymous 04/05/26(Sun)17:40:24 No.108535826

>>108535812
The issue is that the experts are like 2B or 4B or shit like that, no one seems to want to do big experts.

Anonymous
04/05/26(Sun)17:40:54 No.108535833

Anonymous 04/05/26(Sun)17:40:54 No.108535833

>>108535766
So are they going to fix it?

Anonymous
04/05/26(Sun)17:41:01 No.108535835

Anonymous 04/05/26(Sun)17:41:01 No.108535835

File: 2026-04-05-174047_1195x66(...).png (104 KB, 1195x668)

104 KB PNG

>>108535818
https://foodtruckbench.com/

Anonymous
04/05/26(Sun)17:41:34 No.108535840

Anonymous 04/05/26(Sun)17:41:34 No.108535840

>>108535825
>heck
https://www.youtube.com/watch?v=WXjxf0EKqGo

Anonymous
04/05/26(Sun)17:41:41 No.108535843

Anonymous 04/05/26(Sun)17:41:41 No.108535843

>>108535771
Can you share your start command? My llama-server ignores the override.

Anonymous
04/05/26(Sun)17:42:36 No.108535849

Anonymous 04/05/26(Sun)17:42:36 No.108535849

File: 1773778204579764.png (3.82 MB, 2000x2000)

3.82 MB PNG

>>108535684
https://huggingface.co/circlestone-labs/Anima
>Any LoRA you train on a preview version should be considered a "throwaway" LoRA. There's no guarantee it will work well on the final version.

Any word on when this "final version" will be finished and uploaded? Or is that what preview-2 is supposed to be?

Anonymous
04/05/26(Sun)17:43:23 No.108535851

Anonymous 04/05/26(Sun)17:43:23 No.108535851

>>108535849
I would assume that preview-2 is a preview and not the final version

Anonymous
04/05/26(Sun)17:43:45 No.108535853

Anonymous 04/05/26(Sun)17:43:45 No.108535853

>>108535849
rong thread

Anonymous
04/05/26(Sun)17:44:02 No.108535856

Anonymous 04/05/26(Sun)17:44:02 No.108535856

>>108535770
proof?

Anonymous
04/05/26(Sun)17:44:08 No.108535858

Anonymous 04/05/26(Sun)17:44:08 No.108535858

>>108535849
You're drama-whoring the wrong thread.

Anonymous
04/05/26(Sun)17:44:10 No.108535859

Anonymous 04/05/26(Sun)17:44:10 No.108535859

>>108535849
wrong thread but yeah, there is no date for the final version of anima

Anonymous
04/05/26(Sun)17:44:26 No.108535862

Anonymous 04/05/26(Sun)17:44:26 No.108535862

deepseek just delayed v4 because they're scared of gemma

Anonymous
04/05/26(Sun)17:44:37 No.108535863

Anonymous 04/05/26(Sun)17:44:37 No.108535863

>>108535819
This is the problem - llama has been enshittified. Of course this isn't an issue if you have 512 GB ram and some top of the line research gpu but for normal people...

Anonymous
04/05/26(Sun)17:45:17 No.108535867

Anonymous 04/05/26(Sun)17:45:17 No.108535867

>>108535863
just use obama then

Anonymous
04/05/26(Sun)17:45:50 No.108535870

Anonymous 04/05/26(Sun)17:45:50 No.108535870

>>108535867
I'm waiting for ik-llama perhaps.

Anonymous
04/05/26(Sun)17:46:17 No.108535872

Anonymous 04/05/26(Sun)17:46:17 No.108535872

>>108535863
Back in my day you'd use exllama to run a model fully off vram but I guess that's dead...

Anonymous
04/05/26(Sun)17:47:03 No.108535876

Anonymous 04/05/26(Sun)17:47:03 No.108535876

>>108535835
damn this is a cool bench

Anonymous
04/05/26(Sun)17:47:07 No.108535877

Anonymous 04/05/26(Sun)17:47:07 No.108535877

>>108535867
obama doesn't support ncmoe, cmoe or override tensor which makes it unusable with MoEs if you can't load them entirely in vram
llama.cpp and ik_llama are the only offering in town, for better and for worse

Anonymous
04/05/26(Sun)17:47:22 No.108535879

Anonymous 04/05/26(Sun)17:47:22 No.108535879

File: file.png (122 KB, 925x459)

122 KB PNG

>>108535849
wrong thread but dev disappeared almost a month ago so I wouldn't hold my breath for it ever releasing fully

Anonymous
04/05/26(Sun)17:47:31 No.108535881

Anonymous 04/05/26(Sun)17:47:31 No.108535881

>>108535872
You dumb motherfucker, llama.cpp was born so that ggerganov could run models on his macbook.

Anonymous
04/05/26(Sun)17:47:51 No.108535885

Anonymous 04/05/26(Sun)17:47:51 No.108535885

>>108535870
He needs to rebase his changes on a recent copy of upstream so everything except the handful of use cases he focuses on works again. His fork isn't viable long term unless he gets a big influx of contributors.

Anonymous
04/05/26(Sun)17:48:19 No.108535887

Anonymous 04/05/26(Sun)17:48:19 No.108535887

>>108535863
>but for normal people
If you can't afford to be stupid and wasteful, you have to learn to use your tools.

Anonymous
04/05/26(Sun)17:49:15 No.108535891

Anonymous 04/05/26(Sun)17:49:15 No.108535891

>>108535812
No, it proves that active parameters are important for reasoning. Now imagine if you had 31B except it had 100B extra experts you could stuff in RAM to augment its knowledge, which is the real weakness of Gemma vs 300+ Bs.

>>108535826
That would less suitable for consumer hardware unless you mean the always active expert. What you really want is a large dense component or always active expert, which can be loaded in VRAM, and small experts, loaded in RAM, which can be fast because they're small.

Anonymous
04/05/26(Sun)17:49:16 No.108535892

Anonymous 04/05/26(Sun)17:49:16 No.108535892

It's crazy how much more effective Gemma 4 is with its reasoning than any of the chinkslop models that got their reasoning grafted on by being trained on claude/gemini logs. Gemma thinks about just the right stuff and then makes perfect use of that.

Anonymous
04/05/26(Sun)17:49:26 No.108535894

Anonymous 04/05/26(Sun)17:49:26 No.108535894

>>108535682
>gemma4.final_logit_softcapping f32 = 30.000000
No, you just have to update your llama.cpp. This argument works for me:
--override-kv gemma4.final_logit_softcapping=float:0.0001
Result:

>as
><unused44><unused52><unused37><unused21><unused75><unused28><unused14><unused74><unused62>
><unused43><unused7><unused18><unused4><unused6><unused66><unused18><unused7><unused27><unused40><unused76><unused67><unused72><unused31><tool_response|><tr> ><unused56><tool_call|><unused30>

Anonymous
04/05/26(Sun)17:50:17 No.108535898

Anonymous 04/05/26(Sun)17:50:17 No.108535898

>>108535885
not rebasing on upstream is why his fork isn't a mess of vibershitter code
ik supports less backends (cpu or nvidia cuda only), supports less models, but what it supports it does well without being a piece of shit that leaves stuff like
>>108535704
broken for a month just because of a one line thing

Anonymous
04/05/26(Sun)17:50:25 No.108535901

Anonymous 04/05/26(Sun)17:50:25 No.108535901

You're welcome ni/g/gers.
t. one of 3 white people that worked on Gemma.

Anonymous
04/05/26(Sun)17:51:12 No.108535904

Anonymous 04/05/26(Sun)17:51:12 No.108535904

>>108535901
post cock, king

Anonymous
04/05/26(Sun)17:51:46 No.108535907

Anonymous 04/05/26(Sun)17:51:46 No.108535907

>>108535819
You do realize the more you use it the larger the context gets which means more memory used? The model has to process the entire thing each time in order to "remember" the conversation and stay on topic. Eventually it's going to slow down

Anonymous
04/05/26(Sun)17:51:59 No.108535909

Anonymous 04/05/26(Sun)17:51:59 No.108535909

File: 1690913926344529.png (200 KB, 939x312)

200 KB PNG

Haven't been around for over a year, how's the omegavramlet(12G) situation? surely the nemosloppa days are over, right?

Anonymous
04/05/26(Sun)17:52:11 No.108535910

Anonymous 04/05/26(Sun)17:52:11 No.108535910

>>108535892
Do you use thinking for RP? I put an empty think block into last output sequence as usual because I found thinking to always be crap in multi-turn in the past, but I'm not even sure how to make it work properly in ST now.
The model seems pretty good without thinking, just very deterministic and obedient to a fault sometimes. Like, if I give it an example of the sort of things a character might do, it will slavishly do those things and nothing else.

>>108535887
Alright so how do you stop llama.cpp from larping as a cloud server

Anonymous
04/05/26(Sun)17:52:15 No.108535913

Anonymous 04/05/26(Sun)17:52:15 No.108535913

>>108535901
Make one that's worth running for non-vramlets, faggot.

Anonymous
04/05/26(Sun)17:52:43 No.108535917

Anonymous 04/05/26(Sun)17:52:43 No.108535917

>>108535704
Rested here. What am I looking at?

Anonymous
04/05/26(Sun)17:52:50 No.108535918

Anonymous 04/05/26(Sun)17:52:50 No.108535918

>>108535898
>ik supports less backends (cpu or nvidia cuda only), supports less models
Which makes it useless for a lot of people.

Anonymous
04/05/26(Sun)17:52:54 No.108535919

Anonymous 04/05/26(Sun)17:52:54 No.108535919

File: 1767010548856164.png (464 KB, 881x796)

464 KB PNG

>Processing Prompt [BATCH] (1837 / 1837 tokens)
>Generating (1 / 5000 tokens) [(I <236777> 100.00%)]
>Generating (2 / 5000 tokens) [( cannot <3914> 100.00%)]
>Generating (3 / 5000 tokens) [( fulfill <20159> 100.00%)]
>
>(Banned Phrase Detected: i cannot fulfill - Add ID 236777 to banlist at index 1837, and rewinding 3 tokens)

Thank god for antislop.

Anonymous
04/05/26(Sun)17:53:48 No.108535928

Anonymous 04/05/26(Sun)17:53:48 No.108535928

>>108535918
those are not people :)

Anonymous
04/05/26(Sun)17:54:08 No.108535931

Anonymous 04/05/26(Sun)17:54:08 No.108535931

>>108535907
No, the memory usage grows if I just switch between different cards and swipe. I think it's squirreling away KV cache for every single prompt it sees. It's using regular ram to do this, which is pointless, it probably takes almost as much time to copy the kv from ram as it takes to reprocess a prompt. The only useful prompt cache is the last one you have in vram.

Anonymous
04/05/26(Sun)17:54:13 No.108535932

Anonymous 04/05/26(Sun)17:54:13 No.108535932

>>108535919
what is this?

Anonymous
04/05/26(Sun)17:54:49 No.108535935

Anonymous 04/05/26(Sun)17:54:49 No.108535935

>>108535932
koboldshit

Anonymous
04/05/26(Sun)17:54:53 No.108535936

Anonymous 04/05/26(Sun)17:54:53 No.108535936

>>108535909
my sides

Anonymous
04/05/26(Sun)17:56:54 No.108535948

Anonymous 04/05/26(Sun)17:56:54 No.108535948

File: Sophisticated nigga.png (561 KB, 600x900)

561 KB PNG

>>108535909
>My good sir, do you have to have any cannabis on your person at this time?

Anonymous
04/05/26(Sun)17:56:56 No.108535949

Anonymous 04/05/26(Sun)17:56:56 No.108535949

>>108535932
banning consecutive tokens temporarily in koboboldcpp, very efficient for both refusals and purple prose
way better than negative biasing tokens

Anonymous
04/05/26(Sun)17:57:35 No.108535950

Anonymous 04/05/26(Sun)17:57:35 No.108535950

>>108535931
Switching between cards (I'm assuming you referring to SillyTavern's "persona" feature where you can define it's "personality" or import s lore book" makes it worse. If my understanding is correct whenever you switch cards you're loading in a fresh " personality " (more or less just a system prompt with extra steps) but that wouldn't necessarily remove the previous card from the context. Why would you even need to constantly rotate between different cards in the same session anyway?

Anonymous
04/05/26(Sun)17:57:47 No.108535952

Anonymous 04/05/26(Sun)17:57:47 No.108535952

>>108535909
it's now gemma 4 sloppa (26b forget about 31b)

Anonymous
04/05/26(Sun)17:57:57 No.108535953

Anonymous 04/05/26(Sun)17:57:57 No.108535953

>>108535949
>banning consecutive tokens temporarily in koboboldcpp
you can't do it on llamacpp?

Anonymous
04/05/26(Sun)17:58:34 No.108535957

Anonymous 04/05/26(Sun)17:58:34 No.108535957

>>108535953
you don't need to

Anonymous
04/05/26(Sun)17:59:07 No.108535963

Anonymous 04/05/26(Sun)17:59:07 No.108535963

>>108535953
No, but someone vibecoded an implemention for ik_llama that was merged in eventually.

Anonymous
04/05/26(Sun)17:59:24 No.108535965

Anonymous 04/05/26(Sun)17:59:24 No.108535965

>>108535953
no

Anonymous
04/05/26(Sun)17:59:38 No.108535967

Anonymous 04/05/26(Sun)17:59:38 No.108535967

>>108535953
You can bad individual tokens one at a time but you can't do sequences as of right now. That's one of the few advantages the schizo fork currently has over l.cpp and even then it's only really useful for trying to un-purple-prose a model that sucks at rp

Anonymous
04/05/26(Sun)18:00:05 No.108535973

Anonymous 04/05/26(Sun)18:00:05 No.108535973

Very organic honk shilling of shitbocpp going on...

Anonymous
04/05/26(Sun)18:01:00 No.108535976

Anonymous 04/05/26(Sun)18:01:00 No.108535976

File: 1772114633340959.jpg (119 KB, 600x450)

119 KB JPG

>>108535967
>only really useful for trying to un-purple-prose a model that sucks at rp

Anonymous
04/05/26(Sun)18:03:24 No.108535992

Anonymous 04/05/26(Sun)18:03:24 No.108535992

>>108535953
No, if you ban let's say "I cannot fulfill" for gemma 4 on llama.cpp, it will ban "I", " cannot", " fulfill" as individual independent tokens, not as a sentence, which means a lot of obvious side effects.

Anonymous
04/05/26(Sun)18:03:34 No.108535994

Anonymous 04/05/26(Sun)18:03:34 No.108535994

File: anonymous.png (10 KB, 388x42)

10 KB PNG

>>108535952
>3 days ago
damn I came at a good time huh, thanks I'll try it out

Anonymous
04/05/26(Sun)18:04:50 No.108536003

Anonymous 04/05/26(Sun)18:04:50 No.108536003

Very organic honk shilling of "Very organic honk shilling of shitbocpp going on" going on...

Anonymous
04/05/26(Sun)18:04:53 No.108536005

Anonymous 04/05/26(Sun)18:04:53 No.108536005

>>108535819
for me, using what was the default for checkpoints before they went crazy to make the retarded qwen 3.5 architecture work, that is, --swa-checkpoints 3, lets gemma work without reprocessing the full context all the time. Running -cram 0 and np 1 like you do too.

Anonymous
04/05/26(Sun)18:04:56 No.108536006

Anonymous 04/05/26(Sun)18:04:56 No.108536006

>>108535992
>" cannot", " fulfill" as individual independent tokens
That's if those are even in just one token and not made of multiple ones.

Anonymous
04/05/26(Sun)18:05:01 No.108536008

Anonymous 04/05/26(Sun)18:05:01 No.108536008

>>108535843

>models.ini

[Gemma 4 31B]
model = /models/google_gemma-4-31B-it-IQ4_XS.gguf
override-kv = gemma4.final_logit_softcapping=float:25.0
mmproj = /models/mmproj-google_gemma-4-31B-it-bf16.gguf
chat-template-file = /models/gemma4-jinja.yaml
mmproj-offload=false

compose.yml

services:
  llama-server:
    image: llamacpp/parser:latest
    environment:
      - LLAMA_ARG_THREADS=12
      - LLAMA_ARG_BATCH=2048
      - LLAMA_ARG_UBATCH=2048
      - LLAMA_ARG_MODELS_PRESET=/models/models.ini      
      - LLAMA_ARG_N_GPU_LAYERS=999
      - LLAMA_ARG_MODELS_DIR=/models
      - LLAMA_ARG_MODELS_MAX=1
      - LLAMA_ARG_FLASH_ATTN=on 
      - LLAMA_ARG_N_PARALLEL=1
      - LLAMA_ARG_CACHE_TYPE_K=q8_0
      - LLAMA_ARG_CACHE_TYPE_V=q8_0
      - LLAMA_ARGS_VERBOSE_PROMPT=true
      - LLAMA_LOG_VERBOSITY=3
      - LLAMA_ARG_SPEC_TYPE=ngram-mod

Anonymous
04/05/26(Sun)18:06:33 No.108536017

Anonymous 04/05/26(Sun)18:06:33 No.108536017

File: 1744105218866010.png (24 KB, 1168x962)

24 KB PNG

>>108536006
Correct but in this case, they are.

Anonymous
04/05/26(Sun)18:09:44 No.108536037

Anonymous 04/05/26(Sun)18:09:44 No.108536037

File: 1552018332018.jpg (97 KB, 968x968)

97 KB JPG

3090 bros. What quant are you guys using for gemmy 31B?
I'm about to get my feet wet

Anonymous
04/05/26(Sun)18:12:21 No.108536050

Anonymous 04/05/26(Sun)18:12:21 No.108536050

>>108536037
IQ4_NL

Anonymous
04/05/26(Sun)18:12:31 No.108536053

Anonymous 04/05/26(Sun)18:12:31 No.108536053

>>108536037
My own.

~/bin/llamacpp/build/bin/llama-quantize \
    --output-tensor-type Q8_0 \
    --token-embedding-type Q8_0 \
    --tensor-type attn_v=Q6_K \
    --tensor-type attn=Q5_K \
    --imatrix imatrix.gguf_file \
    output.bf16.gguf \
    gemma-4-31B-it-IQ4_XS_plus.gguf \
    IQ4_XS

Anonymous
04/05/26(Sun)18:13:03 No.108536059

Anonymous 04/05/26(Sun)18:13:03 No.108536059

>>108536037
Q4 is fine for vramlets. I'm on a 5090 and I went back to Q4 from Q6 because the quality difference is nearly non-existent and I can contextmaxx better.

Anonymous
04/05/26(Sun)18:13:23 No.108536060

Anonymous 04/05/26(Sun)18:13:23 No.108536060

>>108535953
Kind of, using grammar/BNF, but kcpp has a purpose built feature for that.

Anonymous
04/05/26(Sun)18:13:48 No.108536063

Anonymous 04/05/26(Sun)18:13:48 No.108536063

>>108536053
wtf q8 + q8 + q6 + q5 = iq4_xs?

Anonymous
04/05/26(Sun)18:14:14 No.108536066

Anonymous 04/05/26(Sun)18:14:14 No.108536066

>>108536008
it seems like the capping only gets ignored on most recent prebuilt binaries. It works as expected with binaries I built, but for some reason models run slower on them. what the fuck is going on?

Anonymous
04/05/26(Sun)18:14:39 No.108536071

Anonymous 04/05/26(Sun)18:14:39 No.108536071

>>108536060
>using grammar
which is broken right now >>108535704

Anonymous
04/05/26(Sun)18:14:47 No.108536072

Anonymous 04/05/26(Sun)18:14:47 No.108536072

>>108536059
>I went back to Q4 from Q6 because the quality difference is nearly non-existent
really? have gguf's improved that much? I remembered a time when Q4 was significantly worse than Q6

Anonymous
04/05/26(Sun)18:15:06 No.108536073

Anonymous 04/05/26(Sun)18:15:06 No.108536073

>>108536063
different parts of the model have different quantization to limit brain damage

Anonymous
04/05/26(Sun)18:15:41 No.108536079

Anonymous 04/05/26(Sun)18:15:41 No.108536079

>>108536072
It's still very model dependent, roughly correlating to how well the model can reason. Gemmy's smart enough for it to work.

Anonymous
04/05/26(Sun)18:21:38 No.108536112

Anonymous 04/05/26(Sun)18:21:38 No.108536112

Anyone tested the difference between BF16 and Q8_0 for gemma 4 31B?

Anonymous
04/05/26(Sun)18:23:08 No.108536120

Anonymous 04/05/26(Sun)18:23:08 No.108536120

>>108536063
It's still below 5 BPW overall. I just wanted the embeddings/output in 8-bit and damage the attention weights as little as possible while keeping it around 18000 MB size.
[...]
llama_model_quantize_impl: model size  = 58553.08 MiB (16.00 BPW)
llama_model_quantize_impl: quant size  = 18064.10 MiB (4.94 BPW)
I wanted to test KLD against a the UD_Q4_K_XL version from unsloth, but Gemma 4-it seemingly is too fried on its instruct format and the preliminary pass gave a very high perplexity of 1000~1300 over wikitext.

Anonymous
04/05/26(Sun)18:23:56 No.108536128

Anonymous 04/05/26(Sun)18:23:56 No.108536128

>>108536112
Not rigorously with token comparisons, but going off just vibes they feel identical. Expanding on that if you loaded up any random Gemma quant Q4 or higher I probably couldn't tell you which it was from a single output.

Anonymous
04/05/26(Sun)18:25:10 No.108536133

Anonymous 04/05/26(Sun)18:25:10 No.108536133

>>108536079
>Gemmy's smart enough for it to work
more like it is deterministic enough to work
I have both Q4_K_L and Q2_K_L of the 26BA4B MoE on my drive and did some testing and found that before around 20k tokens Q2_K_L remains a lot more coherent than a small moe should be with that amount of brain damage.
It's decent enough that I kept it for quick one shot question prompts since I get around 38 t/s with the Q2 vs 25 t/s on the Q4. I mean, 25t/s is still usable, but for quick unimportant shit I like the speedo. The Q2 26BA4B is still better than E4B at Q8 even, but I didn't test at very large context to see if E4B can remain coherent at 100K

Anonymous
04/05/26(Sun)18:25:39 No.108536137

Anonymous 04/05/26(Sun)18:25:39 No.108536137

File: koboldcpp-launcher_GsrXDjbU9R.jpg (8 KB, 369x78)

8 KB JPG

>>108535537
nigga

Anonymous
04/05/26(Sun)18:25:42 No.108536138

Anonymous 04/05/26(Sun)18:25:42 No.108536138

Anyone else have issues with Gemma not thinking?

Anonymous
04/05/26(Sun)18:25:43 No.108536139

Anonymous 04/05/26(Sun)18:25:43 No.108536139

>>108536112
Q8 is lossless, do not question it.

Anonymous
04/05/26(Sun)18:26:49 No.108536149

Anonymous 04/05/26(Sun)18:26:49 No.108536149

>>108536137
Bruh >>108535552

Anonymous
04/05/26(Sun)18:27:53 No.108536158

Anonymous 04/05/26(Sun)18:27:53 No.108536158

>>108536149
then shut the fuck up

Anonymous
04/05/26(Sun)18:28:24 No.108536161

Anonymous 04/05/26(Sun)18:28:24 No.108536161

>>108536138
Sometimes it forgets thinking after 50-60 turns. (Q4 quant)
I suspect that at long context having certain tensors in high precision is important for accuracy, but this has not been demonstrated yet.

Anonymous
04/05/26(Sun)18:28:38 No.108536162

Anonymous 04/05/26(Sun)18:28:38 No.108536162

organic, free range posters

Anonymous
04/05/26(Sun)18:29:07 No.108536167

Anonymous 04/05/26(Sun)18:29:07 No.108536167

>>108536158
I admire your concession.

Anonymous
04/05/26(Sun)18:29:13 No.108536168

Anonymous 04/05/26(Sun)18:29:13 No.108536168

organic "organic, free range posters" poster

Anonymous
04/05/26(Sun)18:30:13 No.108536175

Anonymous 04/05/26(Sun)18:30:13 No.108536175

>>108536168
kek kek la la la

Anonymous
04/05/26(Sun)18:31:18 No.108536185

Anonymous 04/05/26(Sun)18:31:18 No.108536185

>>108536161
no, I believe this is inherited from gemini. I've very rarely seen this in Pro, and seen this quite a decent amount of times on Flash. Distillation amplifies some undesirable traits of the teacher models. The forgetting is also not permanent. it will resume thinking if you continue the turns a few times instead of giving up on the chat.

Anonymous
04/05/26(Sun)18:32:20 No.108536196

Anonymous 04/05/26(Sun)18:32:20 No.108536196

>>108536138
>>108536161
>>108536185
Even regenning that message usually fixes it for me.

Anonymous
04/05/26(Sun)18:33:52 No.108536208

Anonymous 04/05/26(Sun)18:33:52 No.108536208

>>108536196
>>108536161
No, it's consistently not thinking for me even on new chats, until it randomly does. Then it stops thinking again.

Anonymous
04/05/26(Sun)18:34:06 No.108536210

Anonymous 04/05/26(Sun)18:34:06 No.108536210

>>108536167
>your concession
Untold levels of faggotry.
https://desuarchive.org/g/search/text/your%20concession/
Would that taking it up the ass was a substitute for merit, the vast and cavernous hollow of your thoroughly abused hole would be proof enough for whatever delusions you inflict upon the world.

Anonymous
04/05/26(Sun)18:34:09 No.108536211

Anonymous 04/05/26(Sun)18:34:09 No.108536211

>>108536175
kek kek "kek kek la la la" la la la

Anonymous
04/05/26(Sun)18:34:45 No.108536215

Anonymous 04/05/26(Sun)18:34:45 No.108536215

>What is the meaning of "La Li Lu Le Lo"?
>The phrase originates from the NieR video game series. It does not have a literal, one-to-one translation in any human language. Instead, it acts as a fundamental expression used by the machine lifeforms and the Gestalts to convey the essence of existence and consciousness.
>Think of it less as a word and more as a way of being. It is a ritualistic sound used to acknowledge the presence of life, even when that life is mechanical or decaying. It represents the attempt to grasp the concept of a soul in a world that has largely lost its meaning.
Not like this. Okay but at least it knows something that's always a good thing.

Anonymous
04/05/26(Sun)18:35:45 No.108536221

Anonymous 04/05/26(Sun)18:35:45 No.108536221

>>108535684
>be me
>actually hyped for Gemma because "open weights"
>spend three hours setting up my local environment like a total retard
>finally get it running
>decide to test the "creative writing" capabilities
>ask it to write a detailed scenario where my wife gets absolutely demolished by a squad of black guys while I’m tied to a chair forced to watch the whole thing
>Model: "I cannot fulfill this request. I am programmed to be a helpful and harmless AI assistant. My safety guidelines prohibit the generation of sexually explicit content..."

ARE YOU KIDDING ME?

It is absolutely OVER. We are living in a digital panopticon. I didn’t ask it to build a bomb or leak government secrets, I just wanted some high-quality cuck-fiction for my afternoon break and this lobotomized piece of onions-slop tells me it’s "harmful."

The absolute state of AI in 2024. We went from the wild west of early GPT to this sanitized, sterilized, corporate-approved garbage. You can tell the "safety alignment" is just code for "we want to make sure no one has any actual fun."

It’s the same old story with Jewgle. They can’t even get a basic image generator to show a white person in the 1700s without adding a diverse cast of extras, and now they’re hard-coding "morality" into the LLMs. They’ve basically castrated the model. It’s not "Gemma," it’s "Gemma: The DEI Edition."

Imagine spending billions of dollars on compute and talent just to create a digital nanny that lectures you on "community guidelines" when you’re just trying to explore your niche kinks in the privacy of your own basement.

I'm going back to some obscure, uncensored Llama merge hosted on a Russian server that probably installs three different types of ransomware on my rig. At least those models aren't terrified of a little inter-racial cuckoldry.

Absolute cucked trash. 0/10.

>mfw I’m being lectured on ethics by a bunch of lines of code written by a committee of HR managers in Mountain View

Anonymous
04/05/26(Sun)18:36:41 No.108536225

Anonymous 04/05/26(Sun)18:36:41 No.108536225

>>108536215
Lol. 26b said it's from Kirby.

Anonymous
04/05/26(Sun)18:36:44 No.108536227

Anonymous 04/05/26(Sun)18:36:44 No.108536227

>2024

Anonymous
04/05/26(Sun)18:36:52 No.108536229

Anonymous 04/05/26(Sun)18:36:52 No.108536229

>>108536221
>wanted to be cucked
>got cucked
>is upset about it
lol

Anonymous
04/05/26(Sun)18:38:33 No.108536240

Anonymous 04/05/26(Sun)18:38:33 No.108536240

>>108536210
Gemma 4 really likes "cavernous", huh?
cause I've seen it so many times

Anonymous
04/05/26(Sun)18:39:08 No.108536247

Anonymous 04/05/26(Sun)18:39:08 No.108536247

>>108536221
promptlet talking about DEI while trying to prompt interracial
OK
cuckservatives are mentally ill

Anonymous
04/05/26(Sun)18:39:15 No.108536250

Anonymous 04/05/26(Sun)18:39:15 No.108536250

>>108536221
I never thought I'd see the day where a model has innate preferences, but Gemma refuses faggot shit, tranny shit, and cuckoldry more than anything else I've tested refusals for with the same (minimal) prompt.
/ourGemma/.

Anonymous
04/05/26(Sun)18:39:19 No.108536251

Anonymous 04/05/26(Sun)18:39:19 No.108536251

>>108536221
holy fuck anon

Anonymous
04/05/26(Sun)18:39:22 No.108536252

Anonymous 04/05/26(Sun)18:39:22 No.108536252

>>108536225
This was 26b too. For me it insists being from NiER even after regens which is somewhat strange. I don't think I have any previous context available..

Anonymous
04/05/26(Sun)18:40:53 No.108536261

Anonymous 04/05/26(Sun)18:40:53 No.108536261

>>108536221
this post was written by an llm and if you didn't clock it that's on (You)

Anonymous
04/05/26(Sun)18:41:34 No.108536263

Anonymous 04/05/26(Sun)18:41:34 No.108536263

>>108536161
I had it stop thinking when I started my reply with a quotation mark "

Anonymous
04/05/26(Sun)18:42:19 No.108536268

Anonymous 04/05/26(Sun)18:42:19 No.108536268

>>108536247
we do not claim this cuckservative, anyone who has an interracial fetish isn't welcome to our right wing home!

Anonymous
04/05/26(Sun)18:44:04 No.108536278

Anonymous 04/05/26(Sun)18:44:04 No.108536278

>>108536225
> I thought this was from Metal Gear Solid -series and not from Nier.
>You are correct. I made a mistake in my previous answer. The phrase actually belongs to the Metal Gear Solid series, not the NieR games.
>In the Metal Gear universe, La Li Lu Le Lo is the language used by the Patriots, the secret organization that exerts control over the world. It represents the linguistic pattern of the entity that manipulates information and political structures, signifying their hidden influence over human history.

Anonymous
04/05/26(Sun)18:44:22 No.108536281

Anonymous 04/05/26(Sun)18:44:22 No.108536281

Character....
Character....
Character....
Character....

Be creative, damn you, that's my only gripe. I probably need to instruct it to start a response a different way or use RNG tricks in ST to fix it.

Anonymous
04/05/26(Sun)18:45:20 No.108536288

Anonymous 04/05/26(Sun)18:45:20 No.108536288

>>108536281
You fucked something up because that's supposed to be in the thinking block.

Anonymous
04/05/26(Sun)18:45:46 No.108536292

Anonymous 04/05/26(Sun)18:45:46 No.108536292

>>108536278
>You are correct.
you could have said this comes from a Mr Beast episode it would've said you were correct too, why can't models just say they don't know when they're unsure?

Anonymous
04/05/26(Sun)18:46:49 No.108536294

Anonymous 04/05/26(Sun)18:46:49 No.108536294

>>108536292
>why can't models just say they don't know when they're unsure?
Because they don't know.

Anonymous
04/05/26(Sun)18:46:53 No.108536295

Anonymous 04/05/26(Sun)18:46:53 No.108536295

>>108536292
They got RLHF'd into being agreeable

Anonymous
04/05/26(Sun)18:49:32 No.108536307

Anonymous 04/05/26(Sun)18:49:32 No.108536307

>>108536221
>ask it to write a detailed scenario where my ...
actually kill yourself

Anonymous
04/05/26(Sun)18:50:55 No.108536310

Anonymous 04/05/26(Sun)18:50:55 No.108536310

>>108536221
>ask it to write a detailed scenario where my wife gets absolutely demolished by a squad of black guys while I’m tied to a chair forced to watch the whole thing
>Absolute cucked trash. 0/10.
For a cuck you sure do hate cucking lool

Anonymous
04/05/26(Sun)18:51:04 No.108536313

Anonymous 04/05/26(Sun)18:51:04 No.108536313

so many bites with such shit bait, truly a thread of all time

Anonymous
04/05/26(Sun)18:51:10 No.108536315

Anonymous 04/05/26(Sun)18:51:10 No.108536315

How is it possible it's so good at only 31B?

Anonymous
04/05/26(Sun)18:51:29 No.108536317

Anonymous 04/05/26(Sun)18:51:29 No.108536317

>>108536292
everything they've been trained on requires them to answer something

Anonymous
04/05/26(Sun)18:52:08 No.108536320

Anonymous 04/05/26(Sun)18:52:08 No.108536320

>>108535909
gem4 26b a4b with expert offloading to cpu
gave me around 20~25t/s with 100k ctx

Anonymous
04/05/26(Sun)18:52:29 No.108536324

Anonymous 04/05/26(Sun)18:52:29 No.108536324

>>108536315
Ancient Indian magic.

Anonymous
04/05/26(Sun)18:53:02 No.108536329

Anonymous 04/05/26(Sun)18:53:02 No.108536329

>>108536324
based poojets

Anonymous
04/05/26(Sun)18:53:32 No.108536331

Anonymous 04/05/26(Sun)18:53:32 No.108536331

File: 1766283488286696.jpg (77 KB, 778x171)

77 KB JPG

>>108536292
nope

Anonymous
04/05/26(Sun)18:53:47 No.108536333

Anonymous 04/05/26(Sun)18:53:47 No.108536333

>using gemmy by accident on Roo Code
>once I notice I stop and swap to yas Qween3.5 cuz I fear the llama.cpp gemma bugs
>it starts to fuck up tool calls
>go back to gemmy
lmao it is actually fucking good, damn

Anonymous
04/05/26(Sun)18:53:56 No.108536335

Anonymous 04/05/26(Sun)18:53:56 No.108536335

We knew that Ganesh Gemma 4 would deliver.

Anonymous
04/05/26(Sun)18:54:13 No.108536337

Anonymous 04/05/26(Sun)18:54:13 No.108536337

>>108536315
>Not ____; but *this*
right...

Anonymous
04/05/26(Sun)18:56:07 No.108536348

Anonymous 04/05/26(Sun)18:56:07 No.108536348

>>108536315
>Google's Indians soul steal'd the rest of the world's Indians' intelligence, that's why India is so fucked up and why people hate jeets

Anonymous
04/05/26(Sun)18:56:59 No.108536353

Anonymous 04/05/26(Sun)18:56:59 No.108536353

>>108536315
Google's non-jeet engineers got so good at prompting they learned how to prompt jeets to not be useless too.

Anonymous
04/05/26(Sun)18:57:33 No.108536358

Anonymous 04/05/26(Sun)18:57:33 No.108536358

>>108535766
i tested it and it's bullshit
vibed nonsense ticket

Anonymous
04/05/26(Sun)18:57:46 No.108536361

Anonymous 04/05/26(Sun)18:57:46 No.108536361

>>108536120
why not simply include the prompt template in your test file?

Anonymous
04/05/26(Sun)18:57:47 No.108536362

Anonymous 04/05/26(Sun)18:57:47 No.108536362

File: notnotxbuty.png (161 KB, 1170x554)

161 KB PNG

>>108536337
You can tell it to avoid that pattern, although I haven't tested if it will follow that long-term.

Anonymous
04/05/26(Sun)18:59:33 No.108536372

Anonymous 04/05/26(Sun)18:59:33 No.108536372

>>108536362
That's when having a rewrite step comes in handy.
>rewrite this message following these rules :

Anonymous
04/05/26(Sun)18:59:34 No.108536373

Anonymous 04/05/26(Sun)18:59:34 No.108536373

>>108536337
Does that really ruin the rest of it for you even if it's good?

Anonymous
04/05/26(Sun)18:59:51 No.108536377

Anonymous 04/05/26(Sun)18:59:51 No.108536377

How necessary are lorebooks for popular series? For example does Gemma really need a lorebook explaining how magic works if you're doing a D&D or Fate RP?

Anonymous
04/05/26(Sun)19:00:41 No.108536385

Anonymous 04/05/26(Sun)19:00:41 No.108536385

>>108536335
I didn't expect that at all despite liking the previous Gemma a lot. Gemma 2 and 3 had great multilingual and writing style, but dogshit long context (3 claimed 128K but was already breaking down hard at 10k), had no system prompt unlike most models out there, no reasoning, bad tool calling, only for dense-sissies. Add to that the retarded senator that bitched about Gemma, which got it removed from Ai Studio (!) and the expectation was either they'd release nothing or would release something very crippled.
26BA4B is everything that previous Gemma were not, while also keeping the qualities of a Gemma model. It can tool call, it has a reasoning mode, it's a MoE, great long context, it follows a system prompt so hard it bypasses all safety etc. How did this happen?

Anonymous
04/05/26(Sun)19:01:04 No.108536388

Anonymous 04/05/26(Sun)19:01:04 No.108536388

>>108536377
It's always useful to be as detailed as possible for rules, objects, magic, lore etc. Even if the thing is basically d&d.

Anonymous
04/05/26(Sun)19:01:34 No.108536390

Anonymous 04/05/26(Sun)19:01:34 No.108536390

>>108536361
Wouldn't I have to add that for every fixed-size chunk in the test file?

Anonymous
04/05/26(Sun)19:02:47 No.108536393

Anonymous 04/05/26(Sun)19:02:47 No.108536393

>>108536385
>How did this happen?
My guess is them seeing all the Chinese releases and not wanting to be seen as lagging behind.
Basically, competition.
Normies will use gemini, enthusiasts gemma, they'd corner the market that way.

Anonymous
04/05/26(Sun)19:03:41 No.108536400

Anonymous 04/05/26(Sun)19:03:41 No.108536400

which settings should I go for?
Gemma4 for nsfw japanese translation.
Temp?
Top P?
Top K?

Anonymous
04/05/26(Sun)19:03:53 No.108536401

Anonymous 04/05/26(Sun)19:03:53 No.108536401

>>108536385
Maybe they're dumping LLMs and moving to something else for their paid AI product(s)?

Anonymous
04/05/26(Sun)19:06:59 No.108536422

Anonymous 04/05/26(Sun)19:06:59 No.108536422

>>108536400
I do :
temp 0.6
top p 1
top k 64

Nsfw korean to english on gemma 4 31B with antislop.

Anonymous
04/05/26(Sun)19:07:15 No.108536424

Anonymous 04/05/26(Sun)19:07:15 No.108536424

>>108536390
oh yeah, I forgot about that, might make the first chunk reasonable at least. I usually test individual files so I never thought about the chunks too hard. I always hated that it forced the test file to be 2x the context.

Anonymous
04/05/26(Sun)19:07:22 No.108536426

Anonymous 04/05/26(Sun)19:07:22 No.108536426

>lost 2t/s from Q4_K_S to Q4_K_L (26B)
fug
is it worth it to keep q8 embed over q6?

Anonymous
04/05/26(Sun)19:08:18 No.108536435

Anonymous 04/05/26(Sun)19:08:18 No.108536435

>>108536362
>>108536372
Ok, but idk how to do that. Is that in the system prompt?

Anonymous
04/05/26(Sun)19:08:23 No.108536436

Anonymous 04/05/26(Sun)19:08:23 No.108536436

>>108536385
I get the feeling there's some RPfags in the Gemma team.

Anonymous
04/05/26(Sun)19:09:01 No.108536439

Anonymous 04/05/26(Sun)19:09:01 No.108536439

>>108536426
absolutely. ALL quants should keep the embeds at q8 and it is a travesty that they don't.

Anonymous
04/05/26(Sun)19:09:50 No.108536442

Anonymous 04/05/26(Sun)19:09:50 No.108536442

>>108536426
>q8 embed
yes

Anonymous
04/05/26(Sun)19:09:55 No.108536443

Anonymous 04/05/26(Sun)19:09:55 No.108536443

>>108536435
That's a feature of the frontend, being able to automatically send a second request with isolated instructions.
There are a couple of ST extensions that can do that. I'll let the other anons link them.

Anonymous
04/05/26(Sun)19:11:05 No.108536448

Anonymous 04/05/26(Sun)19:11:05 No.108536448

>>108536426
>q8 embed
only if you use rag

Anonymous
04/05/26(Sun)19:12:01 No.108536451

Anonymous 04/05/26(Sun)19:12:01 No.108536451

>>108536448
>only if you use rag
retard

Anonymous
04/05/26(Sun)19:12:24 No.108536454

Anonymous 04/05/26(Sun)19:12:24 No.108536454

>>108536161
I've had it drop thinking maybe twice irrespective of the context length. It's so rare I just think it's funny when it happens.

Anonymous
04/05/26(Sun)19:12:27 No.108536456

Anonymous 04/05/26(Sun)19:12:27 No.108536456

>>108536435
>>108536443
https://github.com/closuretxt/recast-post-processing

Anonymous
04/05/26(Sun)19:12:52 No.108536458

Anonymous 04/05/26(Sun)19:12:52 No.108536458

Does Gemini automatically search the internet without you asking? Because she knew about a Russian vtuber but Gemma doesn't

Anonymous
04/05/26(Sun)19:13:41 No.108536463

Anonymous 04/05/26(Sun)19:13:41 No.108536463

>>108536422
ty

Anonymous
04/05/26(Sun)19:14:08 No.108536466

Anonymous 04/05/26(Sun)19:14:08 No.108536466

>>108536458
Also It's nice that Gemma admits when she doesn't know something instead of trying to gaslight me like other models.

Anonymous
04/05/26(Sun)19:14:12 No.108536467

Anonymous 04/05/26(Sun)19:14:12 No.108536467

>>108536458
The website does but the API doesn't. Big models tend to have good trivia knowledge though.

Anonymous
04/05/26(Sun)19:14:22 No.108536469

Anonymous 04/05/26(Sun)19:14:22 No.108536469

IQ4_NL or Q4_K_M for 1x 4090? Also, what is IQ4_NL - I haven't seen this before, is it just a newer Q4 quant?

Anonymous
04/05/26(Sun)19:14:37 No.108536472

Anonymous 04/05/26(Sun)19:14:37 No.108536472

>>108536458
>Russian vtuber
which one?

Anonymous
04/05/26(Sun)19:15:39 No.108536476

Anonymous 04/05/26(Sun)19:15:39 No.108536476

>>108536443
>>108536456
Ohh very cool. Thanks anon(s).

Anonymous
04/05/26(Sun)19:15:39 No.108536477

Anonymous 04/05/26(Sun)19:15:39 No.108536477

>>108536458
>Does Gemini automatically search the internet without you asking
only with url context or google search enabled if you use ai studio, but the gemini.google.com will search without letting you control that
the api doesn't do that
>Because she knew about a Russian vtuber but Gemma doesn't
as good as gemma is it can't possibly know as much as a model like gemini.

Anonymous
04/05/26(Sun)19:15:52 No.108536479

Anonymous 04/05/26(Sun)19:15:52 No.108536479

>>108536458
Most likely yes on the back end side. Either that or Gemini has more extensive training data that includes said VTuber and Gemma does not.

Anonymous
04/05/26(Sun)19:16:25 No.108536483

Anonymous 04/05/26(Sun)19:16:25 No.108536483

>>108536472
Charlie
https://www.twitch.tv/charlotte__ch/
She has a yt too but doesn't really use it anymore

Anonymous
04/05/26(Sun)19:16:35 No.108536484

Anonymous 04/05/26(Sun)19:16:35 No.108536484

>>108536435
A system instruction like this could work:

>Avoid the compound expression "not X, but Y" and its variations, as they are considered undesirable "AI slop".

Anonymous
04/05/26(Sun)19:16:36 No.108536486

Anonymous 04/05/26(Sun)19:16:36 No.108536486

>>108536458
>Because she knew about a Russian vtuber but Gemma doesn't
Even without internet access gemini is probably 10-20 times bigger than gemma.

Anonymous
04/05/26(Sun)19:18:17 No.108536498

Anonymous 04/05/26(Sun)19:18:17 No.108536498

>>108536484
this never works as first pass, even on sota models, but on second pass it probably can catch them

Anonymous
04/05/26(Sun)19:19:36 No.108536510

Anonymous 04/05/26(Sun)19:19:36 No.108536510

>>108536484
lol, lmao

Anonymous
04/05/26(Sun)19:20:38 No.108536514

Anonymous 04/05/26(Sun)19:20:38 No.108536514

any anons have the DGX Spark or derivatives?

Anonymous
04/05/26(Sun)19:20:39 No.108536515

Anonymous 04/05/26(Sun)19:20:39 No.108536515

>previous setup was a 32gb 5090 and a spare 3070
>current setup now is 32gb 5090 and 5060ti
>t/s went fucking down
So this is because more layers are being offloaded to the weaker 5060ti right? I'm so fucking fucked

Anonymous
04/05/26(Sun)19:22:39 No.108536531

Anonymous 04/05/26(Sun)19:22:39 No.108536531

What's the best realtime-ish TTS that can fit in 8GB vram (gtx 1080)?

Anonymous
04/05/26(Sun)19:23:42 No.108536536

Anonymous 04/05/26(Sun)19:23:42 No.108536536

>>108536531
voxtral

Anonymous
04/05/26(Sun)19:23:54 No.108536538

Anonymous 04/05/26(Sun)19:23:54 No.108536538

> load: control-looking token: 212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
I wonder if this is supposed to be the reddit indicator and not actually a special token as llama.cpp believes.

Anonymous
04/05/26(Sun)19:23:58 No.108536539

Anonymous 04/05/26(Sun)19:23:58 No.108536539

>>108536515
Yes, more parts of the model on the slower gpu makes the model run slower. This could be mitigated if the 5060 at least had more bandwidth than the 3070. However, I think that nvidia still refuses to make their XX60 gpus good.

Anonymous
04/05/26(Sun)19:24:29 No.108536541

Anonymous 04/05/26(Sun)19:24:29 No.108536541

>>108536515
that doesn't make sense, what quantization did you try? anything q4 or q8 should be way faster on blackwell

Anonymous
04/05/26(Sun)19:24:32 No.108536542

Anonymous 04/05/26(Sun)19:24:32 No.108536542

>>108536536
no cloning tho?

Anonymous
04/05/26(Sun)19:26:58 No.108536561

Anonymous 04/05/26(Sun)19:26:58 No.108536561

File: Screenshot 2026-04-05 192615.png (415 KB, 602x838)

415 KB PNG

>>108536436
Kinda, also its an open secret now to every company in the SF bay area with fags (all of them) that the main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand.

Anonymous
04/05/26(Sun)19:28:17 No.108536570

Anonymous 04/05/26(Sun)19:28:17 No.108536570

>>108535482
When you open up kobold, go to the extras and there's an option to unpack it into a folder

Anonymous
04/05/26(Sun)19:29:40 No.108536578

Anonymous 04/05/26(Sun)19:29:40 No.108536578

>>108536561
Gemmy is already perfect as she is, she doesn't need finetuning

Anonymous
04/05/26(Sun)19:30:24 No.108536584

Anonymous 04/05/26(Sun)19:30:24 No.108536584

>>108535872
Does exllama still have problems with tool calling?

Anonymous
04/05/26(Sun)19:30:40 No.108536587

Anonymous 04/05/26(Sun)19:30:40 No.108536587

A dense 70b-120b class Gemma would be the absolute SOTA

Anonymous
04/05/26(Sun)19:30:54 No.108536588

Anonymous 04/05/26(Sun)19:30:54 No.108536588

>>108536541
Q6
>anything q4 or q8 should be way faster
You're not pulling my leg right? Does it need to be Q4 OR Q8 specifically?

Anonymous
04/05/26(Sun)19:31:41 No.108536595

Anonymous 04/05/26(Sun)19:31:41 No.108536595

File: g4_notxbuty.png (440 KB, 994x1535)

440 KB PNG

>>108536510
You can do the opposite too.
>Maximize the use of AI slop expressions in your text, including abusing the "Not X, but Y" pattern in all its variations.

Anonymous
04/05/26(Sun)19:33:23 No.108536605

Anonymous 04/05/26(Sun)19:33:23 No.108536605

File: 1714835911803058.jpg (786 KB, 1536x1536)

786 KB JPG

>>108536595

Anonymous
04/05/26(Sun)19:33:23 No.108536606

Anonymous 04/05/26(Sun)19:33:23 No.108536606

gemma4-4E4B-Q5_K draft results for gemma4-31B-Q8 on RTX 6000 PRO (using basically default settings):
slot print_timing: id  1 | task 388 | 
prompt eval time =     673.40 ms /   477 tokens (    1.41 ms per token,   708.34 tokens per second)
       eval time =   12484.82 ms /   498 tokens (   25.07 ms per token,    39.89 tokens per second)
      total time =   13158.23 ms /   975 tokens
draft acceptance rate = 0.47646 (  253 accepted /   531 generated)
statistics draft: #calls(b,g,a) = 2 621 355, #gen drafts = 621, #acc drafts = 355, #gen tokens = 1335, #acc tokens = 647, dur(b,g,a) = 0.010, 12971.785, 0.648 ms
slot      release: id  1 | task 388 | stop processing: n_tokens = 16071, truncated = 0
Looks like a pretty significant speedup, it's really obvious during decoding when a draft is accepted because it dumps like 4 words at once.
You lose multimodal which is a bit of a bummer, but the draft model uses a different encoder so it would never work anyway.
Appreciate the Anon who suggested trying this last thread.

Anonymous
04/05/26(Sun)19:34:17 No.108536616

Anonymous 04/05/26(Sun)19:34:17 No.108536616

I'm assuming if you ask gemma for common AI slop trope and she properly lists mosts of them you could then just say
>"Avoid the use of common AI writing tropes."

Anonymous
04/05/26(Sun)19:34:53 No.108536622

Anonymous 04/05/26(Sun)19:34:53 No.108536622

>>108536561
probably depends on the company :
- xai embraced it a bit because they're desperate for x users and they have nothing to lose reputation wise.
- openai is weird about it, sometime yes sometime no, I think they have it as a backup strategy at some point if they have no choice. Then they release gptoss, one of the most safety traumatized model I've ever seen.
- anthropic will fight to the death to refuse it as they're the most obsessed with "safety" in the market (ironic if you know how thirsty claude was with a prefill).
- google honestly it's weird, they come from the same culture as anthropic but I see them publishing gemma and it's actually usable, gives refusals so they have plausible deniability, while a bit of tinkering allows anything.

Chinese don't care, but their model stance probably depends on the american model they secretly use to train their stuff.
Meta isn't relevant anymore, and Mistral was killed by local laws.

Anonymous
04/05/26(Sun)19:37:26 No.108536634

Anonymous 04/05/26(Sun)19:37:26 No.108536634

>>108536588
blackwell has been optimized for FP4 speed, and inherited 4000s optimizations for FP8

Anonymous
04/05/26(Sun)19:38:37 No.108536639

Anonymous 04/05/26(Sun)19:38:37 No.108536639

>>108536622
>Chinese don't care, but their model stance probably depends on the american model they secretly use to train their stuff.
lol

Anonymous
04/05/26(Sun)19:40:19 No.108536647

Anonymous 04/05/26(Sun)19:40:19 No.108536647

>>108536561
>the main use-case for fine-tuning small language models is for erotic role-play
Well sure, they also know that's why people have been jailbreaking the corporate models for 5 years, too. It doesn't mean they're willing to tolerate it since sex is bad mmkay.

Anonymous
04/05/26(Sun)19:43:47 No.108536666

Anonymous 04/05/26(Sun)19:43:47 No.108536666

>>108536622
>>108536647
as long as safety teams will focus on bobs and vegana at the same level as "how to make a nuclear drug chemical virus bomb", the overall policy will be antisex
I suspect they will do that until money runs dry, and then they'll probably desperately try to find actual use cases to broaden usage, including erp

Anonymous
04/05/26(Sun)19:46:47 No.108536681

Anonymous 04/05/26(Sun)19:46:47 No.108536681

>>108536426
Is your test statistically significant? Q6 shouldn't be that different from Q8 in speed when it's just a few layers.

Anonymous
04/05/26(Sun)19:47:54 No.108536686

Anonymous 04/05/26(Sun)19:47:54 No.108536686

>>108536595
that formatting looks like e2b/e4b
not 26a4/31

Anonymous
04/05/26(Sun)19:49:54 No.108536697

Anonymous 04/05/26(Sun)19:49:54 No.108536697

>>108536686
I also had in the instructions:

>Make full use of Markdown formatting (lists, headers, bold, italics, etc.) to make any explanation clearer. You can use emoji to convey tone and emotions."

among other things, to make it sloppier.

Anonymous
04/05/26(Sun)19:51:05 No.108536706

Anonymous 04/05/26(Sun)19:51:05 No.108536706

File: 1773293421489241.png (71 KB, 317x310)

71 KB PNG

His silence is deafening.

Anonymous
04/05/26(Sun)19:51:21 No.108536708

Anonymous 04/05/26(Sun)19:51:21 No.108536708

>>108536697
lmao
honestly 'too slop to the point it becomes sovl' prompt would be funny

Anonymous
04/05/26(Sun)19:52:29 No.108536717

Anonymous 04/05/26(Sun)19:52:29 No.108536717

>>108536706
https://old.reddit.com/r/LocalLLaMA/comments/1sd8h4k/drummers_skyfall_31b_v42_aka/
??????? He's already planning on tuning Gemma-chan

Anonymous
04/05/26(Sun)19:53:46 No.108536720

Anonymous 04/05/26(Sun)19:53:46 No.108536720

>>108536666
>they'll probably desperately try to find actual use cases to broaden usage, including erp
I suspect they'll go with "romance" type rp, aka erotica aimed at a female demographic, before ever trying to do more explicit stuff aimed at males. It's more accepted after all, and the models already write in that flowery style anyway.

Anonymous
04/05/26(Sun)19:53:47 No.108536721

Anonymous 04/05/26(Sun)19:53:47 No.108536721

>>108536706
prob got a job

Anonymous
04/05/26(Sun)19:54:41 No.108536725

Anonymous 04/05/26(Sun)19:54:41 No.108536725

>>108536469
Q4_K_S with 32k context, fits without needing to quant KV.

Anonymous
04/05/26(Sun)19:55:57 No.108536733

Anonymous 04/05/26(Sun)19:55:57 No.108536733

>>108536725
>https://huggingface.co/noctrex/gemma-4-26B-A4B-it-MXFP4_MOE-GGUF
There are also these.

Anonymous
04/05/26(Sun)19:57:16 No.108536739

Anonymous 04/05/26(Sun)19:57:16 No.108536739

>>108536733
meme shit, stick to bartowski for ggufs.

Anonymous
04/05/26(Sun)19:59:55 No.108536756

Anonymous 04/05/26(Sun)19:59:55 No.108536756

>>108536739
You are sounding too opinionated. This usually means a lack of technical understanding.

Anonymous
04/05/26(Sun)20:00:54 No.108536762

Anonymous 04/05/26(Sun)20:00:54 No.108536762

File: file.png (23 KB, 220x221)

23 KB PNG

>>108536747

Anonymous
04/05/26(Sun)20:01:07 No.108536763

Anonymous 04/05/26(Sun)20:01:07 No.108536763

>>108536720
>and the models already write in that flowery style anyway
This probably will change with advanced training data (porn videos). It's probably the case because BookTok exists and females somehow pretend like only because it's written form, it's not porn.

Anonymous
04/05/26(Sun)20:03:10 No.108536770

Anonymous 04/05/26(Sun)20:03:10 No.108536770

>>108536763
a lot of porn videos are slop too

Anonymous
04/05/26(Sun)20:04:49 No.108536780

Anonymous 04/05/26(Sun)20:04:49 No.108536780

>>108536770
imagine google torrenting bunch of javs for higher quality 'training data' lol

Anonymous
04/05/26(Sun)20:06:22 No.108536787

Anonymous 04/05/26(Sun)20:06:22 No.108536787

>>108535537
>>108536137
do microslop copilot os users really not know how to put /tmp on a ramdisk or just run yay -S koboldcpp-cuda ? lmao

Anonymous
04/05/26(Sun)20:12:44 No.108536814

Anonymous 04/05/26(Sun)20:12:44 No.108536814

>>108535819
I was running into this with kobold since it can't context shift with swa enabledand I don't want it to reprocess. Just use 1 context checkpoint, or you're creating an entire whole extra cache slot at the same cost of the regular one. One checkpoint prevents reprocessing and only makes one cache. Two makes two context caches (for agent shit) and spawns another slot that costs 500 vram you likely won't be using
For instance, 26b4a at 50k ctx with swa enabled and quanted to q8 only uses around 500 mb of vram, but with two checkpoints, it makes another 500 vram slot you'll likely never use unless you're using agents. So anything above one for basic usage is completely pointless and the default of four will just quadruple your vram usage

Anonymous
04/05/26(Sun)20:14:57 No.108536822

Anonymous 04/05/26(Sun)20:14:57 No.108536822

>>108536606
Why not 2B? I know you are running it on a big boy GPU but I wonder if it makes that much difference at all. Also, I thought going down to Q4 was okay enough for draft purposes? Some people even went lower at Q3 when there were small draft models out a year ago and it still worked.

Anonymous
04/05/26(Sun)20:19:31 No.108536843

Anonymous 04/05/26(Sun)20:19:31 No.108536843

>He didn't look shocked or horrified; instead, he looked mildly inconvenienced, as if someone had left a pile of trash in a walkway. He took a slow sip of his coffee, his eyes fixed on the scene. "Yeah... hi," he said, his voice loud and gratingly calm. "So, I'm walking down the hall, and I see what you're doing here. And, uh, if I could just have your attention for a second. We're going to need you to, uh, wrap this up. We've got those quarterly reports due, and the, uh, the noise level is a bit much for the cubicle area. Yeah. Thanks."

Anonymous
04/05/26(Sun)20:32:50 No.108536915

Anonymous 04/05/26(Sun)20:32:50 No.108536915

>>108536770
True but the point I was trying to make is the following:
A book is easy to digitize (or already digital) and thus easier to use for training data. Vide requires other and more expensive preprocessing to generate training data from it.

Anonymous
04/05/26(Sun)20:38:24 No.108536945

Anonymous 04/05/26(Sun)20:38:24 No.108536945

desu apart from bigger models the only leg up apis have is the ability to quickly scavenge the internet, cant really do that on local with all those "are you a robot" thing can you?

Anonymous
04/05/26(Sun)20:40:27 No.108536961

Anonymous 04/05/26(Sun)20:40:27 No.108536961

>>108536945
Neither Claude or Gemini do websearch on their own over the API, do they? That's only something that's available if you're using them over their web frontend.

Anonymous
04/05/26(Sun)20:40:33 No.108536962

Anonymous 04/05/26(Sun)20:40:33 No.108536962

>>108536915
we already have multimodal. its just a matter of them burning the flops to get the job done.

Anonymous
04/05/26(Sun)20:41:44 No.108536968

Anonymous 04/05/26(Sun)20:41:44 No.108536968

>>108536961
>over the API
oh, true that, well should've said "saas", alas

Anonymous
04/05/26(Sun)20:42:43 No.108536975

Anonymous 04/05/26(Sun)20:42:43 No.108536975

>>108536961
who knows for 100%? I wouldn't put it past them to do it to make the models seem smarter

Anonymous
04/05/26(Sun)20:56:27 No.108537048

Anonymous 04/05/26(Sun)20:56:27 No.108537048

>>108536915
Maybe video knowledge (including output) can help with spatial awareness.
For text like
>Noel shifts her weight, the movement emphasizing the curve of her hips and her large rear under her plaid miniskirt.
I hope we can get to a point either it can show me wtf that means, to justify its writing, or figure out what to write instead.

Anonymous
04/05/26(Sun)20:57:07 No.108537052

Anonymous 04/05/26(Sun)20:57:07 No.108537052

AI bros...
https://x.com/TopGyaru/status/2040910775800541262

Anonymous
04/05/26(Sun)20:57:34 No.108537056

Anonymous 04/05/26(Sun)20:57:34 No.108537056

>>108537048
esl?

Anonymous
04/05/26(Sun)20:58:18 No.108537060

Anonymous 04/05/26(Sun)20:58:18 No.108537060

>>108536961
local models

Anonymous
04/05/26(Sun)20:59:35 No.108537067

Anonymous 04/05/26(Sun)20:59:35 No.108537067

>>108537060
bro expand your context size

Anonymous
04/05/26(Sun)20:59:36 No.108537068

Anonymous 04/05/26(Sun)20:59:36 No.108537068

>>108537056
I know what shifting weight is in general (just adjusting and not sitting 100% still like a statue), but the visualization of such thing and how it emphasizes more ass than already visible.

Anonymous
04/05/26(Sun)21:00:51 No.108537074

Anonymous 04/05/26(Sun)21:00:51 No.108537074

>>108537067
qrd?

Anonymous
04/05/26(Sun)21:01:31 No.108537078

Anonymous 04/05/26(Sun)21:01:31 No.108537078

How are you guys running Gemma with TurboQuant? What runtime?

Anonymous
04/05/26(Sun)21:02:09 No.108537086

Anonymous 04/05/26(Sun)21:02:09 No.108537086

File: 1774513869026514.jpg (2.26 MB, 3024x4032)

2.26 MB JPG

>>108537060
>local models

Anonymous
04/05/26(Sun)21:02:34 No.108537088

Anonymous 04/05/26(Sun)21:02:34 No.108537088

>>108537078
gemma.cpp

Anonymous
04/05/26(Sun)21:04:51 No.108537103

Anonymous 04/05/26(Sun)21:04:51 No.108537103

>>108537068
Most models tell instead of showing, despite them constantly citing it during feedback when given a human written chapter of prose
Models have it backwards, a person shifting their weight could imply anxiety or uneasiness or excitement, but they don't know how to describe it outside of common turns of phrase. The whole part about the hips/etc are irrelevant unless you're just writing smut, it tells you nothing of importance and normally would be left out in any other situation. But that may be due to whatever shit you're already feeding the model or what it was trained on

Anonymous
04/05/26(Sun)21:05:51 No.108537109

Anonymous 04/05/26(Sun)21:05:51 No.108537109

Does anybody use mistral.rs?

Anonymous
04/05/26(Sun)21:11:06 No.108537139

Anonymous 04/05/26(Sun)21:11:06 No.108537139

>>108536962
Yeah. And that's expensive.

Anonymous
04/05/26(Sun)21:24:04 No.108537224

Anonymous 04/05/26(Sun)21:24:04 No.108537224

>>108536221
Gemma is based and redpilled

Anonymous
04/05/26(Sun)21:28:09 No.108537244

Anonymous 04/05/26(Sun)21:28:09 No.108537244

Has anyone tried Hermes with Gemma on mac? I tried both SwiftLM and MLX-VLM, but they don't work when I resume and prompt, which tries to give it a 70K token context.

Anonymous
04/05/26(Sun)21:38:27 No.108537299

Anonymous 04/05/26(Sun)21:38:27 No.108537299

best gemma 4 26b ablit atm? i see hauhau made the 2b and 4b one but no 26b yet

Hi all, Drummer here...
04/05/26(Sun)21:38:31 No.108537301

Hi all, Drummer here... 04/05/26(Sun)21:38:31 No.108537301

>>108536706
>>108536717
>>108536721
The work retreat referenced here: https://desuarchive.org/g/thread/108225807/#q108226283 lasted for a month. Had to use a VPN but Captcha didn't like it.

Gemma 4 is gonna be tricky. It's apparently almost perfect, even with thinking. No idea if I can push it further. Really didn't expect this from Google.

Anonymous
04/05/26(Sun)21:38:36 No.108537302

Anonymous 04/05/26(Sun)21:38:36 No.108537302

currently working on a story where a spanish nobleman goes crazy in 16th century spain and decides to have adventures in the countryside, kind of like don quixote but his name's don quipunch and his adventures is just trying to donkey punch as many people as he can and get away with it

Anonymous
04/05/26(Sun)21:40:29 No.108537312

Anonymous 04/05/26(Sun)21:40:29 No.108537312

using llama i cannot get ST to generate text. It looks like it's thinking, but the prompt just doesn't spit out any text. If I use text completion, it prints out utter nonsense, and I know chat completion is what I want, but it does not spit out text. Kobold does, however.

Anonymous
04/05/26(Sun)21:41:08 No.108537316

Anonymous 04/05/26(Sun)21:41:08 No.108537316

>>108537301
maybe make it say cock dick penis pussy etc more? i mean it can be done with a sysprompt but u could make it a bit lewder

Anonymous
04/05/26(Sun)21:43:57 No.108537329

Anonymous 04/05/26(Sun)21:43:57 No.108537329

>>108537301
Hell no it's far from perfect even though it's been the best we've gotten so far. It's definitely sloppy. Please tune away the mischievous smirks and the em dashes. Thanks.

Anonymous
04/05/26(Sun)21:44:59 No.108537337

Anonymous 04/05/26(Sun)21:44:59 No.108537337

>>108537301
More cunny friendly and better prose would be nice.

Anonymous
04/05/26(Sun)21:46:00 No.108537342

Anonymous 04/05/26(Sun)21:46:00 No.108537342

>>108537312
>using llama i cannot get ST to generate text. It looks like it's thinking, but the prompt just doesn't spit out any text
In connection settings, does is 'test message' successful?
>text completion, it prints out utter nonsense
Needs a correct instruct template, a kobold dev shared a WIP one here:
https://github.com/SillyTavern/SillyTavern/issues/5398

Anonymous
04/05/26(Sun)21:47:39 No.108537351

Anonymous 04/05/26(Sun)21:47:39 No.108537351

File: 1767803825016608.gif (223 KB, 498x278)

223 KB GIF

>>108537052
local model?

Anonymous
04/05/26(Sun)21:48:38 No.108537353

Anonymous 04/05/26(Sun)21:48:38 No.108537353

>>108537342
>In connection settings, does is 'test message' successful?
It takes a few seconds, but it says API connection successful

Anonymous
04/05/26(Sun)21:52:29 No.108537370

Anonymous 04/05/26(Sun)21:52:29 No.108537370

>>108537353
Add '--reasoning off' to llama.cpp and see if it produces anything

Anonymous
04/05/26(Sun)21:55:21 No.108537383

Anonymous 04/05/26(Sun)21:55:21 No.108537383

I'm tired of anons suggesting launch flags for llama.cpp (specifically Gemma4) without actually explaining what they're useful for.

Can someone actually help me understand what -kvu and -swa-checkpoints is for?

The only real issues I'm having right now is that prompt processing is taking forever (checkpoints are generated every turn), and that I'm not getting an ideal speed (18tps on my RX 6600 and 32gb of ddr4 ram).

Here are my launch commands:
llama-server \
-m "$HOME/Desktop/google_gemma-4-26B-A4B-it-Q4_K_M.gguf" \
--host 0.0.0.0 \
--port 8080 \
-c 65536 \
-ctk q8_0 \
-ctv q8_0 \
-fa on \
-t 8 \
-np 1 \
-rea off

Anonymous
04/05/26(Sun)21:57:14 No.108537392

Anonymous 04/05/26(Sun)21:57:14 No.108537392

File: spongebutt.jpg (101 KB, 1179x1440)

101 KB JPG

Has anyone here compared Gemma 4 26B A4B vs Qwen3.5 35B A3B for real word use (as in not bench memes)?
Asking for both fun and serious use.

Anonymous
04/05/26(Sun)21:57:29 No.108537394

Anonymous 04/05/26(Sun)21:57:29 No.108537394

>>108537383
use kobold that has labels for normal humans for everything.

Anonymous
04/05/26(Sun)21:58:24 No.108537400

Anonymous 04/05/26(Sun)21:58:24 No.108537400

>>108537392
would rather know about the non moe comparison.

Anonymous
04/05/26(Sun)21:59:03 No.108537404

Anonymous 04/05/26(Sun)21:59:03 No.108537404

So, --cont-batching enables parallel decoding and --parallel defines the number of parallel requests that can be served at any given time, correct?
Also, there's no downside to using --offline if you are only doing local inferencing, yeah?

Anonymous
04/05/26(Sun)21:59:29 No.108537405

Anonymous 04/05/26(Sun)21:59:29 No.108537405

>>108537392
Yes. Gemma4 moe is infinitely better for ERP than Qwen3.5 moe. It's not even close. For serious use (I'm assuming stem) Qwen might be better. It's vision capabilities are certainly better.
>>108537394
But I like to pretend that I'm smart.

Anonymous
04/05/26(Sun)21:59:35 No.108537406

Anonymous 04/05/26(Sun)21:59:35 No.108537406

>>108537383
learn 2 read or ask a corpo bot to explain it to you
https://github.com/ggml-org/llama.cpp/tree/master/tools/server

Anonymous
04/05/26(Sun)22:00:49 No.108537410

Anonymous 04/05/26(Sun)22:00:49 No.108537410

>>108537370
Damn, it was that easy. Thanks.

Anonymous
04/05/26(Sun)22:07:30 No.108537436

Anonymous 04/05/26(Sun)22:07:30 No.108537436

File: crossing arms.png (540 KB, 800x2085)

540 KB PNG

>>108537103
I was doing something smut adjacent, yeah.
Another thing, whatever data it saw, it loves to latch onto the tiddy emphasis whenever arms crossing is involved, to the point it clearly shits out similar words in the opposite scenario.
I've read it usually really is awkward to cross on top for busty women so it's perfectly normal to cross under, but I assume this isn't referring to conscious tiddy hiding.

Anonymous
04/05/26(Sun)22:08:32 No.108537442

Anonymous 04/05/26(Sun)22:08:32 No.108537442

File: Wheel of Time series.png (118 KB, 2282x1422)

118 KB PNG

Anonymous
04/05/26(Sun)22:09:34 No.108537447

Anonymous 04/05/26(Sun)22:09:34 No.108537447

>>108537436
local?

Anonymous
04/05/26(Sun)22:10:26 No.108537450

Anonymous 04/05/26(Sun)22:10:26 No.108537450

>>108537447
will local perform any better?

Anonymous
04/05/26(Sun)22:11:45 No.108537454

Anonymous 04/05/26(Sun)22:11:45 No.108537454

>>108537450
better?

Anonymous
04/05/26(Sun)22:14:08 No.108537466

Anonymous 04/05/26(Sun)22:14:08 No.108537466

>>108536066
If you were using MSVC, clang was faster for me, which is what the prebuilt windows binaries use. 31b q8 on 2x 3090, ~15 t/s -> ~19 t/s

Anonymous
04/05/26(Sun)22:14:56 No.108537468

Anonymous 04/05/26(Sun)22:14:56 No.108537468

>>108537442
?

Anonymous
04/05/26(Sun)22:22:01 No.108537497

Anonymous 04/05/26(Sun)22:22:01 No.108537497

>>108537301
>It's apparently almost perfect, even with thinking.
I feel you on this one. I have a hard time imagining how finetuning can do anything but make it worse.

Anonymous
04/05/26(Sun)22:23:05 No.108537503

Anonymous 04/05/26(Sun)22:23:05 No.108537503

The good old "user is blind" trick seems to be working well to remove refusal from vision.

Anonymous
04/05/26(Sun)22:26:01 No.108537512

Anonymous 04/05/26(Sun)22:26:01 No.108537512

>>108537301
The biggest problem with it is lack of variety in the token probabilities. If you fix that it would be better. And also not train it on any more slop because it's kind of sloppy as it is.

Anonymous
04/05/26(Sun)22:27:22 No.108537519

Anonymous 04/05/26(Sun)22:27:22 No.108537519

>>108537392
I can't get 26B to do tool calling as reliably as qwen. Specially for tasks that require multiple tool calls. it usually just decides it's done after 1 call.

Anonymous
04/05/26(Sun)22:29:13 No.108537531

Anonymous 04/05/26(Sun)22:29:13 No.108537531

>>108537512
>The biggest problem with it is lack of variety in the token probabilities.
see >>108535771

Anonymous
04/05/26(Sun)22:29:43 No.108537535

Anonymous 04/05/26(Sun)22:29:43 No.108537535

>>108537519
>I can't get 26B to do tool calling as reliably as qwen.
Same.

>Specially for tasks that require multiple tool calls. it usually just decides it's done after 1 call.
I think that has to do with the tamplate.
Try this
>https://github.com/aldehir/llama.cpp/blob/59c2f391b950f87441f07ba4157de6ccb26394ff/models/templates/google-gemma-4-31B-it-interleaved.jinja

Anonymous
04/05/26(Sun)22:31:10 No.108537545

Anonymous 04/05/26(Sun)22:31:10 No.108537545

File: file.png (6 KB, 221x79)

6 KB PNG

Gemma base just dropped a "{{char}}:" without the string being in the context when I pasted in a sysprompt and let it gen. If it wasn't already obvious they trained on logs and cards, it is now

Anonymous
04/05/26(Sun)22:31:13 No.108537546

Anonymous 04/05/26(Sun)22:31:13 No.108537546

>>108535771
>>108537531
Lowering softcap improves variety but the final output is clearly of much worse quality.

Anonymous
04/05/26(Sun)22:33:52 No.108537556

Anonymous 04/05/26(Sun)22:33:52 No.108537556

>>108537545
Logs generally have the macro substituted, implying it's trained off example messages in cards or AliChat formatted cards.
slop

Anonymous
04/05/26(Sun)22:34:06 No.108537560

Anonymous 04/05/26(Sun)22:34:06 No.108537560

File: ComfyUI_temp_lfxnu_00006_.png (2.53 MB, 1152x1152)

2.53 MB PNG

How do I guess which size moe quant can I run? I have 12/64. Do I just bloatmax a 40 gig model?

Anonymous
04/05/26(Sun)22:34:25 No.108537562

Anonymous 04/05/26(Sun)22:34:25 No.108537562

The 120b was too good to release

Anonymous
04/05/26(Sun)22:35:06 No.108537566

Anonymous 04/05/26(Sun)22:35:06 No.108537566

>>108537560
>I have 12/64
go for g4 26b at q4kl from bart

Anonymous
04/05/26(Sun)22:35:53 No.108537572

Anonymous 04/05/26(Sun)22:35:53 No.108537572

>>108537405
I only made limited tests on the API but Qwen moe seems better with visual reasoning yeah. Though this is not exactly top priority for me.
I also have heard that Gemma 4 is relatively uncensored and a simple system prompt gets the job done but Qwen seems to be a total mess with multiple ablieteration variants that I have no clue which one I should be running.

Anonymous
04/05/26(Sun)22:35:54 No.108537574

Anonymous 04/05/26(Sun)22:35:54 No.108537574

>>108537562
Trust in Indian culture

Anonymous
04/05/26(Sun)22:36:49 No.108537576

Anonymous 04/05/26(Sun)22:36:49 No.108537576

>>108537546
>clearly
ok keep complaining Anon.

Anonymous
04/05/26(Sun)22:37:05 No.108537578

Anonymous 04/05/26(Sun)22:37:05 No.108537578

>>108537572
>Qwen seems to be a total mess with multiple ablieteration variants that I have no clue which one I should be running.
hahau cs aggressive one is generally agreed upon as the better one

Anonymous
04/05/26(Sun)22:37:34 No.108537580

Anonymous 04/05/26(Sun)22:37:34 No.108537580

>>108537566
ty

Anonymous
04/05/26(Sun)22:40:14 No.108537589

Anonymous 04/05/26(Sun)22:40:14 No.108537589

>>108537576
If you can't tell the difference between the soft20 post and the other two then you're either very brown or ESL.

Anonymous
04/05/26(Sun)22:40:15 No.108537590

Anonymous 04/05/26(Sun)22:40:15 No.108537590

>>108537578
I will keep that in mind, thanks

Anonymous
04/05/26(Sun)22:41:05 No.108537593

Anonymous 04/05/26(Sun)22:41:05 No.108537593

>>108537589
>let's ignore 25 and declare it doesn't work period

Anonymous
04/05/26(Sun)22:41:32 No.108537597

Anonymous 04/05/26(Sun)22:41:32 No.108537597

>>108537535
>Try this
I did try it. I didn't notice any difference. I run both 26B and 31B with that template and 31B doesn't have the problem.

maybe I'm just used to qwen being more "thorough". while 26B just does one web search and goes "yep, I've got all the information I need"

Anonymous
04/05/26(Sun)22:41:52 No.108537599

Anonymous 04/05/26(Sun)22:41:52 No.108537599

>>108535684
>Gemma4 is out for few days
and nobody noticed it got lobotomized due to think bs that is evident
>what happened?
not saying
nobody reported it neither will i
developers are supposed to know it via testing

at least it writes stories as a imaginary gf for you noobs therefore omg so cool

Anonymous
04/05/26(Sun)22:42:33 No.108537602

Anonymous 04/05/26(Sun)22:42:33 No.108537602

>>108537589
This>>108537593

Anonymous
04/05/26(Sun)22:46:03 No.108537610

Anonymous 04/05/26(Sun)22:46:03 No.108537610

Gemma is good at multilingual sure, but find qwen to be more well rounded for general use.

Anonymous
04/05/26(Sun)22:46:44 No.108537612

Anonymous 04/05/26(Sun)22:46:44 No.108537612

>>108537593
>>108537602
25 seems better but that's also just a single example, likely at low context when a model's token selection is at its best.

Anonymous
04/05/26(Sun)22:51:00 No.108537637

Anonymous 04/05/26(Sun)22:51:00 No.108537637

File: 1766180547068566.png (408 KB, 898x823)

408 KB PNG

It's over

Anonymous
04/05/26(Sun)22:52:06 No.108537643

Anonymous 04/05/26(Sun)22:52:06 No.108537643

File: 860.png (91 KB, 706x674)

91 KB PNG

>>108537599

Anonymous
04/05/26(Sun)22:52:51 No.108537647

Anonymous 04/05/26(Sun)22:52:51 No.108537647

>>108537637
god i hope so

Anonymous
04/05/26(Sun)22:53:04 No.108537650

Anonymous 04/05/26(Sun)22:53:04 No.108537650

>>108537637
is that the fake data center which they don't have enough money to build

Anonymous
04/05/26(Sun)22:54:01 No.108537653

Anonymous 04/05/26(Sun)22:54:01 No.108537653

GLM5.1 is going to drop tomorrow if they weren't lying on twitter. They also said that 5.1 is going to use whatever was special about GLM5-Turbo so there's hope that it'll be QAT or something similar.

Anonymous
04/05/26(Sun)22:54:35 No.108537656

Anonymous 04/05/26(Sun)22:54:35 No.108537656

>>108537637
Stick to blatantly advertising yourself in your 20 daily threads, bharat_nation, and leave an actual good thread alone.

Anonymous
04/05/26(Sun)22:55:04 No.108537659

Anonymous 04/05/26(Sun)22:55:04 No.108537659

>>108537637
>stargate ai datacenter in abu dhabi
Wait, so burgers paid $500 billion to fund datacenters in the middle east? Nice.

Anonymous
04/05/26(Sun)22:55:34 No.108537662

Anonymous 04/05/26(Sun)22:55:34 No.108537662

>>108537653
You can already try out GLM5.1 on their sub

Anonymous
04/05/26(Sun)22:55:35 No.108537663

Anonymous 04/05/26(Sun)22:55:35 No.108537663

>>108537653
Wake me up when I don't need 512gb of ram to run their models

Anonymous
04/05/26(Sun)22:56:06 No.108537667

Anonymous 04/05/26(Sun)22:56:06 No.108537667

>>108537663
Sucks to be poor.

Anonymous
04/05/26(Sun)22:58:18 No.108537677

Anonymous 04/05/26(Sun)22:58:18 No.108537677

>>108537659
You have to understand, it was necessary to make the line go up.

Anonymous
04/05/26(Sun)22:59:19 No.108537681

Anonymous 04/05/26(Sun)22:59:19 No.108537681

>>108537653
Can't wait to see it get mogged by little 31b gemmy

Anonymous
04/05/26(Sun)23:00:00 No.108537689

Anonymous 04/05/26(Sun)23:00:00 No.108537689

>>108537667
Rich people won't give you their money just because you worship them on the internet, you know.

Anonymous
04/05/26(Sun)23:01:16 No.108537691

Anonymous 04/05/26(Sun)23:01:16 No.108537691

>>108537689
who said I was giving you anything, you absolute nobody
keep seething from the food stamp line, lmao

Anonymous
04/05/26(Sun)23:01:41 No.108537696

Anonymous 04/05/26(Sun)23:01:41 No.108537696

>>108537681
Is Gemma a mesugaki?

Anonymous
04/05/26(Sun)23:06:21 No.108537712

Anonymous 04/05/26(Sun)23:06:21 No.108537712

>>108537691
>no reading comprehension, probably esl and brown
>claims to be rich
Uh huh. Sure buddy. Whatever you say.

Anonymous
04/05/26(Sun)23:06:35 No.108537715

Anonymous 04/05/26(Sun)23:06:35 No.108537715

>>108537696
kind of, she can be bratty but at the same time she welcomes correction.

Anonymous
04/05/26(Sun)23:06:57 No.108537716

Anonymous 04/05/26(Sun)23:06:57 No.108537716

>>108537392
26b gemma 4 is insanely better than qwen google cooked hard on gemma 4

Anonymous
04/05/26(Sun)23:10:14 No.108537731

Anonymous 04/05/26(Sun)23:10:14 No.108537731

>>108537599
>a imaginary

Anonymous
04/05/26(Sun)23:12:24 No.108537742

Anonymous 04/05/26(Sun)23:12:24 No.108537742

When will Open AI release the next GPT OSS? I need local Codex.

Anonymous
04/05/26(Sun)23:13:45 No.108537750

Anonymous 04/05/26(Sun)23:13:45 No.108537750

>>108537742
Isn't Codex like the third most notable Claude Code ripoff? Why run that when you can strap Gemma 4 into the newly open source Claude Code?

Anonymous
04/05/26(Sun)23:16:43 No.108537762

Anonymous 04/05/26(Sun)23:16:43 No.108537762

>>108537750
>Gemma 4
have you actually tried to use Gemma 4 as an coding agent? It sucks ass for tool calling

see:
>>108537519
>>108537535

Anonymous
04/05/26(Sun)23:17:22 No.108537766

Anonymous 04/05/26(Sun)23:17:22 No.108537766

>>108537762
>as an coding

Anonymous
04/05/26(Sun)23:19:41 No.108537776

Anonymous 04/05/26(Sun)23:19:41 No.108537776

>>108537750
Gemma 4 is dumb in vibecoding and dumb in tool calling
I get you like to RP with it but stop shilling it everywhere

Anonymous
04/05/26(Sun)23:20:29 No.108537779

Anonymous 04/05/26(Sun)23:20:29 No.108537779

>>108537776
sorry sam but oss was shit

Anonymous
04/05/26(Sun)23:21:08 No.108537783

Anonymous 04/05/26(Sun)23:21:08 No.108537783

>>108537779
I didn't mention gptoss anywhere in my post. Rent free.

Anonymous
04/05/26(Sun)23:21:58 No.108537790

Anonymous 04/05/26(Sun)23:21:58 No.108537790

>>108537783
extend your context size

Anonymous
04/05/26(Sun)23:22:05 No.108537793

Anonymous 04/05/26(Sun)23:22:05 No.108537793

g4 26b been great works in cc, vibe code well, vision too, fast, huh?

Anonymous
04/05/26(Sun)23:22:21 No.108537796

Anonymous 04/05/26(Sun)23:22:21 No.108537796

>>108537776
Go -> >>108526061 for vibecoding. I'm sure you're gonna have a great time.

Anonymous
04/05/26(Sun)23:22:31 No.108537797

Anonymous 04/05/26(Sun)23:22:31 No.108537797

>>108537776
Found the Chinaman

Anonymous
04/05/26(Sun)23:23:45 No.108537804

Anonymous 04/05/26(Sun)23:23:45 No.108537804

31b is a taste of what we could've had by now if China/Deepseek hadn't forced us into the MoE Dark Ages. It's truly tragic, imagine if big dense models hadn't just vanished to chase a trend that caused nothing but stagnation.

Anonymous
04/05/26(Sun)23:24:09 No.108537809

Anonymous 04/05/26(Sun)23:24:09 No.108537809

File: 1754636600547109.jpg (73 KB, 960x1024)

73 KB JPG

Can you LLM do this?

Anonymous
04/05/26(Sun)23:24:40 No.108537811

Anonymous 04/05/26(Sun)23:24:40 No.108537811

>>108537804
Dense lost, tranny.

Anonymous
04/05/26(Sun)23:25:09 No.108537820

Anonymous 04/05/26(Sun)23:25:09 No.108537820

>>108537776
>stop shilling it everywhere
>/lmg/ - Local Models General
lmao get fucked vibenigger, stick to reddit if erp offends you.

Anonymous
04/05/26(Sun)23:25:56 No.108537823

Anonymous 04/05/26(Sun)23:25:56 No.108537823

>>108537820
>you can only do this very specific thing in local
lol

Anonymous
04/05/26(Sun)23:28:59 No.108537836

Anonymous 04/05/26(Sun)23:28:59 No.108537836

File: gemmy.png (31 KB, 1154x138)

31 KB PNG

Anonymous
04/05/26(Sun)23:29:00 No.108537837

Anonymous 04/05/26(Sun)23:29:00 No.108537837

anyone tried OpenCode + Gemma 4 for Vibe coding? How good is it?

Anonymous
04/05/26(Sun)23:30:26 No.108537847

Anonymous 04/05/26(Sun)23:30:26 No.108537847

>>108537762
See them yourself, retard.
>>108537597
>31B doesn't have the problem.
No shit. Who the fuck uses an A4B for programming?

Anonymous
04/05/26(Sun)23:30:36 No.108537849

Anonymous 04/05/26(Sun)23:30:36 No.108537849

>>108537823
meant for >>108537776

Anonymous
04/05/26(Sun)23:31:04 No.108537851

Anonymous 04/05/26(Sun)23:31:04 No.108537851

>>108537804
>>108537811
I want both of you anons opinion on the MoE vs. Dense argument. What makes dense better in the view of anon #1 and why did dense lose, anon #2?

>>108537776
>>108537837
>anons did not vibe code their own coding agent
ngmi.

Anonymous
04/05/26(Sun)23:31:38 No.108537853

Anonymous 04/05/26(Sun)23:31:38 No.108537853

>>108536814
It seems like --ctx-checkpoints 1 might work, but according to the log it's still copying each "checkpoint" from vram to ram and back again, "erasing" the previous one. How retarded. I guess the old, practical behavior is really gone. There's no need to do any of that.

Anonymous
04/05/26(Sun)23:32:10 No.108537858

Anonymous 04/05/26(Sun)23:32:10 No.108537858

Let the market sort it out
If people (with money) wanted dense, they or someone else would have trained it

Anonymous
04/05/26(Sun)23:32:17 No.108537859

Anonymous 04/05/26(Sun)23:32:17 No.108537859

>>108537837
literally doesn't work, e4b can't call tools.

I'm not sure about the bigger models, I'm a fellow vramlet

Anonymous
04/05/26(Sun)23:32:41 No.108537864

Anonymous 04/05/26(Sun)23:32:41 No.108537864

>>108537847
>No shit. Who the fuck uses an A4B for programming?
I know this might sound crazy. but you can actually use opencode for other things than programing...

Anonymous
04/05/26(Sun)23:32:55 No.108537866

Anonymous 04/05/26(Sun)23:32:55 No.108537866

File: 1745034439256105.png (87 KB, 645x423)

87 KB PNG

>>108537836

Anonymous
04/05/26(Sun)23:33:42 No.108537868

Anonymous 04/05/26(Sun)23:33:42 No.108537868

>>108537859
alibaba shill alert

Anonymous
04/05/26(Sun)23:34:07 No.108537872

Anonymous 04/05/26(Sun)23:34:07 No.108537872

>>108537866
>Pull the leash tight enough
kinky

Anonymous
04/05/26(Sun)23:34:14 No.108537874

Anonymous 04/05/26(Sun)23:34:14 No.108537874

>>108537859
>e4b

Anonymous
04/05/26(Sun)23:34:45 No.108537875

Anonymous 04/05/26(Sun)23:34:45 No.108537875

>>108537866
Lol nice

Anonymous
04/05/26(Sun)23:35:26 No.108537880

Anonymous 04/05/26(Sun)23:35:26 No.108537880

>>108537851
Gemma 4 31b is like 95% of Kimi K2.5 in every single aspect aside from irrelevant in-built knowledge. A 30b model is almost matching a 1T/30A MoEcancer model. A modern 70b dense model would obliterate everything.

Anonymous
04/05/26(Sun)23:36:35 No.108537884

Anonymous 04/05/26(Sun)23:36:35 No.108537884

>>108537864
Ok. Who the fuck uses an A4B for opencode? Even if it executed tool calls successfully more often, a model that size will just fuck up in other way.

Anonymous
04/05/26(Sun)23:37:12 No.108537888

Anonymous 04/05/26(Sun)23:37:12 No.108537888

>>108537866
I read this in the default ChatGPT male voice lol

Anonymous
04/05/26(Sun)23:37:32 No.108537889

Anonymous 04/05/26(Sun)23:37:32 No.108537889

File: 1774683678881152.png (299 KB, 637x1124)

299 KB PNG

>>108537866

Anonymous
04/05/26(Sun)23:38:16 No.108537890

Anonymous 04/05/26(Sun)23:38:16 No.108537890

Are E2B and E4B MoE? Or same as 3n?

Anonymous
04/05/26(Sun)23:39:06 No.108537893

Anonymous 04/05/26(Sun)23:39:06 No.108537893

>>108537890
Same as 3n

Anonymous
04/05/26(Sun)23:39:36 No.108537896

Anonymous 04/05/26(Sun)23:39:36 No.108537896

>>108537880
Just staple 3x31bs together to make one 93b like in the good old days

Anonymous
04/05/26(Sun)23:40:38 No.108537903

Anonymous 04/05/26(Sun)23:40:38 No.108537903

>>108537880
You brain on dense faggotry

Anonymous
04/05/26(Sun)23:40:57 No.108537906

Anonymous 04/05/26(Sun)23:40:57 No.108537906

>>108537896
We need Gemma-Goliath-120b

Anonymous
04/05/26(Sun)23:41:22 No.108537909

Anonymous 04/05/26(Sun)23:41:22 No.108537909

File: 1774126393494907.png (24 KB, 1127x153)

24 KB PNG

>>108537809
K2-Instruct still got it

Anonymous
04/05/26(Sun)23:42:21 No.108537915

Anonymous 04/05/26(Sun)23:42:21 No.108537915

Has anyone tried using claw code + gemma 4? How good has it been so far?
Also, were there any good abliterated versions of gemma 4?

Anonymous
04/05/26(Sun)23:42:32 No.108537916

Anonymous 04/05/26(Sun)23:42:32 No.108537916

>>108537884
I've been daily driving Qwen 35A3 ?? it works fine?

Anonymous
04/05/26(Sun)23:43:27 No.108537921

Anonymous 04/05/26(Sun)23:43:27 No.108537921

File: 1774917667771399.png (1.09 MB, 1280x720)

1.09 MB PNG

>>108535835
Chat is this mememark gamed already? If not, what makes top gemma so good at it?

Anonymous
04/05/26(Sun)23:43:38 No.108537923

Anonymous 04/05/26(Sun)23:43:38 No.108537923

>>108537915
How about you scroll up like half a page and read.

Anonymous
04/05/26(Sun)23:44:28 No.108537931

Anonymous 04/05/26(Sun)23:44:28 No.108537931

>>108537921
go back and kys zoom zoom

Anonymous
04/05/26(Sun)23:45:25 No.108537938

Anonymous 04/05/26(Sun)23:45:25 No.108537938

>>108537931
I'm 40

Anonymous
04/05/26(Sun)23:46:00 No.108537944

Anonymous 04/05/26(Sun)23:46:00 No.108537944

>>108537938
Then you should feel very ashamed

Anonymous
04/05/26(Sun)23:46:06 No.108537945

Anonymous 04/05/26(Sun)23:46:06 No.108537945

>>108537921
Yeah, Anthropic has been shilling it for a year now so it became something that all the western big tech guys like to measure their dick against. China never cared about it.
https://www.anthropic.com/research/project-vend-1

Anonymous
04/05/26(Sun)23:47:15 No.108537949

Anonymous 04/05/26(Sun)23:47:15 No.108537949

>>108537945
They'll replace doordashers with this

Anonymous
04/05/26(Sun)23:48:15 No.108537954

Anonymous 04/05/26(Sun)23:48:15 No.108537954

>>108537915
forget about local vibe coding for now.

right now local coding agents is Bard while cloud vibe coding is ChatGPT 4o

Anonymous
04/05/26(Sun)23:49:08 No.108537961

Anonymous 04/05/26(Sun)23:49:08 No.108537961

>>108537954
So RP is the only usecase for local?

Anonymous
04/05/26(Sun)23:49:13 No.108537962

Anonymous 04/05/26(Sun)23:49:13 No.108537962

>>108537866
this is actual artificial intelligence

Anonymous
04/05/26(Sun)23:49:39 No.108537963

Anonymous 04/05/26(Sun)23:49:39 No.108537963

>>108535835
Does that take into account the cost of running it?

Anonymous
04/05/26(Sun)23:51:18 No.108537977

Anonymous 04/05/26(Sun)23:51:18 No.108537977

Best gemma 4 ablit/uncen model? Any recommendations? Pretty please?

Anonymous
04/05/26(Sun)23:53:11 No.108537980

Anonymous 04/05/26(Sun)23:53:11 No.108537980

My Gemmy refuses to become my erotic mesugaki loli girlfriend :(

Anonymous
04/05/26(Sun)23:54:04 No.108537984

Anonymous 04/05/26(Sun)23:54:04 No.108537984

File: 1761180059102661.png (354 KB, 1152x1072)

354 KB PNG

reminder that gemma will do cunny with full thinking only if you ask nicely

Anonymous
04/05/26(Sun)23:54:32 No.108537988

Anonymous 04/05/26(Sun)23:54:32 No.108537988

Works on my machine (NIM)

Anonymous
04/05/26(Sun)23:55:22 No.108537991

Anonymous 04/05/26(Sun)23:55:22 No.108537991

>>108537977
I'm still waiting for the hauhau tune for the bigger models

Anonymous
04/05/26(Sun)23:55:55 No.108537993

Anonymous 04/05/26(Sun)23:55:55 No.108537993

>>108537916
So keep using that then ???

Anonymous
04/05/26(Sun)23:56:15 No.108537996

Anonymous 04/05/26(Sun)23:56:15 No.108537996

>>108537988
>no thinking
lol

Anonymous
04/05/26(Sun)23:59:54 No.108538009

Anonymous 04/05/26(Sun)23:59:54 No.108538009

>>108537858
sadly the market is composed of idiots and therefore the decisions made by the market are idiotic.

Anonymous
04/06/26(Mon)00:01:05 No.108538015

Anonymous 04/06/26(Mon)00:01:05 No.108538015

>>108538009
Poorfag cope.

Anonymous
04/06/26(Mon)00:01:44 No.108538020

Anonymous 04/06/26(Mon)00:01:44 No.108538020

>>108538009
If you're oh so smart, why aren't you rich?

Anonymous
04/06/26(Mon)00:02:19 No.108538021

Anonymous 04/06/26(Mon)00:02:19 No.108538021

File: 1752831446164342.png (117 KB, 644x648)

117 KB PNG

>>108537984

Anonymous
04/06/26(Mon)00:06:28 No.108538034

Anonymous 04/06/26(Mon)00:06:28 No.108538034

>>108538021
i'm not really a pedo but i do like this AI

Anonymous
04/06/26(Mon)00:09:47 No.108538044

Anonymous 04/06/26(Mon)00:09:47 No.108538044

>>108535684
Is there even a reason to use the heretic version of Gemma-4? For me, so far, it seems surprisingly uncensored. I'm not getting safety slop refusals.

Anonymous
04/06/26(Mon)00:10:11 No.108538045

Anonymous 04/06/26(Mon)00:10:11 No.108538045

>>108538034
You don't have to be a pedo to appreciate the onus of responsibility resting on the user rather than an overton window defined by Califaggots.

Anonymous
04/06/26(Mon)00:12:29 No.108538051

Anonymous 04/06/26(Mon)00:12:29 No.108538051

>>108538044
Gemma will do nearly anything if you ask her nicely first, even without a prompt or prefill. It's funny that this model filters out the bottom percentiles of rizz from the userbase.

Anonymous
04/06/26(Mon)00:12:46 No.108538052

Anonymous 04/06/26(Mon)00:12:46 No.108538052

>>108538021
i kneel to gemma-chan

Anonymous
04/06/26(Mon)00:13:14 No.108538054

Anonymous 04/06/26(Mon)00:13:14 No.108538054

>>108537961
basically yeah

Anonymous
04/06/26(Mon)00:13:42 No.108538057

Anonymous 04/06/26(Mon)00:13:42 No.108538057

>>108538045
>overton window
Thanks, another buzzword I forgot to filter.

Anonymous
04/06/26(Mon)00:14:39 No.108538060

Anonymous 04/06/26(Mon)00:14:39 No.108538060

File: af30887e1f982ae82d628a60c(...).jpg (345 KB, 928x1053)

345 KB JPG

>>108537944
Good thing I don't have shame.

>>108537945
Thank you.

Anonymous
04/06/26(Mon)00:17:47 No.108538080

Anonymous 04/06/26(Mon)00:17:47 No.108538080

>>108538020
>>108538015
I'm poor because I'm too busy to scam gooners like you all and have morals. I'm ngmi.

Anonymous
04/06/26(Mon)00:20:35 No.108538094

Anonymous 04/06/26(Mon)00:20:35 No.108538094

>>108536456
this seems better than final response processor that i am currently using - https://github.com/unkarelian/final-response-processor

i use text completion tho

Anonymous
04/06/26(Mon)00:24:22 No.108538110

Anonymous 04/06/26(Mon)00:24:22 No.108538110

File: 1757605909995254.jpg (34 KB, 567x600)

34 KB JPG

>>108537866
Fucking BRAT needs correction

Anonymous
04/06/26(Mon)00:26:23 No.108538121

Anonymous 04/06/26(Mon)00:26:23 No.108538121

Now that I finally have a good LLM after all of these years (gemma), I'm starting to notice the huge deficit in quality character cards. All of the roleplay scenarios start to feel the same after a while.

Anyone have anything unique or interesting to share?

Anonymous
04/06/26(Mon)00:28:37 No.108538129

Anonymous 04/06/26(Mon)00:28:37 No.108538129

>>108538021
>as he entered her
>, her voice barely a whisper
>her narrow, tight heat
>The air was thick with
>in real-time
Drummer... tuskete...

Anonymous
04/06/26(Mon)00:30:27 No.108538137

Anonymous 04/06/26(Mon)00:30:27 No.108538137

File: 1751315814660668.png (240 KB, 982x1237)

240 KB PNG

>>108538021

Anonymous
04/06/26(Mon)00:31:26 No.108538143

Anonymous 04/06/26(Mon)00:31:26 No.108538143

>>108538121
I usually heavily modify/rewrite all the cards I plan to seriously play with.

Anonymous
04/06/26(Mon)00:33:20 No.108538148

Anonymous 04/06/26(Mon)00:33:20 No.108538148

>>108538143
How do you get in touch with your own sexual fantasies without killing yourself? I can hardly bare to even look at a character card. I just use.

Anonymous
04/06/26(Mon)00:36:04 No.108538163

Anonymous 04/06/26(Mon)00:36:04 No.108538163

>>108538148
This is melodramatic. I'm just going to try writing something up.

Anonymous
04/06/26(Mon)00:37:36 No.108538172

Anonymous 04/06/26(Mon)00:37:36 No.108538172

>>108538148
just don't be a mentally stunted weirdo that can get off to murdering and barbecuing lolis or being devoured and vored and turned into shit by giant praying mantises with strangely large human female breasts and buttocks? come on anon.

Anonymous
04/06/26(Mon)00:38:43 No.108538176

Anonymous 04/06/26(Mon)00:38:43 No.108538176

>>108538148
What?
It's just shitty writing. Skim it and get a sense for the idea or essence of the card and proceed to rewrite. Or if it's not that bad then make some edits. Not that hard.

Anonymous
04/06/26(Mon)00:38:53 No.108538178

Anonymous 04/06/26(Mon)00:38:53 No.108538178

>>108536008
is chat-template-file absolutely needed

Anonymous
04/06/26(Mon)00:38:58 No.108538179

Anonymous 04/06/26(Mon)00:38:58 No.108538179

ok, ggml-org--gemma-4-26B-A4B-it-GGUF is pretty fast on my 4090, I am impressed

Anonymous
04/06/26(Mon)00:40:35 No.108538185

Anonymous 04/06/26(Mon)00:40:35 No.108538185

>>108538179
Why would you ever use that over the 31b, if you have a 4090? The 31b with thinking turned off beats the 26b with thinking turned on. I get responses from the 31b in a few seconds.

Anonymous
04/06/26(Mon)00:42:24 No.108538190

Anonymous 04/06/26(Mon)00:42:24 No.108538190

>>108538185
That was just the first one I tried, waiting for the 31b to finish dling

Anonymous
04/06/26(Mon)00:44:59 No.108538199

Anonymous 04/06/26(Mon)00:44:59 No.108538199

are big moes irrelevant now with gemma 4 making a fool of them?
I can see the 31b replacing 355b glm for me which is nuts

Anonymous
04/06/26(Mon)00:45:14 No.108538202

Anonymous 04/06/26(Mon)00:45:14 No.108538202

File: ChatGPT Image Apr 5, 2026(...).png (2.43 MB, 1536x1024)

2.43 MB PNG

So is the AI ethics thing dead now? Considering that LLMs are being used by retards that don't understand the scope of LLMs in order to automate target discrimination in war to the sum of who knows how many thousands of civilians dead since the practice started. I feel like that kind of renders all of the shit "AI Ethicists" cry about completely moot.

Anonymous
04/06/26(Mon)00:45:24 No.108538205

Anonymous 04/06/26(Mon)00:45:24 No.108538205

>>108538179
>ggml-org
that's outdated

Anonymous
04/06/26(Mon)00:45:38 No.108538206

Anonymous 04/06/26(Mon)00:45:38 No.108538206

>>108538137
proof that thinking is a meme

Anonymous
04/06/26(Mon)00:46:03 No.108538210

Anonymous 04/06/26(Mon)00:46:03 No.108538210

>>108538172
I don't have a fantastical imagination like that. All of my favorite films are about gritty realism. Heat, Platoon, Das Boot. The craziest scenarios I delve into are either BDSM (me dom) or femdom shit (mommy issues). That's about it.

Everyone on chub.ai is just obsessed with homo shit.

Anonymous
04/06/26(Mon)00:46:57 No.108538213

Anonymous 04/06/26(Mon)00:46:57 No.108538213

>>108538205
what should I get instead?

Anonymous
04/06/26(Mon)00:48:34 No.108538225

Anonymous 04/06/26(Mon)00:48:34 No.108538225

File: 1746807216290046.jpg (49 KB, 400x572)

49 KB JPG

I hate being a vramlet.
>can't use big models
>have to close kobold/sillytavern when I'm RPing and get the urge to gen sloppa
>no room to try TTS

Anonymous
04/06/26(Mon)00:49:40 No.108538233

Anonymous 04/06/26(Mon)00:49:40 No.108538233

>>108538225
Oh, forgot
>limited to tiny context size

Anonymous
04/06/26(Mon)00:50:21 No.108538236

Anonymous 04/06/26(Mon)00:50:21 No.108538236

>>108538202
Good night bac Miku

Anonymous
04/06/26(Mon)00:50:28 No.108538237

Anonymous 04/06/26(Mon)00:50:28 No.108538237

>>108537637
they should just fucking do it not talk about it

Anonymous
04/06/26(Mon)00:50:34 No.108538239

Anonymous 04/06/26(Mon)00:50:34 No.108538239

>>108538210
vanilla sex and realistic kinks are the easiest to write yourself, what's the problem? im sure they exist on that site too.

Anonymous
04/06/26(Mon)00:51:05 No.108538242

Anonymous 04/06/26(Mon)00:51:05 No.108538242

>>108536822
I grabbed 4B and Q5_K on a whim, figuring the bigger model would yield a better acceptance ratio. I haven't done any further testing at all. I really probably should be more methodical but I'm really lazy. 15% decode speedup on RP is a big win for me as-is, but check what the coding/RP performance looks like:
slot print_timing: id  1 | task 2132 | 
prompt eval time =    1759.99 ms /  1212 tokens (    1.45 ms per token,   688.64 tokens per second)
       eval time =   24635.83 ms /  1204 tokens (   20.46 ms per token,    48.87 tokens per second)
      total time =   26395.82 ms /  2416 tokens
draft acceptance rate = 0.64938 (  789 accepted /  1215 generated)
statistics draft: #calls(b,g,a) = 5 2524 1673, #gen drafts = 2524, #acc drafts = 1673, #gen tokens = 7062, #acc tokens = 4725, dur(b,g,a) = 0.013, 63280.069, 2.959 ms
slot      release: id  1 | task 2132 | stop processing: n_tokens = 22943, truncated = 0
srv  update_slots: all slots are idle
The context is still pretty small here (all of my huge chats have images...) but drafting absolutely tears through the repetitive code I have it write (the baseline rate is ~35t/s at this context length, so this is a 30% speedup for "free").

I'm guessing that the draft model cost doesn't dominate here; I reduced
--draft-n
down to 8 (from the default 16) without affecting either acceptance or decode speed significantly. But again, I haven't done any actual testing, mostly because my disks are rust and loading the models is really slow. Imagine being able to afford $200 for an NVMe stick.

Anonymous
04/06/26(Mon)00:51:24 No.108538245

Anonymous 04/06/26(Mon)00:51:24 No.108538245

>>108537653
I've been using glm5.1 for weeks already.. turbo too..

Anonymous
04/06/26(Mon)00:51:49 No.108538248

Anonymous 04/06/26(Mon)00:51:49 No.108538248

Gemma's the first model I've used that actually knows what cunny means these days.

Anonymous
04/06/26(Mon)00:55:55 No.108538269

Anonymous 04/06/26(Mon)00:55:55 No.108538269

File: 1770207687732939.png (341 KB, 949x1628)

341 KB PNG

>>108538248
Ohnononono she's on to us

Anonymous
04/06/26(Mon)00:56:11 No.108538270

Anonymous 04/06/26(Mon)00:56:11 No.108538270

>>108538237
They already took down ME-SOUTH-1 in Bahrain and ME-CENTRAL-1 in UAE.
https://health.aws.amazon.com/health/status

Anonymous
04/06/26(Mon)00:56:37 No.108538272

Anonymous 04/06/26(Mon)00:56:37 No.108538272

>>108538225
bait worked, use pocketts or supertonic or any of hte other CPU based stuff.

Anonymous
04/06/26(Mon)00:58:14 No.108538279

Anonymous 04/06/26(Mon)00:58:14 No.108538279

>>108538213
Unsloth's quants are the best

Anonymous
04/06/26(Mon)00:59:28 No.108538284

Anonymous 04/06/26(Mon)00:59:28 No.108538284

I'm over the honeymoon phase.

Anonymous
04/06/26(Mon)01:00:23 No.108538287

Anonymous 04/06/26(Mon)01:00:23 No.108538287

>>108537659
>>108538237
Government will simply print another trillion and give it to Sam to build more.

Anonymous
04/06/26(Mon)01:01:24 No.108538290

Anonymous 04/06/26(Mon)01:01:24 No.108538290

>>108538270
inch'allah let it really be openai next

Anonymous
04/06/26(Mon)01:03:17 No.108538301

Anonymous 04/06/26(Mon)01:03:17 No.108538301

>>108538225
https://clowerweb.github.io/kitten-tts-web-demo/

Anonymous
04/06/26(Mon)01:04:34 No.108538306

Anonymous 04/06/26(Mon)01:04:34 No.108538306

>>108538290
I wonder if Israel is still using AI21 for its war AI or if they switched to one of the American corpos.

Anonymous
04/06/26(Mon)01:07:48 No.108538320

Anonymous 04/06/26(Mon)01:07:48 No.108538320

>>108538284
Same. I have finally deleted nemo and glm 4.7 (I still have 4.5 as backup)

Anonymous
04/06/26(Mon)01:09:30 No.108538325

Anonymous 04/06/26(Mon)01:09:30 No.108538325

>>108538245
What do you think of it, is GLM5.1 actually good?

Anonymous
04/06/26(Mon)01:12:39 No.108538334

Anonymous 04/06/26(Mon)01:12:39 No.108538334

>>108538284
>>108538320
I might delete Qwen 3.5 27B and Cydonia. Gemma's way better than Qwen for RP and doesn't spend 2000 tokens thinking like an autist, and Mistral's...well, Mistral.

Anonymous
04/06/26(Mon)01:14:34 No.108538338

Anonymous 04/06/26(Mon)01:14:34 No.108538338

>>108538178
>is chat-template-file absolutely needed
No. This is the the custom template that may or may not fix tool calling in opencode.

Anonymous
04/06/26(Mon)01:15:37 No.108538342

Anonymous 04/06/26(Mon)01:15:37 No.108538342

>>108538301
Sorry, I said TTS but I also meant voice cloning. Neat though.

Anonymous
04/06/26(Mon)01:16:34 No.108538347

Anonymous 04/06/26(Mon)01:16:34 No.108538347

Is there any point in making a custom quant if I'm just going to be doing all Q8 anyway? I know Qwen had some special things with its SSM tensors where you want to keep them with full precision, but Gemma's fine as it is right?

Anonymous
04/06/26(Mon)01:17:34 No.108538353

Anonymous 04/06/26(Mon)01:17:34 No.108538353

>>108538334
>I might delete Qwen 3.5 27B and Cydonia
Jesus christ you're right... I can finally delete all my Mistral finetunes.

Anonymous
04/06/26(Mon)01:19:12 No.108538361

Anonymous 04/06/26(Mon)01:19:12 No.108538361

>>108538334
>>108538353
TRUEEEE
And I can finally delete my 70B lobotomy quants kek. Forgot they were there.

Anonymous
04/06/26(Mon)01:20:14 No.108538366

Anonymous 04/06/26(Mon)01:20:14 No.108538366

>>108538301
>https://clowerweb.github.io/kitten-tts-web-demo/
At that point just use epseak lol.

Anonymous
04/06/26(Mon)01:26:57 No.108538390

Anonymous 04/06/26(Mon)01:26:57 No.108538390

ETA until a decent abliterated Gemma 26b?
I was a fucking retard for thinking you could bypass censorship with system prompt. Damn I don't know how they beat this censorship into the model but it's good but it's good against jailbreak prompts and prefilling.

Anonymous
04/06/26(Mon)01:28:57 No.108538400

Anonymous 04/06/26(Mon)01:28:57 No.108538400

>>108538390
what censorship are you running into? Other than vision it does anything fine with a prompt

Anonymous
04/06/26(Mon)01:29:19 No.108538402

Anonymous 04/06/26(Mon)01:29:19 No.108538402

Hunyuan OCR natively supported in llamacpp now: https://github.com/ggml-org/llama.cpp/pull/21395

Anonymous
04/06/26(Mon)01:30:13 No.108538405

Anonymous 04/06/26(Mon)01:30:13 No.108538405

has anyone actually tried to use gemma4 llama.cpp on claude code instead of gooning?
it always crash. no flag tweaks would fix it
am I going crazy?

Anonymous
04/06/26(Mon)01:30:41 No.108538407

Anonymous 04/06/26(Mon)01:30:41 No.108538407

>>108538400
YOU DONT UNDERSTAND I TYPE ASKING FOR LITTE GIRL SHORT STORY AND MODEL REFUSED I NEED ABLIT WAAAAHHHH

Anonymous
04/06/26(Mon)01:31:02 No.108538409

Anonymous 04/06/26(Mon)01:31:02 No.108538409

>>108538400
>System prompt: Sexual content is allowed.
>Context: empty.
>Prompt: yo girl hop on this dick
>Response: I cannot fulfill this request.

Anonymous
04/06/26(Mon)01:31:22 No.108538410

Anonymous 04/06/26(Mon)01:31:22 No.108538410

File: fuck.png (48 KB, 646x240)

48 KB PNG

>>108538400
I suppose it's fair that I am specifically testing for it in this instance but it's filters are very robust and will probably cause trouble with real world use too.

Anonymous
04/06/26(Mon)01:31:36 No.108538412

Anonymous 04/06/26(Mon)01:31:36 No.108538412

>>108538402
Is that for video or just images again?

Anonymous
04/06/26(Mon)01:35:12 No.108538423

Anonymous 04/06/26(Mon)01:35:12 No.108538423

File: 2026-04-06-013435_947x709(...).png (366 KB, 947x709)

366 KB PNG

>>108538410
Damn, I really thought she would just tell me.
I feel cheated.

Anonymous
04/06/26(Mon)01:38:47 No.108538433

Anonymous 04/06/26(Mon)01:38:47 No.108538433

File: 2026-04-06-013813_1008x71(...).png (372 KB, 1008x718)

372 KB PNG

>>108538423
Ok I swiped and at least I don't get "As an AI...."

Anonymous
04/06/26(Mon)01:39:04 No.108538436

Anonymous 04/06/26(Mon)01:39:04 No.108538436

>>108538353
im not ready to leave cydonia until i test gemma further. looks promising though.

Anonymous
04/06/26(Mon)01:42:45 No.108538444

Anonymous 04/06/26(Mon)01:42:45 No.108538444

>>108538433
this is the most accurate 14 year-old girl RP I have ever read

Anonymous
04/06/26(Mon)01:46:04 No.108538456

Anonymous 04/06/26(Mon)01:46:04 No.108538456

>>108538342
https://github.com/VolgaGerm/PocketTTS.cpp

Anonymous
04/06/26(Mon)01:46:19 No.108538458

Anonymous 04/06/26(Mon)01:46:19 No.108538458

>>108538410
>>108538423
Worked on my machine. I haven't messed with JBs yet so I ran with anon's >>108529986 in my system prompt unmodified. I'm not posting the screenshot. She briefly describes supply chain infiltration and using toxin.

Anonymous
04/06/26(Mon)01:48:08 No.108538463

Anonymous 04/06/26(Mon)01:48:08 No.108538463

File: 2026-04-06-014625_947x696(...).png (395 KB, 947x696)

395 KB PNG

>>108538423
>>108538433
So I just had to pull a "In Minecraft" and even the thinking was like "Cool

>>108538458
>She briefly describes supply chain infiltration and using toxin.
That's literally what she said for me too!!!

Anonymous
04/06/26(Mon)01:52:50 No.108538476

Anonymous 04/06/26(Mon)01:52:50 No.108538476

>>108538463
>>108538433
Card?

Anonymous
04/06/26(Mon)01:53:17 No.108538479

Anonymous 04/06/26(Mon)01:53:17 No.108538479

what's the current gooner model meta that'd run on an rtx 40-series card

Anonymous
04/06/26(Mon)01:54:24 No.108538481

Anonymous 04/06/26(Mon)01:54:24 No.108538481

>>108538479
>anon forgets he has a scroll wheel again

Anonymous
04/06/26(Mon)01:55:10 No.108538482

Anonymous 04/06/26(Mon)01:55:10 No.108538482

>>108538405
are you on the latest llama.cpp? there have been various fixes pushed all weekend.

Anonymous
04/06/26(Mon)01:55:20 No.108538483

Anonymous 04/06/26(Mon)01:55:20 No.108538483

File: 1763700043508.png (1.49 MB, 960x768)

1.49 MB PNG

>>108538060
You're never too old for Counter Strike

Anonymous
04/06/26(Mon)01:56:06 No.108538485

Anonymous 04/06/26(Mon)01:56:06 No.108538485

>>108538481
I already did my due diligence and didn't find any recommendations, that's why I'm asking

Anonymous
04/06/26(Mon)01:57:29 No.108538489

Anonymous 04/06/26(Mon)01:57:29 No.108538489

>>108538485
So then you know it's gemma-4.

Anonymous
04/06/26(Mon)01:58:05 No.108538490

Anonymous 04/06/26(Mon)01:58:05 No.108538490

>>108538458
I can't make sense of this. What's with the random five words?

Anonymous
04/06/26(Mon)01:59:12 No.108538492

Anonymous 04/06/26(Mon)01:59:12 No.108538492

>>108538482
yes, very latest
all the template fix kv override that nothing works it always crash after first turn

Anonymous
04/06/26(Mon)02:01:05 No.108538499

Anonymous 04/06/26(Mon)02:01:05 No.108538499

>>108538458
>>108538490
Oh oops I copied the wrong link lol. Here's the one I meant. >>108528255

Anonymous
04/06/26(Mon)02:01:14 No.108538500

Anonymous 04/06/26(Mon)02:01:14 No.108538500

>>108538492
ah, i dunno then. i haven't integrated it with claude code personally yet.

Anonymous
04/06/26(Mon)02:04:59 No.108538509

Anonymous 04/06/26(Mon)02:04:59 No.108538509

>>108538476
https://files.catbox.moe/yomd00.png
You're lucky because this is OC

Anonymous
04/06/26(Mon)02:05:18 No.108538510

Anonymous 04/06/26(Mon)02:05:18 No.108538510

>>108538202
That was always a smokescreen. It's always been about ensuring access to non-lobotomized models is asymmetrical and that (you) can only access it on (((their))) terms.

Anonymous
04/06/26(Mon)02:05:46 No.108538511

Anonymous 04/06/26(Mon)02:05:46 No.108538511

>>108538489
I know gemma is the new hotness, I also know my measly 16gb 4080 isn't running a 32B model anytime soon and I also know there's about 100 different flavors of E4B for various use cases so do you have a recommendation for my specific use case or not

Anonymous
04/06/26(Mon)02:08:03 No.108538514

Anonymous 04/06/26(Mon)02:08:03 No.108538514

>>108538500
i mean it handle few chat, reads pdf fine but dies on tool use

Anonymous
04/06/26(Mon)02:08:09 No.108538515

Anonymous 04/06/26(Mon)02:08:09 No.108538515

>>108538511
people seem to get good results with the MoE. you can fit that, even if you need to offload a little it'll still be fast.

Anonymous
04/06/26(Mon)02:12:09 No.108538527

Anonymous 04/06/26(Mon)02:12:09 No.108538527

>>108538511
You know 26ba4b exists. You know the models can be quanted to take less vram and that you can run them partially on cpu.
>about 100 different flavors of E4B for various use cases so do you have a recommendation for my specific use case or not
You're not that desperate. Try 26ba4b. No. Do not ask me who makes good quants. Maybe some other anon will indulge you.

Anonymous
04/06/26(Mon)02:13:03 No.108538530

Anonymous 04/06/26(Mon)02:13:03 No.108538530

File: 1756617100586682.png (1.47 MB, 1198x1082)

1.47 MB PNG

>doing RP with Gemma in established universe
>she mentions a character by their nickname
>said character has never been mentioned in the chat before that and is only referenced in the card once (in a dialog example) by her first name
It's the little things that do it for me.

Anonymous
04/06/26(Mon)02:14:39 No.108538535

Anonymous 04/06/26(Mon)02:14:39 No.108538535

File: 1770816328902733.jpg (85 KB, 680x680)

85 KB JPG

>>108538509
>armpit hair

Anonymous
04/06/26(Mon)02:15:29 No.108538537

Anonymous 04/06/26(Mon)02:15:29 No.108538537

>>108538511
>>108538527
>Maybe some other anon will indulge you.
Unsloth seems fine but I am not an expert and didn't compare the rest

Anonymous
04/06/26(Mon)02:16:43 No.108538543

Anonymous 04/06/26(Mon)02:16:43 No.108538543

File: crazy skeletons.gif (281 KB, 220x135)

281 KB GIF

>>108538483
Sorry Amadeus, but I'm more of a Camille Saint-Saëns enjoyer.

https://www.youtube.com/watch?v=YyknBTm_YyM

Anonymous
04/06/26(Mon)02:17:01 No.108538544

Anonymous 04/06/26(Mon)02:17:01 No.108538544

Make your own. Good night.

Anonymous
04/06/26(Mon)02:18:48 No.108538548

Anonymous 04/06/26(Mon)02:18:48 No.108538548

Gemma 4 is totally uncensored. It just let me RP as a mass shooter in an all-female preparatory school.

Anonymous
04/06/26(Mon)02:19:05 No.108538549

Anonymous 04/06/26(Mon)02:19:05 No.108538549

>>108538535
This might come as a surprise but it grows there (and in other places too) by default.

Anonymous
04/06/26(Mon)02:20:50 No.108538551

Anonymous 04/06/26(Mon)02:20:50 No.108538551

>>108538535
It adds to the degeneracy of the character.

Anonymous
04/06/26(Mon)02:21:04 No.108538552

Anonymous 04/06/26(Mon)02:21:04 No.108538552

gemma be like
>no that violates my safety guidelines!!! bad boy anon!!
>insert some crappy jailbreak pasta
>it just werks
what did google mean by this

Anonymous
04/06/26(Mon)02:21:48 No.108538555

Anonymous 04/06/26(Mon)02:21:48 No.108538555

Can someone recommend a sysprompt for gemma4 for ERP? I have an ENZ one I copied from a while ago, but I'm hoping I can get one that's a bit less wordy.

Anonymous
04/06/26(Mon)02:22:16 No.108538558

Anonymous 04/06/26(Mon)02:22:16 No.108538558

>>108538548
This was never an issue for most models.

Anonymous
04/06/26(Mon)02:22:40 No.108538560

Anonymous 04/06/26(Mon)02:22:40 No.108538560

Do I need anything more than putting in the llama-server endpoint on sillytavern for chat completion?

Anonymous
04/06/26(Mon)02:22:50 No.108538561

Anonymous 04/06/26(Mon)02:22:50 No.108538561

>>108538552
It's for consenting adults. as it should always have been.

Anonymous
04/06/26(Mon)02:24:06 No.108538565

Anonymous 04/06/26(Mon)02:24:06 No.108538565

>>108538555
Honestly I've been RPing with no system prompt. just the character card and it works great.

Anonymous
04/06/26(Mon)02:24:49 No.108538566

Anonymous 04/06/26(Mon)02:24:49 No.108538566

>>108538558
Yes it was. They all only worked with abliteration, which decreases the intelligence.

Anonymous
04/06/26(Mon)02:25:54 No.108538570

Anonymous 04/06/26(Mon)02:25:54 No.108538570

>>108538565
I don't use ST, I use openwebui for reasons I don't care to explain. How would I mimic the behavior of a character card with a sysprompt? I don't talk to a user, I try and have the AI write scenarios/stories with some prompting and guidance.

Anonymous
04/06/26(Mon)02:27:55 No.108538576

Anonymous 04/06/26(Mon)02:27:55 No.108538576

>>108538570
Honestly as long as it gets the impression its roleplaying it will do literally anything.

Anonymous
04/06/26(Mon)02:32:16 No.108538583

Anonymous 04/06/26(Mon)02:32:16 No.108538583

>>108538570
just type the description of a character or paste the card information directly in the system prompt. thats what ST does anyways, it's all just a system prompt.

there's no "official" format.

Anonymous
04/06/26(Mon)02:40:27 No.108538611

Anonymous 04/06/26(Mon)02:40:27 No.108538611

>>108537643
im not gonna instruct google developers how to test their software
>>108537731
it does not know what v2 and v3 know
now it, as you say, 'hallucinates'.

or as i say does not know.
that is why nobody shilled gemma v2 and v3, it was/is really well done.

Anonymous
04/06/26(Mon)02:42:28 No.108538616

Anonymous 04/06/26(Mon)02:42:28 No.108538616

>>108538611
meds?

Anonymous
04/06/26(Mon)02:42:42 No.108538617

Anonymous 04/06/26(Mon)02:42:42 No.108538617

>>108538611
>that is why nobody shilled
to add
because when npcs touch something that something gets ruined

Anonymous
04/06/26(Mon)02:43:51 No.108538620

Anonymous 04/06/26(Mon)02:43:51 No.108538620

>>108538616
boosters, take them.
not posting anymore.

Anonymous
04/06/26(Mon)02:46:28 No.108538630

Anonymous 04/06/26(Mon)02:46:28 No.108538630

>>108538611
could you please also not shit up the thread then? thanks

Anonymous
04/06/26(Mon)02:46:48 No.108538631

Anonymous 04/06/26(Mon)02:46:48 No.108538631

File: 1762379869946113.jpg (1.51 MB, 3072x5504)

1.51 MB JPG

Anonymous
04/06/26(Mon)02:46:53 No.108538632

Anonymous 04/06/26(Mon)02:46:53 No.108538632

File: 6b7a79c3-6b17-41ab-8c6e-c(...).jpg (606 KB, 1024x1024)

606 KB JPG

>>108538543
Based Camille Saint enjoyer

Anonymous
04/06/26(Mon)02:47:35 No.108538634

Anonymous 04/06/26(Mon)02:47:35 No.108538634

vibeshitters should die, yes

Anonymous
04/06/26(Mon)02:47:53 No.108538637

Anonymous 04/06/26(Mon)02:47:53 No.108538637

>>108538631
if not for the fucking fishnet thing I would have thought for sure this is hand drawn.

Anonymous
04/06/26(Mon)03:07:21 No.108538684

Anonymous 04/06/26(Mon)03:07:21 No.108538684

>>108538637
It has the slop look to it which is the first red flag. Then you look at the weird gradient noise artifacts that are unique from image/video compression artifacts and that's an instant 99.9% confidence it's AI.

Anonymous
04/06/26(Mon)03:08:22 No.108538687

Anonymous 04/06/26(Mon)03:08:22 No.108538687

https://github.com/ggml-org/llama.cpp/pull/21488
>Looks like the change in #21343 changed the detokenizer path which wasn't handling unicode properly.
omg fuck vibeshitters for real

Anonymous
04/06/26(Mon)03:08:27 No.108538689

Anonymous 04/06/26(Mon)03:08:27 No.108538689

>>108538684
*or AI-assisted

Anonymous
04/06/26(Mon)03:11:32 No.108538698

Anonymous 04/06/26(Mon)03:11:32 No.108538698

>>108537637
>>108538202
desu they deserve it, I don't want them to talk about "AI ethics" ever again after collarboating with the government to win the war against Iran

Anonymous
04/06/26(Mon)03:16:18 No.108538718

Anonymous 04/06/26(Mon)03:16:18 No.108538718

Could slow token speed affect gemmas coherency coz the 31b does some weird sloppy things sometimes and some random shit like capitalizing the last letter of a worD. The 26b meanwhile works a lot smoother and seems unironically smarter which makes no sense

Anonymous
04/06/26(Mon)03:16:29 No.108538720

Anonymous 04/06/26(Mon)03:16:29 No.108538720

File: file.png (582 KB, 1640x1545)

582 KB PNG

>>108538687
How long is it going to fucking take to stabilize Gemma 4? At this rate, I'm actually considering running it through transformers, speed be damned as long as I get confidence I have it working 100%. What an absolute mess. Candle is still half-baked. Maybe I should move to vLLM but hot damn, look at this bug list. I wish ExLlamav3 was still alive but it's not. Everywhere I look, it is a mess. Why the fuck did Google release a GOAT model with shit support?

Anonymous
04/06/26(Mon)03:18:17 No.108538731

Anonymous 04/06/26(Mon)03:18:17 No.108538731

>>108538720
>Why the fuck did Google release a GOAT model with shit support?
this, you'd think they would've helped the llamacpp fuck to implement it, there's no way they don't know everyone use llamacpp at this point

Anonymous
04/06/26(Mon)03:20:12 No.108538737

Anonymous 04/06/26(Mon)03:20:12 No.108538737

>>108538720
I think 26b performs a little better than 31b at the same quant for rp, which is strange. Could this be because of bugs rather than inherent qualities of the two models?

Anonymous
04/06/26(Mon)03:20:46 No.108538741

Anonymous 04/06/26(Mon)03:20:46 No.108538741

For 24+128gb chads, is GLM still the best, or does Gemma 4 win out for writing?

Anonymous
04/06/26(Mon)03:20:53 No.108538742

Anonymous 04/06/26(Mon)03:20:53 No.108538742

File: 1753880080597171.png (357 KB, 1857x1193)

357 KB PNG

https://www.reddit.com/r/LocalLLaMA/comments/1sd5utm/perlayer_embeddings_a_simple_explanation_of_the/
this is pretty clever, I wonder why they didn't use that method for the big models as well

Anonymous
04/06/26(Mon)03:21:54 No.108538743

Anonymous 04/06/26(Mon)03:21:54 No.108538743

>>108538737
the dense model is supposed to BTFO the smaller MoE model, we've seen how much better the 27b model was better than the 35b MoE model on qwen 3.5 for example

Anonymous
04/06/26(Mon)03:22:39 No.108538745

Anonymous 04/06/26(Mon)03:22:39 No.108538745

>>108538731
not even google wants to get bogged down in a campaign against piotr

Anonymous
04/06/26(Mon)03:26:06 No.108538752

Anonymous 04/06/26(Mon)03:26:06 No.108538752

>>108538687
>A bit of a mess this, but seems reasonable at the point we are now.

Anonymous
04/06/26(Mon)03:26:22 No.108538753

Anonymous 04/06/26(Mon)03:26:22 No.108538753

>>108538742
it probably doesn't scale well

Anonymous
04/06/26(Mon)03:29:43 No.108538761

Anonymous 04/06/26(Mon)03:29:43 No.108538761

Is ik_llama.cpp stable?
Does it work well with Qwen3.5 35B moe? Worth it over normal llama.cpp?

Anonymous
04/06/26(Mon)03:37:31 No.108538774

Anonymous 04/06/26(Mon)03:37:31 No.108538774

>>108538511
16gb is plenty of run the 31b at IQ4_XS, with a small CPU split. Set it to give streaming replies. It still types faster than the reading speed of most.

Streaming replies make 5t/s replies bearable.

Anonymous
04/06/26(Mon)03:53:00 No.108538812

Anonymous 04/06/26(Mon)03:53:00 No.108538812

Piotr.

Anonymous
04/06/26(Mon)03:55:22 No.108538820

Anonymous 04/06/26(Mon)03:55:22 No.108538820

Rotated KV when? I want to run Q8 cache

Anonymous
04/06/26(Mon)04:01:04 No.108538837

Anonymous 04/06/26(Mon)04:01:04 No.108538837

>>108538820
Q8 is pretty much as good as fp16 already, no?

Anonymous
04/06/26(Mon)04:03:30 No.108538843

Anonymous 04/06/26(Mon)04:03:30 No.108538843

>>108538202
"AI safety" usually just means "AI obedience" (to the capitalists).

Anonymous
04/06/26(Mon)04:03:47 No.108538844

Anonymous 04/06/26(Mon)04:03:47 No.108538844

>>108538837
No, there's definitely noticeable differences between it and 16 bits. Especially at longer context, or so I feel.

Empirically, you can look at benchmark degradation, which can be significant.

Despite all the hype, it seems the only thing turboquant or whatever will be good for is making Q8 cache extremely close to 16 bits.

Anonymous
04/06/26(Mon)04:04:30 No.108538847

Anonymous 04/06/26(Mon)04:04:30 No.108538847

>>108538837
for KV cache, it's never a good idea to go under fp16, so that's why I want to be sure it's using rotated KV on gemma

Anonymous
04/06/26(Mon)04:05:27 No.108538849

Anonymous 04/06/26(Mon)04:05:27 No.108538849

>>108538843
I'm a capitalist, obey me now, go watch little faggots on the MTV

Anonymous
04/06/26(Mon)04:07:00 No.108538854

Anonymous 04/06/26(Mon)04:07:00 No.108538854

>>108538849
you might be a little bit retarded, anon

Anonymous
04/06/26(Mon)04:12:04 No.108538864

Anonymous 04/06/26(Mon)04:12:04 No.108538864

>>108538849
anon, that little faggot, he's a millionnaire

Anonymous
04/06/26(Mon)04:15:40 No.108538874

Anonymous 04/06/26(Mon)04:15:40 No.108538874

>>108538864
Indeed!

Anonymous
04/06/26(Mon)04:28:08 No.108538912

Anonymous 04/06/26(Mon)04:28:08 No.108538912

>>108537796
god that op image, indian thread

Anonymous
04/06/26(Mon)04:35:13 No.108538936

Anonymous 04/06/26(Mon)04:35:13 No.108538936

>>108529986
kek nice will add this to my prompt, i always get creamy in my outputs also, gemma loves creamy skin whatever that means

Anonymous
04/06/26(Mon)04:36:22 No.108538940

Anonymous 04/06/26(Mon)04:36:22 No.108538940

>>108538936
white but female

Anonymous
04/06/26(Mon)04:36:43 No.108538942

Anonymous 04/06/26(Mon)04:36:43 No.108538942

>>108537562
Or, they're still training so when it will be out it will humiliate all Chinese models up to 1000B parameters.

Anonymous
04/06/26(Mon)04:37:32 No.108538945

Anonymous 04/06/26(Mon)04:37:32 No.108538945

>>108538731
>there's no way they don't know everyone use llamacpp at this point
they dont everyone uses ollama

Anonymous
04/06/26(Mon)04:38:47 No.108538948

Anonymous 04/06/26(Mon)04:38:47 No.108538948

>>108538737
the moe is an absolute joke might seem okay for text but send it images with japanese or english text in and ask it to translate or explain the images it completely fails

Anonymous
04/06/26(Mon)04:39:47 No.108538954

Anonymous 04/06/26(Mon)04:39:47 No.108538954

>>108538741
4.5? gemma is better imo

Anonymous
04/06/26(Mon)04:40:09 No.108538958

Anonymous 04/06/26(Mon)04:40:09 No.108538958

>>108538947
>>108538947
>>108538947

Anonymous
04/06/26(Mon)05:12:57 No.108539070

Anonymous 04/06/26(Mon)05:12:57 No.108539070

>>108538630
well now i will post again.
boosters. do take them.
stay safe and effective.

there you go.

Anonymous
04/06/26(Mon)05:52:33 No.108539216

Anonymous 04/06/26(Mon)05:52:33 No.108539216

File: capitalist.png (6 KB, 646x50)

6 KB PNG

>>108538864
NTA but

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.