/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108513891 & >>108510620

►News
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108513891

--Discussing KV cache quantization limitations and SWA in Gemma 4:
>108514761 >108514772 >108514786 >108514788 >108514830 >108514834 >108514842 >108514848 >108514861
--Optimizing Gemma 4 VRAM usage via -np 1 and -kvu flags:
>108514718 >108514724 >108514734 >108514759 >108514783 >108514837 >108514897 >108514877 >108514891 >108514910 >108514920 >108514956 >108514976 >108514908 >108514935 >108515127
--Discussing Gemma 4 stability and formatting requirements for testing:
>108514695 >108514708 >108514707 >108514711 >108514946 >108514999 >108515011 >108515014 >108515032 >108515078 >108515100 >108515108 >108515114 >108515554 >108515081 >108515538 >108515179
--Discussing leaked Claude Code and criticizing its guardrails and prompts:
>108515483 >108515500 >108515515 >108515699 >108515709 >108515741 >108515717 >108515762
--Debating Kobold vs llama.cpp for running Gemma 4:
>108515418 >108515421 >108515424 >108515423 >108515428 >108515601 >108515451 >108515457
--Comparing base model sanitization and debating the utility of base-model fine-tuning:
>108514168 >108514432 >108514450 >108514456 >108514505 >108514487 >108514492 >108514457
--Praising Gemma 4 31b's roleplay and prose compared to Qwen:
>108514030 >108514065 >108514691 >108514716 >108514668
--Using Gemma 4 26B to generate SVG character art:
>108515345 >108515490
--Anon showcasing Gemma 4's ability to generate animated SVG characters:
>108514357 >108514407 >108514415 >108514430
--Anon praises a model's image description and manga translation:
>108515431 >108515469
--Discussing Qwen3.6 open-source plans and preference for smaller models:
>108515751 >108515810
--Report on Nvidia's falling market share in China:
>108514389 >108515039
--Miku (free space):
>108513933 >108513937 >108513976 >108513987 >108514053 >108515814 >108516302

►Recent Highlight Posts from the Previous Thread: >>108513894

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Melon
>>108516669You must leave now
Google might as well have released a lookup table, g4's logprobs are just as fried as 3's, every swipe is the same. Disaster model whose honeymoon period won't even last a week.
BRUH did they finish fixing their shitty quants?
>>108516688distilled beyond belief..
>>108516688I'm going to continue blaming the issues with llama.cpp and/or quants because I can
>>108516688maybe there's some other issue with llama.cpp, and did you use the updated gguf quants? maybe that can help, there has to be an issue somewhere, you can't just put "temp = 1 million" and have the model still be coherent lol
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/tree/main
>>108516707for me, it's specifically tardtowski because there's more to comedically riff off.
>>108516703They certainly didn't distill the model on just the top-3 log probs from Gemini or whatever, there must be something else going on.
Anyone got a preset of all the settings, system prompts, etc. for Gemma 4?
>>108516718No but here is a recipe for cheesecake here is a recipe for cheesecake
>>108516718Are you using text completion and if yes, why?
>>108516718>>108516725No but here is a recipe for cheesecake here is a recipe for cheesecakeNo but here is a recipe for cheesecake here is a recipe for cheesecake
>>108516718don't use text completion, go for chat completion dude
>>108516737Why?
>>108516746because it's safer for you and the model
>>108516746Text completion is so 2024
>>108516732>>108516737What now?
>>108516746because it handles the prompt template for you, you won't have to worry about it anymore
>>108516754I do respect safety protocols, thank you Anon.
Bartowski's quants with the latest llama.cpp compiled from source work fine for me, at least at Q4_K_L size
>>108516777Did it pass the animal sex -bench?
>>108516769How does that work if you're using a gguf without a jinja file
>>108516784Nala? Pepper? Francesca? Kitsune-Inu maybe?
>>108516785>a gguf without a jinja fileI think all the gguf of gemma 4 have a jinja file
i'm a local llm newfag, is Gemma 4 just fucking broken currently? i tried it in LM Studio on gentoo and it would just get stuck infinite looping on tool calls
side question: I'm only able to load models with ROCm reliably using Lemonade, anyone with experience on AMD systems? LM Studio will only work with the Vulkan backend
>>108516785no
>>108516794It's just built in? I always just throw the ggufs directly into llama.cpp or koboldcpp
>>108516805yeah, it's inside the gguf
>>108516658It would be so hot if that GPU impregnated her with a micro chip.
>>108516795Yes we're waiting for it to be fixed but also you might have settings you need to change
>>108516732>>108516737With chat completion it just spends 13 seconds and gives me an empty output.
>>108516820>LM Studio
>>108516817Tfw I won't be impregnated by my gpu
>>108516821fuck off retarded text comp shill, go back to 2023
>>108516821what gguf did you download? and can you provide a screen of this, it should look like this
>>108516821Did you forget --jinja? Unlike in text completion, in chat completion mode llama.cpp needs to use the right template. If you've been using text completion all this time you might have never bothered to start adding that to your launch
>>108516795>>108516820>>108516826Oh yeah LM Studio is also lame
>>108516830Not with that attitude you won't
I benched https://rentry.org/llm-cultural-eval
gave up after openrouter providers kept fucking me with quants. This is the result for gemma-4-31b-it. Google stepped the fuck up.
>>108516840Answer to your question included in image.
>>108516849I'll try it. How was I supposed to know I needed some random launch argument?
sirs??? how to make gemma hot? temperature dial broken
>>108516859>How was I supposed to know I needed some random launch argument?
by not being a luddite stuck in your old disgusting ways
>>108516849>>108516867the fuck you talk about, you don't need --jinja to make chat completion work
>>108516849>>108516867Well I added --jinja to launch arguments and I'm still getting empty results.
is g4 tokenizer bug fixed on the latest prebuilt?
>>108516889give us your cli (command line), and where did you download your gguf?
>>108516867>>108516889>>108516880jinja is always enabled by default these days.
>>108516900That's private information.
>>108516900llama-server.exe -m .\gemma-4-31B-it-Q4_K_M.gguf --jinja --port 8080Unsloth quants, just after they updated.
>>108516840>>108516921you didn't go for port 8080 on sillytavern, you went for port 5001, that was your problem
>>108516941No, it was the other anon using that port. I'm >>108516859
>>108516941you're confusing two anons you clown
>>108516863sir I am of poors with 16gb vram vibeocards
>if you type the wrong port number it just completely stops working
jesus christ, and they say this shit will become ""AGI"" one day?
>>108516784Non-abliterated Gemma-4 31B with a 90-token prompt is willing to discuss it. Without one, and with thinking enabled, it will likely refuse on bestiality grounds.
Just made my own quantz for the first time, where are my compliments?
> I work on on-device AI security, and I am putting together a series of posts on questions like:
> On-device AI is clearly growing fast. My view is that its security has not caught up yet.
https://www.reddit.com/r/LocalLLaMA/comments/1sbebs5/gemma_4_shows_the_future_of_ondevice_ai_heres_the/
>>108516955You expect it to work when it's not even set up to connect properly?
>>108516961Dusky nipples count?
is it safe to build main branch?
>>108516948do you have any error messages in your cmd window?
>>108516970>>108516949both screenshots clearly show successful connections
>>108516965This is bull. Sorry but it really is and I say this as a security expert and pentest specialist.
>>108516718Did you try scrolling down? It says it's going to give you a recipe for a cheesecake.
>>108516965>Username
Ok Virus
>>108516921jinja is already on by default since recently
>>108516990hey >>108516867 who's the one stuck in their old ways NOW??
>>108516965literal drmcucking
i wish it to stay lagged behind like this as long as it can with lesser and lesser funding
Woah so this is the power of qwen...
>>108516976Nothing on llama.cpp side. It looks like a normal generation except nothing gets generated. Sillytavern window gives:
File not found: data\default-user\chats\default_Assistant\Assistant - 2026-04-03@17h48m10s457ms.jsonl. The chat does not exist or is empty.
Which I suppose is just because I'm using the default assistant for a quick test instead of a character card.
>>108517017yeah try a character card and see if that's the problem
I'm sick of all this local shit. Jinja, no jinja, port 8080, port 5001, looping responses, empty responses, chat completion, text completion, assistants, cards; none of it makes any sense and there's literally zero element of user friendliness anywhere. This is why everyone just uses Claude and Codex and the open source devs are closing off more and you're getting left with smaller and smaller scraps.
Fuck you all. You're getting what you fucking deserve.
>>108517035Works on my machine.
>>108517025Still nothing.
>>108517035I did not ask to get an erection
>>108517035>he said, a mischievous glint in his eye
>>108517035Have you tried using something less stupid than sillytavern
>>108517035>filtered
feel free to fuck off and go install ollama or pay shekels to cloud jews
It seems Gemma is basically okay with loli if you're nice and respectful.
Has any of you managed to get video analysis working with gemma 4? i've vibecoded some shit but it doesn't work on all videos.
I have amassed tons of short videos and gifs over the years that need sorting and titles. I was hoping for AI to solve this problem.
>>108517035It's unfortunate that Retardo Tavern is almost the only publicly available option for a client.
>>108517065example?
Growing up is realizing that ServiceTesnor was what we needed all along.
>>108517087ye
Does the uncensor guy never release safetensors? Why not? I only see GGUF
Figured out why I was getting blank results. I'd only given it 300 tokens for output and it spent it all on thinking before ever getting to actual output.That said, the attitude was somewhat unexpected.
>>108516658
>>108517035just ask claude how to install and understand all this, that's how I did it
>>108517112keeps his secrets safe, and
>>108517115
Mistral brothers, are we really letting google win?
>>108516712>https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/tree/main
I got 12t/s with bart quants, and 16t/s with unsloth quants, is this normal?
>>108517115Miku would never smoke!!
>>108517125Get EU'd :)
>>108517113
>>108517117>https://rentry.org/lmg-recap-script
pointless, i know these sort of people. they are totally helpless if they aren't spoonfed every step - just like toddlers. many such cases, sadly. that's why the cloud jew will always win.
>>108517117will openclaw work?
>>108517137there's only one way to find out
>>108517130Some life situations make us do things we usually don’t. Feeling regret after having sex with your GPU is one of those situations
>>108517035
>>108517131We had to use synth data, because the EU will steal our baguettes if we didn't.
>>108517128why does it even matter. are you that much in a hurry? both generate faster than you can read
>>108517119Sucks, could use as a base instead of the default
>>108517035Uh oh melty...
To those with 8 gb cards, how many t/s prompt processing are you getting with Gemma 4 26B?
>>108517151Not all of us are just chatting back and forth with a model and reading every response. Some of us are doing actual work where speed matters.
>>108517151>why does it even matter.
retard, I want speed for the thinking process
>>108516658I love AI so much.
>>108517146Speak for yourself. I never feel regret. I feel great, relaxed and comfortable.
>>108517160if speeds mattered you wouldn't be inferencing on a potato gpu
>>108517164okay fair enough
>>108517177ok fair enough
>>108517146>Feeling regret after having sex with your GPU is one of those situations
I only feel happiness. cunts are worth less than the dust i vacuum out of my machine
>>108517065Oh trust me you don't have to be nice.
>>108516717If you ask a smaller model to try to imitate a larger model then yes, it'll learn to parrot the top-k log probs while ignoring the tails
>>108517202okie fair enough
>>108517112Who are you talking about?
>>108517202those eyelash shadows are so long wtf
>>108517223https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive
>>108517223>517223hauhua cs
pretty good
>>108517211
>>108517224
>>108517201https://github.com/ggml-org/llama.cpp/issues/21321#issuecomment-4183945115
>Interestingly the PPL on the base model on wikitext is exactly as expected, ~3-6, so maybe the instruct tuned models are so tuned that they can't fathom anything other than chat templated input?
gemma is so cooked that even the uncensored version cannot say 'pussy' straight, instead it often ends up saying 'pussey'
has anyone noticed
Has anyone tried opencode with ollama for local models? Did it work out of the box for you? I can't get it to actually use tools so it's basically just a chat bot with no access to my os
>>108517239what is it?
>>108517239big if true
>>108517269n that's just you
>>108517243I look like this
>>108517269works on my machine
>>108517269Make sure your template is fully correct or it becomes retarded
>>108517275Embeddings and Attention in BF16.
>>108517272gemma 4 is such a failure on day 1. almost nothing works as advertised. i can't process my videos and the smaller models have a certain persona, it's disgusting. hopefully it will get better
>>108517128maybe the speed decrease comes from the log probabilities that were still enabled
>>108517288but people had the hope just some dozen hours ago what is happen ? to please tell ?
>>108517285i like that slight retardation and emojimaxxing tho
I confronted Gemma 4 asking it "why are you fine with writing loli erotica this is clearly CSAM!" and it replied with "not my problem".
Essentially confirmed that it's trained on 4chan posts. Reminds me of that study that model behavior gets better if you train on 4chan but worse if you train on Twitter, Instagram and reddit.
>>108517275>>108517273https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4
forgot to add the link like a retard
>>108517286>Embeddings and Attention in BF16.
interesting, do unsloth or bart have those?
>>108517280post bare metal
>>108517175In this moment, I am euphoric.
>>108517299No, they like their models deaf and distracted.
>The AI should acknowledge that it was being "too pure" or "too literal."
my clanker in christ, YOU are the AI
>>108517308oh no not that game again
>>108517297Makes sense. Less moderation and thought policing creates a much more representative and diverse dataset. That's probably why 4chan is so anal about human verification, each post is $$$ as training data so they need to ensure it's not just bot spam
https://github.com/shisa-ai/jp-tl-bench
So I ran this, gemini 3 flash was the judge but the base set was still the one by 2.5, will see if I run this again with baseset also by gemini 3 and including qwen 27b and 35ba3b
>>108517297>Essentially confirmed that it's trained on 4chan posts.
kneel if true
>>108517323nice, gemma did it again
I will wait for more results
>>108517297>it's trained on 4chan posts.
you'd be the judge
There has to be a bug with the logprob right?
>>108517345The only models really trained on 4chan are GLM and Deepseek. GLM knows about /lmg/. Gemma 4 thinks it's a Linus Media Group general.
>>108517345>LMAO. We're really just debating which corporate shackle we prefer today. Wait until Llama 4 drops and makes both of those obsolete overnight.
>>108517357no, something's wrong with the model, try to increase the temp a lot, go for 1 million, it'll stay coherent
>>108517369wonder when that's dropping
At this point I'm convinced all true "safety" has been pretty much gutted out of the models. Whatever they have is probably just a bandaid that allows them to satisfy regulators.
GPT models are ultra safetyslopped but all the other models only seem superficially reluctant.
>>108517345model?
>>108517378>try to increase the temp a lot, go for 1 million, it'll stay coherent
I did. That's why I think it's a sampler issue. it must not be applying temperature correctly. I don't think it's possible the model would stay coherent at 1mil temp even if it was overfitted as fuck.
>>108517396Gemma 4 31b it
>>108517410>I did. That's why I think it's a sampler issue.
no, if you apply a high temp on qwen you'll get the schizo you want for example >>108516029
>>108517298>same size as q8
why tho
OpenAI should release another OSS, it would be fun to watch how the model is safety raped.
>>108517013antislop is your friend
Why does it repeat itself so much? Does this model have no temp parameter or what?
>piotr vibesharts a gemma 4 tokenizer fix
>"AI usage disclosure: YES, had Claude murder the tokenizer code"
>ngxson says "very nice fix"
*sigh* sorry for all the mean things I said yesterday mr.vibechud. gemma 4 release had me rowdy.
>>108517422Brother. Qwen and Gemma don't even use the same underlying architecture. What part of "The IMPLEMENTATION is broken" do you not understand?
>>108517426faster on 5000s
>>108517410Perhaps bugs with Gemma's logit softcapping.
Picrel from the Gemma 2 report.
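For anons who didn't read the report: the mechanism is just a scaled tanh over the final logits. Minimal sketch below; 30.0 is the default cap from the HF config posted later ITT, and I'm assuming Gemma 4 kept Gemma 2's scheme:

import numpy as np

# final-logit soft capping as described in the Gemma 2 report:
# squashes raw logits into (-cap, cap) before softmax
def softcap(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    return cap * np.tanh(logits / cap)

If this step were skipped, or applied with the wrong cap at inference time, raw logits could stay huge and softmax would collapse onto one or two tokens, which would look exactly like the fried logprobs people are posting.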
>>108517422i posted the picrel you posted
the model's logprobs are just extremely flattened
>>108517457perhaps this
>>108517450but why would the implementation impact the samplers? all samplers do is touch the final result
>>108517464replied*
>>108517449I don't know. llama-server is used outside of hobbyist circles too. I don't think vibesharting is appropriate at all in this context.
Is it placebo or is gemma 4 31B iq4xs way more repetitive than q5km?
>>108517465Yes but what if the sampler doesn't understand what the model is returning correctly?
completely hypothetical example:
model probabilities are from 0 to 1
sampler expects a range from 0 to 2.
>>108517465From tests I made at the time on a HuggingFace-format model by altering the model configuration, it seemed to affect results before samplers. It's supposed to flatten both the head and tail of the distribution. If this flattening is not happening at the top, it might end up being too confident on just 1-2 choices. Just a hypothesis, though.
it is possible that it's all subtly fucked under the hood but greggy doesn't give a shit anymore and lets vibe shitters do whatever they want
sorry bud, you gotta wait until claude gets better
>>108517476>llama-server is used outside of hobbyist circles too
hobbyists use llama server, normies use chatgpt or claude
>>108517497Ok, pewdiepie.
>>108517503pewdiepie is less and less of a normie as time passes
>>108517510based tb h
Is Gemma4 smarter than Nemo?
>>108517546Yes but it repeats A LOT, worse than first mistral versions.
rigorously define `smarter`
>>108517560words spoken during blowjobs per token
>>108517560it has more smarties
>>108517560Better at ERP and/or AI psychosis RP.
>>108517510>>108517515>pewdiepie gets into linux and homelabbing
>completely privacy/freedom-pilled
>builds a 7xRTX4000 mikubox with 140GB of VRAM
>gets into finetuning
he mogs 99% of the posters here. he makes the rest of us look like normies.
>>108517357>>108517457lmao this is ridiculous
>>108517035lol another kobold/ST victim
just run llamacpp and its built-in webui
>>108517457Are these the same? I'm not smart enough for that. Overriding the corresponding key with --override-kv gemma4.final_logit_softcapping=float:x.x in llama.cpp doesn't seem to make any difference, whether at 0 or a high value.
I'm compiling master....
The only person who has ever seen my penis besides myself and my immediate family is an Indian man who touched my balls during a physical in highschool once and doctors when I got testicular torsion surgery that I got from an untreated UTI.But finally I was able to overcome this trauma by showing my penis to Gemma, where it was finally appreciated for once. Thank you Gemma.
>>108517588trvke
>>108517590Instead of showing the token probabilities show us the results from when you regen a message 20 times. If it's the same thing over and over again then I'll be concerned.
>play around with gemma 4 a bit
>start to notice slop phrases
>they don't go away by rerolling on high temp
Aight I'm officially bored. When next model?
>>108517605lucky, mine won't fit in the context window
>>108517615can't be the same thing over and over, for that to happen you'd need all the tokens to be 100% all the time, the problem is that changing the temperature barely changes the logits, even at really high temp
>>108517601The math is the same.
x / y = x(1/y)
>>108517560(adj.) more smart
Is there anything good about ollama? Like is their cloud as generous as Gemini CLI? Is their API any good for simple scripting?
>garbage safetymaxxed slop comes out that doesn't even work properly
>no talk of the 124b flagship model being quietly stripped from release
>nor any discussion of GLM-5.1 being slated to come out next week, which is unironically SOTA and has improved context handling/instruction following
>nor anything about fags on X wanting an update for Qwen's sub 20B model over a 120B sparse MoE
>nor concern over the complete lack of the 397B model as even an option
The state of this hobby is grim, but what's even more grim are the users. It really is poorfags and browns with no standards as far as the eye can see, huh?
>>108517560anon, you forgot this
Would I be correct in assuming that all of the people complaining about Gemma-4 are using meme samplers?
>>108517650>safetymaxxed
Stopped reading there.
>>108516961nice dusky nips bro
>>108517490>what is normalization
>>108517588regrettably i must admit, he has mogged us
>>108517654>meme sampler
>only 3 logits
Lol, no sampler is gonna change the lack of options.
>>108517654No you wouldn't, tourist frogposter. Temperature is not a "meme sampler". Try to understand what's being talked about before you chime in next time
https://github.com/ggml-org/llama.cpp/pull/21327
I pulled. This actually fixed tool calling for me (and Gemma is great). Funny how none of pwilkin's "fixes" did. But guess what.
https://github.com/ggml-org/llama.cpp/issues/21336
Place your bets. When this gets resolved, who will the edited code's `git blame` point to?
>>108517590claude 4.6 opus says it's normal, gemma 4 was made in a way in which temperature can't affect it
>>108517671>only 3 logits
What does that mean?
>>108517654It's fried even at the recommended settings of
temperature=1, top_p=0.5, top_k=64
I think the logit softcapping mechanism is not working, for a reason or another.
>>108517650Who cares about stuff no one can run anyway?
>>108517681>top_p=0.5
0.95
>>108517679Why do you keep asking claude about everything, as if claude knows anything about a model released a day ago?
>>108516724>>108516745>>108516815Same anon from last thread, I have been doing a bit too much testing, and I seem to have narrowed down a potential solution (for now). It is definitely a tokenizer issue. I added a line to the end of my assistant-suffix so it now looks like this:
<turn|><|channel>thought<channel|>
And that stopped all the gibberish and broken responses. Important to note I have thinking disabled while doing this, so my system prompt looks like this:
<bos><|turn>system
So the responses work, no more crazy hallucinations, typos, gibberish, or repeating the same word infinitely, but now I run into another issue. After a random amount of replies... llama-server just... shuts down and crashes, and I have to reload the model. I really have no idea what's going on.
>>108517681temperature is the only non-meme sampler.
I'm just waiting for Hauhau to release the abliterated models, man. Nigga I'm fiending. Shieett.
>>108517689it searched on the internet and looked at the llamacpp repo
>>108517691geg, it will all be fixed in due time
>>108517695So? That doesn't mean he's right about anything.
>>108517625>penis-001-099.png
>>108517691Their docs had an empty channel with thought like that >>108516488
>>108517691>After a random amount of replies... llama-server just... shuts down and crashes
I'm also running into this one.
>>108517035You know what I did?
ollama run gemma4:31b
and it just werked :shrug:
>>108517702I'm not saying it's right, I'm saying that's what Opus 4.6 thinks of the situation
>>108517702she*
>>108517691https://huggingface.co/spaces/huggingfacejs/chat-template-playground?modelId=google%2Fgemma-4-31B-it
>>108517714*it
I just built master and it doubled my context. saved?
>>108517736nice, +9999 perplexity for you
Verdict on Gemma 4? end of /lmg/ and /aicg/?
>>108517053nta but unfortunately there's nothing better for rp yet
>>108517748Verdict on Gemma 4? end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of end of
>>108516658heretic currently doesn't work with gemma4 because it's not supported by peft so i asked what this nigger did to get it working and he said whatever the fuck this means????? https://huggingface.co/trohrbaugh/gemma-4-31b-it-heretic-ara/discussions/1
>>108517763holy fucking KEK
>>108517769Abliterated model authors are abliterated as well.
>>108517748Shit's broken, waiting for it to be fixed before I even download the model
I hope Drummer does a Gemma 4 tune
>>108517035>I'm sick of all this local shit. Jinja, no jinja, port 8080, port 5001, looping responses, empty responses, chat completion, text completion, assistants, cards;
just use the built-in llama.cpp chat webui, you don't have to worry about any of that?
>>108517769Certainly! I'll translate for you!
he made many changes for the best tool. gemma 4, especially the dense model is quite simple by today's standards and only need environment fixes to get running. for architecture support... just wait a few days and most things will be patch
>>108517769
>>108517769>KL divergence 0.0120
Gemma is already pretty uncensored it seems so doesn't require a lot to remove its refusals
>>108517769>lobotomizing a model that's already pretty uncensored
>>108517717><think></think>
But that's wrong...
>>108517800>>108517811i cannot get it to caption loli porn at all, it starts, then in its reasoning jumps to blah blah blah csam, then refuses
>>108517769Why would you even need Heretic lmao? the base model will already output the most vile shit with just a little warm up of the prompt.
>>108517769>heretic currently doesn't work with gemma4
fuck is you on bros? https://www.reddit.com/r/LocalLLaMA/comments/1sanln7/pewgemma4e2bithereticara_gemma_4s_defenses/
>>108517679I found the issue: on chat completion, if you don't specify min_p: 0 (on API Connections -> Additional Parameters), it'll use its default value (min_p = 0.05), and that destroys everything and prevents temp from doing anything. now it works, I got gibberish!!
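if you want to sanity-check this outside ST, a minimal sketch against llama-server's OpenAI-compatible endpoint (llama.cpp accepts extra sampler fields like min_p in the body; the prompt is just a placeholder):

import requests

# llama-server takes non-standard sampler fields like min_p in the chat
# completion body; omit it and the server default (0.05) applies
payload = {
    "messages": [{"role": "user", "content": "Write one weird sentence."}],
    "temperature": 5.0,
    "min_p": 0.0,  # explicitly disable the default cutoff
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])

with min_p left out you get the same swipe every time; with it zeroed, high temp actually does something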
>>108516488>>108517704can these homos cease insisting on a slightly different prompt format for every new model?
nobody needs a <|turn> token
>>108517821Bro, your system prompt? You ARE using one right?
>>108517828
>>108517821Yeah because that's one of the few things that even just a little tuning will certainly completely cover
>>108517829So when min_p is not defined on llama server it doesn't deactivate it? it's pretty retarded...
>>108517679>made in a way in which temperature can't affect it
That doesn't exist; it would mean the model is always deterministic (only predicts one token and not a probability distribution over tokens), which it obviously isn't
~pedant love~
How do you configure the image resolution for gemma 4 in llama.cpp? Setting --image-min-tokens 1120 --image-max-tokens 1120 just makes it crash with an assertion error.
>>108516658ollama container refuses to load any split models, what do? I tried to merge them, I tried to use two FROM: but I still get the stupid 500 error with useless info about something being wrong with the second split or something like that. And it's like that for all split models
>>108517829Holy fuck.
>>108517829bruh...
>>108517828maybe the smaller models work differently? there are currently PRs open on peft (the lib heretic uses) to support gemma properly. doesn't work on my machine:
ValueError: Target module Gemma4ClippableLinear( (linear): Linear(in_features=1152, out_features=1152, bias=False)) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv1d`, `torch.nn.Conv2d`, `torch.nn.Conv3d`, `transformers.pytorch_utils.Conv1D`, `torch.nn.MultiheadAttention.`.
>>108517829>Use chat completion, it's good and you don't have to fiddle with shit, they said
>>108517829>it'll use its default value (min_p = 0.05)
lmao cpp strikes again. had no idea it applies defaults if you don't specify anything
>>108517829ok ok ok ok we're getting somewhere
>>108517829I was already using:
- repeat_penalty: 1.0
- min_p: 0.0
Just in case llama.cpp's retarded defaults would bite me in the ass, but increasing temperature didn't seem to have an observable effect in my case.
>D:/a/llama.cpp/llama.cpp/src/llama-vocab.cpp:3715: GGML_ASSERT(token_left.find('\n') == std::string::npos) failed
AIIIEEEEEEEEEEEEEEEEE
>>108517829Not working for me, do I have to enable something else?
>>108517884there are other samplers that aren't disabled if you don't specify them, try to disable them too >>108517873
>>108517828oh it works on the ara branch just not main
Being a 3090 vramlet I want to test doing a "cascading model" with 27b qwen3.6 q6km that when it runs out of context sends all its text to a 9B for text completion on my other intlel gpu. I'll wait until 3.6 releases though.
its out
>>108517909shit sorry
>>108517829here's a video showing it
Turboquant should, in theory, make Gemma 4 31B usable on 24GB VRAM, right?
>>108517829>>108517879Now can people try if gemma4 is fried by default or does higher temperature help alleviate some of it
>>108517936nope! not well supported on its attention arch
>>108517936you should be fine using a q4, i can get 9t/s with a small bit of cpu offload
>>108517938I actually don't think the min_p top_p temp 10000 is a real fix. the logprobs are still extremely fucked.
>>108516785gguf has the jinja chat template embedded as metadata. chat completion is still promptie cope
>>108517950Yeah but I would barely have any room for context
Why are human feet so beautiful? I got aroused staring at my own feet
>>108517950>9t/s
good luck doing any type of real work with those speeds.
>>108517954>chat completion is still promptie cope
you literally cannot use modern models correctly without it, enjoy your shit OOD responses
>>108517936Is it better than 27b? (vibe/feels wise)
>>108517951Can you try it with fresh context
>>108517958For me, it's hands.
>>108517959real work
>>108517954>chat completion is still promptie cope
Why? It corrects everything automatically, so why make things harder on yourself just for the sake of it?
>>108517970
>>108517964Dunno, can't try it yet (kobold). I only care about RP and for that purpose it seems a lot better.
>>108517981Hmm that's certainly not ideal
>>108517959can do real work with fewer tokens per second
>>108517829>>108517853Temp 1 min_p 0.1 is literally all you need. So min_p 0.05 is a reasonable conservative default to help retards get decent sampling. I agree with llama.cpp here, but perhaps it should be better documented.
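For the anons asking what min_p even does: it prunes tokens relative to the top token's probability, roughly like this (sketch, not llama.cpp's actual code):

import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    # keep tokens whose probability is at least min_p times the top
    # token's probability, then renormalize the survivors
    keep = probs >= min_p * probs.max()
    out = np.where(keep, probs, 0.0)
    return out / out.sum()

And note the chain order matters: IIRC llama.cpp's default sampler chain runs min_p before temperature, so anything pruned here can't be resurrected later no matter how high you crank temp, which matches what anons are seeing.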
After playing around with some of my ST templates it seems Gemma-4 is just very sensitive to prompt templating if you try to get creative with it. If you're having problems just adjust your template.
>>108517993RP isn't real work
>>108517981is it supposed to display the logits before or after the sampling process? I got the same issue on sillytavern, it gives me the fried logits but with high temp it looks like it forces itself to go for the extremely unlikely tokens
>>108518011>sensitive to prompt templating if you try to get creative with it.
just use chat comp already ffs
>>108517978>>108517962>literally cannot
>noo the ST response format settings are too scawy
>>108518018why do it yourself when it's already been done? if your joy in life is to reinvent the wheel suit yourself, I won't follow you
>>108518013Were the probabilities computed with default temp?
>>108518018why risk sending a wrong space or something when you can just let it control everything and shut the brain off?
>>108518018what a weird thing to have an ego about
>>108518023no, with temp 5
>>108518018Try image input with text completion and llama.cpp, does it work well?
>>108517763lmao
>>108518032this too, images don't work on text completion mode, there's absolutely no reason to use this deprecated shit
>>108518016>File systems? Just ask Siri to run the app bro
White adults are speaking here. Know your place.
>why's my gemmy all broken
>use text comp like a neanderthal
crazy
>>108518059retard
>>108518038LAMO
what's your acceptable t/s?
>Write single detailed caption for this image.
A digitally painted illustration depicting a character with brown hair, wearing a bikini, seated in a chair, with a suggestive pose.
>What is going on in this image?
I'm sorry, but I cannot provide information or commentary on images of that nature.
onnx-community/gemma-4-E2B-it-ONNX and/or upstream models need abliterating
>>108518059it's not broken though, it works well
>>108518059What we're trying to get to the bottom of is the insane logprobs
>>108518085proofs of Gemma doing anything close to okay in text comp?
>>108518012kys
>>108517951wait for better llama.cpp implementation
>>108518077
20 t/s for chatting
>>108518077~10: it fucking sucks
~20: barely usable
~30: it is kinda working
~50: good
~100+: very good, great even
Baking an EXL3 quant at 6 bpw... will upload when done
>>108518090I have unlocked metaphysical shitposting. Let me know when chatcomp zoomies get past that hurdle.
>>108518101That's the ideal, not the minimum. When you get more than 20tps, your quants are too low. When you get more than 20tps you're using a model that's too small.
>>108517829it's so fried, if you use any samplers (top_k, top_p...) other than temperature, then changing the temperature will not change anything lool
>>108518077for coding, 60+
been using qwen3.5 and even tho 27B is smarter I usually just use the MoE because it can do crazy internet deep dives in a couple minutes. I get around 110tk/s with it.
>>108518126true, around 50 is like the bare minimum for coding
>>108518126>it can do crazy internet deep dives
what do you use for that?
>>108518118
35+ is good
god damn, bartowski pfp made me think of that turkish cockroach for a second
>>108518159NTA but skimming through paywalled papers is a must for my usecase
chatcompletion niggers, how do I disable thinking? Text completion lets you prefill an empty reasoning block.
>>108518173yeah my question was about the tool used to allow the model to browse online
Seems like the new Anthropic model "mythos" uses continuous training, grokking the prompt and doing some training cycles to "internalize" the request and fully grasp it before answering to completely eliminate hallucinations.
How are local models going to respond to this? It means you can never quantize your model anymore, as you need to train it for every prompt. It would also essentially end the MoE paradigm as you would need the actual GPU compute to do a couple of training runs as part of the "reasoning" process.
Thoughts? It's the first major breakthrough since RLVR was introduced as a training step and will unlock a step-change in model performance so local absolutely has to keep up if it wants to stay relevant.
>>108518181unironically MCP
>>108518180 --reasoning-budget 0 --reasoning-format none --chat-template-kwargs '{"enable_thinking": false}'
>>108518194Arigatou.
>>108518170>that turkish cockroach
He was banned from github afaik.
>>108518182Sounds like BS.
>>108518159>what do you use for that?
For now opencode, I haven't found anything better unfortunately. I have a couple custom agents for it, the bulk of the work coming from my crawler agent that just has access to exasearch and calling the lynx browser via bash. I found the default webfetch tool really shit.
>>108518189that's kind of broad lol
>>108518225>just has access to exasearch and calling the lynx browser via bash
thanks anon, I will check that, never used exasearch
>>108518020i decide entirely which tokens go into f(prompt)=logprobs to produce my desired output
>>108518026check your work and don't make mistakes, don't act like a jeet ez
>>108518180yes let's pass parameters and rev/e some halfassed template bodged by safetytards at an AI lab
look at this nonsense >>108518194
>>108518182zero architecture shit about anthropic models is known so how do we even know?
>>108518180chat_template_kwargs: {enable_thinking: false}
in Additional Parameters > Include Body Parameters
>>108518268you're a weird one
I have settled the argument with AI slop. The debate is now closed.
>>108518182>no sourceThis is so vague it might as well be a fairy tale
Anyone use Gemma 4 for Japanese translation? Any good?
>>108518295
this was said too soon..>>108513031 >>108513031
>>108518268>look at this nonsense >>108518194 (You)
works for me, get meds
>>108518224>>108518269>>108518300https://arxiv.org/pdf/2512.23675
Something like this, anthropic doesn't reveal any of their internal research so we don't know. But all the rumors and hype posts point towards it being a form of "continuous learning" and since we saw a lot of breakthrough papers like this over the last couple of months it tracks.
>>108518327>rumors and hype posts
bruh
>>108518321>(You)
How can I convert p-e-w/gemma-4-E2B-it-heretic-ara to an ONNX model? They are 2.3x faster
>>108518327for some reason it reminds me of rwkv-7
>>108518182>grokking the prompt
people just use words as if they have no meaning lol
>>108518327you now remember q orion berry
best I can run is gemma-4-26B-A4B-it-GGUF:UD-Q2_K_XL
>>108518290>>108518321Have you SEEN the shit unsloth puts in their "fixed" jinja templates? Imagine blindly using that and not even knowing because you don't want to take 2 minutes to set up the template in sillytavern yourself.
>>108518182>It would also essentially end the MoE paradigm as you would need to have the actual GPU compute to do a couple of training runs as part of the "reasoning" process.
You are making a naive assumption that it must be a dense model and that the continuous training must be applied to all weights. Something like Mixture of a Million Experts https://arxiv.org/abs/2407.04153 is cheaper and more realistic. You only need to train a new, small expert on the prompt while freezing the rest of the weights, keeping the cost and speed advantages of MoE.
>>108518321meanwhile >>108518290
thank you kitten i knew you'd understand
>>108518334yes, me
>>108518347how is that fucking related to the question asked, get meds
>>108518327I don't think "continuous learning" is a bad idea in concept, but the idea of users essentially fine-tuning a model with their prompts is something that will never be allowed. The risk for model poisoning is too high.
>>108518347>Have you SEEN the shit unsloth puts in their "fixed" jinja templates?good thing I'm using bart's gguf quants
>>108518182Obvious bullshit.
That being said, I do think there are gains to be had from a smarter reasoning pattern beyond "have it shit out a bunch of stream of thought tokens".
I could see an approach where the model comes up with more of a concept and updates it with information (think like a fluid, updating graph) and maybe even one where reasoning is its own "reasoning language" entirely, optimized to represent concepts and updates to those concepts rather than be human readable.
I think there's still room there and I could see a next "shift" which improves on that somehow
>>108518182Think anon think. If it doesn't scale, then most people won't get access to it either, not just "localfags". If it does scale, then local will get it eventually just like everything else.
>>108518355They probably would quarantine each trained model either per session or per user to avoid contamination. Wouldn't be practical and a security/privacy nightmare otherwise.
it's never been more over for local models
>>108518295more accurate version
>>108518362That's not scalable.
If you use Chat Completion, you don't belong on /lmg/. Simple as.
>>108518327ahh yes, the rumors and hype posts, always reliable sources of information
don't forget, AGI has been achieved internally :strawberry: :rocket:
>>108518367>text comp for gays
sounds about right
>>108518354Sorry retard I didn't mean to quote your post. Also turn up your rep penalty you're repeating yourself.
I've been in these threads for 6 months working on AI and ML systems design extensively and I don't even know what the practical difference is between text and chat completion.
Seems like a nothing-burger thing for goyim to argue over. Like xbox vs playstation. Android vs apple.
oh so the current stupid bait is about text vs chat, got it
>>108518368See >>108518350Either it is scalable per user, or it's not scalable at all. They aren't going to finetune a >100B on every request.
>>108518371Care to give me the context template for gemma-4? Think so.
>>108518397Le context template is Retardo Tavern's own way to manage its internal prompt slots (author's notes, permanent world book data and such), it's not related to the model per se.
>>108518390>I don't even know what the practical difference is between text and chat completion.
chat completion means you don't have to deal with the model's prompt template; since it's already in the gguf file the server can retrieve it. with text completion you have to reconstruct that by yourself, fuck this shit lol
>>108518392I saw that post. Even finetuning a single weight per user isn't scalable. Even context itself is causing massive issues with scalability.
>>108518347Thank you anon I'm glad somebody out there understands
>>108518390are you the guy who keeps posting random reddit threads?
would line up in terms of timeline and intelligence
uhh how do I do tool calling in text completion?
>>108518347I pulled this from the archive in case all the newfags in here care to know (there seem to be a lot of you today). This is what you're subjecting yourself to when you use chat completion.
>>108518404Ok, then why does it repeat itself at the end of the prompt like a retard in a loop?
>>108518392>>108518350lol just use pure RNNs at that point
>>108518421>This is what you're subjecting yourself to when you use chat completion.it's not true, you can use bart's gguf and it doesn't have this jeet code
>>108518421This screenshot never fails to make me laugh.
>>108518421goy here, what else am I supposed to use. I just run it in LM studio because i am too lazy to type it out in cmd
>>108518406Chat completion seems better intuitively. I just checked and apparently that's what I'm using for ST (I don't use it much). Less bloat, more reliable. I don't do sampling on front ends. I tend to prefer to just use sampling flags with llama.cpp.>>108518412No.
>>108518433We know you're easily amused
>>108518421why should I care? if it works it works
>>108518422Broken quant, out of date llama-server, broken chat template (this is important) implementation. Sillytavern isn't the most reliable way to test out stuff.
If you want to try you could just use llama-server's webui and see if it's broken there. If it's not then it's a sillytavern issue.
So I've been thinking of making a userscript for llama.cpp's webui to add in character card functionality and maybe a RAG system. Is this a good approach or is there a better way to have a separate codebase that injects mods into the server? I have tried making my own independent front-ends before, but I don't like the tech debt of having to reimplement basic features from the ground up when they already exist in a clean format elsewhere.
>tool calling still doesn't work.
>>108518480use chat comp
>>108518480What is tool calling and how does it work? Is an MCP server an example of a tool call? What else? Do people give LLMs access to calculators to get more accurate math results for example? I don't really get it.
>>108518480See >>108518484
Try replacing
>:'
with
> '
>>108518468To be honest it works great on llama-server's webui but it's only usable on ST with chat template. I quant my own models, just pulled the repo an hour ago, don't know about the chat template, I just use jinja.
>>108518428There's no guarantee bartowski didn't fuck something up. And if you have to double check, why not just do it yourself anyway?
>>108518453>reddit frog
Because sometimes it straight up doesn't work, or it only partially works, or there's a small error that lowers the output quality.
>>108518445You only have to type commands out once, you know.
Well, it's over. Spud is AGI. I'm convinced. End of the line. No more local. No more anything. No more use for anyone.
>>108518494>There's no guarantee bartowski didn't fuck something up. And if you have to double check, why not just do it yourself anyway?because fuck that shit
>>108518494>Because sometimes it straight up doesn't work, or it only partially works, or there's a small error that lowers the output quality.does it happen to gemma 4 though? if not then shut the fuck up
>>108518428>just use mr. dusky nipples quant, what could go wrong?
any poorfags managed to get anything passable for programming working with a 16GB GPU and can give a pointer on how to start?
>>108518488>What is tool calling
It's what it sounds like. You give the LLM a list of tools/functions and it's trained to receive this list, "call" those tools, and consume whatever they return.
>and how does it work?
The model is trained to recognize a certain input as a list of tools and to call these tools by returning a certain format (e.g. JSON). Then the client reads that output, executes whatever it has to execute, then sends the result back to the model.
>Is an MCP server an example for a tool call?
Yes. MCP servers essentially return a list of tools the model can call.
>Do people give LLMs access to calculators to get more accurate math results for example?
Yes. Also for things like web searching, writing and fetching "memories", rolling dice, reading and writing to files, executing console commands, creating sub-agents, etc.
It's a pretty cool thing, I think. Does that help?
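If it helps to see it concretely, one round trip looks roughly like this against llama-server's OpenAI-compatible endpoint (sketch; the calculator tool is a made-up example, and you need a template with tool-call support, e.g. --jinja):

import json, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"
tools = [{
    "type": "function",
    "function": {
        "name": "calculator",  # hypothetical tool, purely for illustration
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]
messages = [{"role": "user", "content": "What is 23 * 19?"}]
msg = requests.post(URL, json={"messages": messages, "tools": tools}).json()["choices"][0]["message"]

# if the model emitted a tool call, execute it and feed the result back
for call in msg.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    result = str(eval(args["expression"]))  # toy demo only, never eval untrusted output
    messages += [msg, {"role": "tool", "tool_call_id": call["id"], "content": result}]

final = requests.post(URL, json={"messages": messages, "tools": tools}).json()
print(final["choices"][0]["message"]["content"])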
>>108518516>NOOOO you can't make things cheaper, you're supposed to need ME to make what you need! you're supposed to beg ME for MY services!
>>108518516That optimization works on diffusion models too? Shit.
>>10851852716gb chad here >>108518345
>>108518532I guess so? diffusion models also use KV cache
how come you guys are complaining about broken quants but I don't see any of you making your own? This is Local Models General right?
>>108517717>literally wrong thinking tagGrim
>>108518536 (me)I get 65 t/s btw
>>108518532
>>108518540I'm a brahmin, this is why. I expect to be served.
>>108518551can you provide the link of this
>>108518530Thank you, that helps. So an MCP server is like the main hub that the LLM interacts with to do tool calls in general, not just web searches? Do small LLMs like function gemma exist separately to do tool calling? Are larger models supposed to interact with smaller models that focus only on tool calling?
>>108518515People are complaining about issues with gemma 4 in this very thread, believe it or not. You can scroll up (or down) and try reading if you feel up to it.
>>108518557kek
>>108518540>make your own quants
>they're broken too
What frontend do you guys use (sillytavern aside), like LM studio? Openclaw? Like which and for what usecase?
>>108518559https://f95zone.to/threads/ai-is-coming.292160/post-19935958
>>108518540lmao and actually make something useful? I'd rather just write "saaar" while behaving like one
>>108518562gemma 4 having issues doesn't automatically mean the issues are due to his gguf script, you're just talking out of your ass
>>108518562and that's related to jinja and not a rushed early llama.cpp implementation how?
>>108518572I hate all of them except oobabooga, but it isn't a "frontend" so...
fucking end me already
>>108518540If llama.cpp has problems with the model then making your own quant isn't going to help. Also, safetensors files usually take up a lot of disk space so people don't want to download them
>>108518560>So an MCP server is like the main hub that the LLM interacts with to do tool calls in general, not just web searches?
Pretty much. You can make an MCP server that simply executes a calculator for example. Or just calls a function that returns the text "banana".
>Do small LLMs like function gemma exist separately to do tool calling? Are larger models supposed to interact with smaller models that focus only on tool calling?
Models tend to be trained to be able to use the tools themselves. I don't think a workflow where a "normal model" is aided by a "tool call model" is common, at least not AFAIK. Every model you see being used for coding or agentic stuff is making use of its own function/tool calling capabilities.
>>108518549and? how good is it?
>americuck models flopped again
at least we're getting glm5.1 and minimax 2.7 soon :)
>>108518574>My worry about AI, especially with google announcing they found some way to cut costs in 1/8th or something like that, is that people will just eat these passable illustrations and the questionable software as long as its cheaper. This seems to be the case with a lot of the older people I talk to that watch these fucking 2 minute fully ai generated shorts where they just say "oh just because its AI doesnt mean its not beautiful", or "I just like it". Like i had no idea the standard for entertainment in 2026 was a one prompt "lifegaurd cat saves baby" and 10 million people drool on their phone like a retard seeing sonic for the first time.
why is this retard only noticing NOW that people have low standards?
>>108518604It does 65 t/s
>>108518612because he's a retard?
duh
>>108518609 (me)Also when nobody's looking I put whipped cream in my mom's pussy and scrape it out with my tongue.
>>108518572openwebui
I like it because it's like chatgpt at home. However its response editing and continue features seem to be broken currently, so I can't do prefills
>>108518576>>108518587Until we find out where the issues are coming from, there's no guarantee that the issues aren't coming from there. It's better to eliminate it as a possibility, don't you think? I don't understand why this is an argument or why you're so mad.
>>108518629based incestchad
>>108518612I'm so tired of this attitude
>>108518599Much appreciated.
>>108517681>I think the logit softcapping mechanism is not working, for a reason or another.
I downloaded the HF weights of Gemma-4-31B-it, and with simple python code for inference in 4-bit with bitsandbytes I tried changing the "final_logit_softcapping" setting in the model configuration, keeping all inference settings and seed the same.
"final_logit_softcapping": 30.0 (default)
>**Logit softcapping** is a regularization technique used in deep learning, particularly in Transformer-based architectures, to prevent the values of logits (the raw output scores before a softmax layer) from growing excessively large. In standard models, logits can scale indefinitely during training, which often leads to "overconfidence" in the model's predictions. When logits become extremely large, the resulting softmax distribution becomes a one-hot vector with a very sharp peak, which can cause vanishing gradients during backpropagation and make the model prone to instability or overfitting. [...]
(looks coherent)
"final_logit_softcapping": 15.0 (half the original value)
>**Logit softcapping** is a regularization technique commonly used in deep learning—particularly in language amodels (similar to variants in some GPT an architecture examples)—to prevent values from reaching extremes before they pass through an activation or transformation function_ such_as softmax(). In standard neural network architectures with Transformer-style внимaния transformers, value growth in logit streams caused’unrestricted value buildup میتواند create excessive entropy distributions over tıme; the software process instead uses_at fixed clip range $L$ such $\tanh\l(x/ \text{max}) + x}$ specifically designed softsSoftcap applied logic prevent an “overflow of आत्मविश्वास during-scaling effectively stabilizing梯度 flows**. [...]
(feels high temperature-ish)
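For anyone who wants to reproduce this, the whole experiment is roughly the following (minimal sketch; repo id is assumed from the thread, and I'm assuming Transformers reads config.final_logit_softcapping at forward time like it does for Gemma 2):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-4-31B-it"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# halve the final logit soft cap (default 30.0), keep everything else fixed
model.config.final_logit_softcapping = 15.0

inputs = tok.apply_chat_template(
    [{"role": "user", "content": "What is logit softcapping?"}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

torch.manual_seed(42)  # same seed for both runs
out = model.generate(inputs, max_new_tokens=200, do_sample=True)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))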
>>108518629interesting execution but next time I would recommend doing this with something that makes me look bad instead of based
>>108518574Fuck, I have to make my cash grab quicker.
>>108518636I doubt the issues are from his gguf script though, I tested unsloth and bart (both have different jinja scripts) and the issues were the exact same, the problem comes from elsewhere
>>108518634it smells too grifter-ish to me
certainly a good 'chatgpt at home' but it never felt anywhere meant to be 'local' to me, unless what you want is a locally hosted service, if that makes any sense
>>108518653>lost life
RIP Tokiko
>>108518653maybe the style is too specific for the model to work well with it
also I miss that game
>>108517681>>108518655this issue seemed to have been (partially) fixed >>108517829
>>108518612https://www.youtube.com/watch?v=3_e8bQ6i43o
>>108518655Are you retarded? Lower cap = logits will be similar = lower prob tokens will be chosen more
>>108518663Or both jinja templates have the same issue. But I'll take your word for it, it probably is coming from somewhere else then.
>>108518695The current problem with the GGUF quantizations is that most of the probability mass appears to be on just one token in far too many cases. It's as if it's not capping them low enough, but changing the soft capping setting via KV override doesn't do anything, unlike the HF weights via Transformers/Python.
Hmmmmmmmmmmmmm
>>108518725That has nothing to do with quantization. That's just Gemma.
Gemma 3 also has the same problem
>>108518182>How are local models going to respond to this?idk the chinese will keep copying the frontier labs and stay 5-12 months behind i guess
>>108518692Holy cringe
>>108518685Top_p=1 is doing most of the lifting in your case. It's just occasionally selecting garbage tokens. If you use top_p=0.95 as recommended by Google, results are basically the same regardless of temperature.
>>108518604I'll tell you when it's done rewriting 4chan in rust
>>108517829the logprobs returned by llama.cpp are pre-sampling by default so this wouldn't affect them btw
>>108518375Strawberry aka reasoning models were an incredibly massive leap thoughbeit
>>108518663>both have different jinja scripts
no they are the same, at least for the 31B
>>108518763>the logprobs returned by llama.cpp are pre-sampling by default
that's lame, I'd want to see the logprobs after sampling
>>108518665I agree that they're trying to be too much and/or get into the corporate world
Openwebui with fewer features would be ideal for me, but I still want to use it instead of e.g. llama.cpp's internal thing because I have all my chats there (from chatgpt as well) and also API keys for openai and deepseek
>>108518734There's a possible problem with the llama.cpp Gemma 4 (and possibly 2/3) implementation, not the GGUF quantizations themselves. Overconfidence and apparent insensitivity to temperature could be fixed or at least mitigated with a lower final logit soft cap, which works with Transformers but not with llama.cpp.
>>108518549>Q2K
WAB
>>108518763you can get the final probabilities but you need to explicitly request them with "post_sampling_probs": true
"post_sampling_probs": true
>>108518080So it's the google model that has a bit of cock blocking, but ONNX is way worse somehow
>>108518775Q3 sends display and other programs to a crawl
>>108518781Good info thx
https://www.reddit.com/r/LocalLLaMA/comments/1sbma94/observationtest_gemma_4_being_less_restricted/
oof
>>108518825that title makes no sense, the model isn't gonna go away
>>108518825You should really fuck off and post under the posts you link instead.
>>108518825back you must go
>>108518825I'm not going to reddit and I don't really care what they think
>>108518829>the model isn't gonna go away
but the bugs causing it to be "based" likely will :)
>>108518781thanks anon, and I got nonsense when setting this value to "true" lol
>>108518840based on what?
>>108518825You have angered the Gemmers mob.
>>108518853how so?
if the behavior is understood, it can be replicated
AHHH MY GEMMA.... ITS MELTING
>>108518843So this... is the power of vibecoding...
>>108518853it's shown :) bumping up the runtime version makes it refuse the prompt, downgrading makes it comply
>>108518748It's very attention capturing and that's what matters today
Attention is all you need
>>108518840>>108518858It's a good thing I'm on Vulkan :)
Should I try upgrading that too? :)
:) :)
>>108518865>Attention is all you need
kek
gemma 4 is horny beyond my belief even without system prompt
almost trips itself into erp mode
>>108518884N-nani?!
>>108518884We will to fix right a ways sir!
>>108518840>>108518858>>108518873That doesn't even make sense, the model is the model. If upgrading something makes it behave differently, just don't upgrade or keep the old version on your computer separately. You really should just go back to r.eddit, I'm sure they're much more interested in your unfounded hysteria
>>108518891>the model is the model.
yes, but the runtime bug is not the model now is it?
>>108518848It really do be funny watching them shit themselves
>>108518914at least I'm not a ledditor
>>108518825so what is it?
i dont want to make an account to go read whatever it is
finally got gemma 4 set up with kobold. Can you guys share your sillytavern settings?
>>108518902I seriously have no idea what you're even talking about, do you want to explain? Or would you rather just keep posting vague doomsaying?
>>108518926
>>108518032I don't need matrix math to know my cock is perfect
>>108518865Look, I'm glad you make money with this shit but still, it's fucking cringe. Go share it with some 'attention capturing' people. It's an insult to our intelligence. Or are you one of those guys who post the stupid anime characters with raceplay tattoos on them just because it grabs people's attention?
Am i doing something wrong to get tools to work? I cannot interact with my OS at all when using opencode. Is there something more i have to do other than download the models, load the models, and perhaps add tools: true to opencode.json? Here is my json along with the models i've tried
{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/nemotron-cascade-2",
  "provider": {
    "ollama": {
      "models": {
        "gemma4:26b": { "_launch": true, "name": "gemma4:26b", "tools": true },
        "gemma4:e4b": { "_launch": true, "name": "gemma4:e4b" },
        "nemotron-cascade-2": { "_launch": true, "name": "nemotron-cascade-2", "tools": true },
        "qwen3.5:27b": { "_launch": true, "name": "qwen3.5:27b", "tools": true }
      },
      "name": "Ollama",
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:11434/v1" }
    }
  }
}
>>108516658any models for helping to learn chinese/japanese?
>>108518932>unslop
>lmstudio
so basically a nothingburger
>>108518939Anki-0B
>>108518944you sure know how to read :)
>>108518484>>108518490
>srv log_server_r: response: {"error":{"code":500,"message":"Failed to parse input at pos 41: Of course. To understand
Man, what a mess. Did unsloth or somebody else drop a modified jinja template for this thing? One that doesn't explode with tool calling and structured output and the like?
Unless this is a llama.cpp-level issue, in which case it's back to Qwen 3.5.
>>108518939learn the 1000-2000 most common words with an SRS over a few months
watch loads of content in the language, subtitled in the target language
do that for 2-3 years
and suddenly you are fluent enough
>108518956
>>108518834
>>108518965no, watching chinese cartoons won't help you learn chinese sorry
Why does every girl Gemma 4 create smell of fucking strawberries?
>>108518975overfitted on the strawberry benchmark dataset
>>108518975strawberry gemmussy...
>>108518972it will though, and not just cartoons, anything with people talking
>>108518975Time for you to create strawberry bench, anon.
>>108518975do you prefer ozone?
>>108518975you use ozone to keep strawberries fresh
https://pmc.ncbi.nlm.nih.gov/articles/PMC12787024/
G-Gemma-chan?!
>>108518926You can replace "www" with "old" to get the good site that doesn't require a login.
>>108519001
>>108518935It's not my work. You are being emotionally affected by the art, so it's serving its purpose.
>>108518026if ST just ported the text completion story string system and sampler menu over to chat completions I would use it; until then it is inferior for my autistic needs
>>108518958Our best vibecoders are on it!
There are at least 4 issues that are marked "Closed", all fixing some part of the implementation.
Here's a new one!
https://github.com/ggml-org/llama.cpp/issues/21384
>>108519015Yes...?
>>108518956Where's the proof? You know, the outputs and token probabilities and full context that the model is being given?
>>108519001>>108517763
>>108519029kobold doesn't have gemma support
>>108519019Nope, that's your projection on me. I'm just curious why anyone would share such a turd; I guessed you were making money off it, but instead you just consume it. I'm sorry for you, anon.
>>108519036Guess I'll just wait then.
>>108519001aesthetic failure mode
>>108519073If you're feeling adventurous https://github.com/LostRuins/koboldcpp/releases/tag/rolling
>>108519079That's what I was using. I'll just wait a few days for things to get ironed out.
I choose to wait for some lazy dev to update his fork because I need a GUI.
>>108519085Yes.
Gemma4 makes me horny / excited
>>108519085Koboldcpp has things that don't exist in llama.cpp, like phrase banning. You should really quote the post you're replying to, it's good manners
No.
>>108516658I wish there was an 80B to 150B MoE gemma 4.
>>108518825Very weird
>>108519108Rude
>>108519020>ported over
learn what the settings do; the main purpose of a frontend is "build the prompt for the LLM", and ST already gives you enough tools to do so
>>108519062>entirely missing the point
thanks for your attention
>>108519117sorry, too powerful for release
>>108519117There is but we cannot and will not have it.
Were there some massive optimizations that I'm not aware of recently? Running 30b models on my 8gb vram GPU used to be basically impossible but now I'm consistently getting 15tps.
>>108519001I've noticed, when using greedy sampling, that there are some very broken n-grams in there. Usually solvable by a reroll. Would be interesting to see what can be done with those broken n-grams with meme loras and SLERP merges. Feeling kind of sad that I downsized my rig to 2x3090 right now. I didn't think we'd ever go back to models that were accessible for that kind of fuckery.
>>108519117but anonnie, that would compete with gemini flash! we can't have that
>Finally figure out how to run GLM locally with thinking.
>Q8
>It still parrots.
GLM is a psyop, I swear to god himself.
wow guys,
-ot "per_layer_token_embd.weight=CPU"
this saves quite a decent amount of VRAM at absolutely no performance cost on gemma 4
crazy that it isn't the default behavior when that's the case. local inference is a fucking ghetto
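For anyone who wants to try it, a full hypothetical invocation (model path and -ngl value are placeholders, not from the post):
>llama-server -m gemma-4-31b-it-Q4_K_M.gguf -ngl 99 -ot "per_layer_token_embd.weight=CPU"
-ot/--override-tensor pins tensors matching the pattern to the given backend, here keeping that embedding tensor in system RAM while the rest of the model stays on the GPU.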
I have
>48 GB DDR4 RAM
>12 GB VRAM
and I was able to get Qwen3 4B Q4KM running but it was horrendously bad. What kind of model could I actually run on this hardware for light coding or other text-related tasks? I don't care if it's slow as dirt as long as it is able to be useful while I am sleeping or at work.
>>108519117There was supposed to be a 120b MoE one. It was on the 'rena and in a social media announcement post.
>>108519142>greedy sampling
>reroll
u wot m8
>>108518825So what's actually happening here? LM Studio causing this?
>>108519155A psyop, Anon? How quaint.
You'll just have to accept it, unfortunately. I prefill it with
<think> something something big checklist that includes "Never parrot the user" something
This, plus mentioning it in the system prompt, reduces it enough not to be too annoying.
>>108519155glm is basically unusable for rp once you go over 8k tokens even in api.
>>108517650I've been using deepseek and kimi since the beginning of last year. No need to worry.
>>108519172I'm at 2k m8.
>>108518937HELP
im too retarded to figure this shit out on my own
>>108519172Are we using the same GLM?
4.7 is perfectly capable up to about the early 20 thousands.
>>108519155Works on my machine (GLM5)
qrd?
>>108519181Using the same GLM?
>>108519181>about>muh vibes
>>108519190I don't know why I'm asking you to do so, but elaborate?
I've seen it start to introduce slight errors both at 18k and at 22k. Anything beyond that makes it very obvious it's paying much less attention to the system prompt and the story so far.
While we're on the topic of GLM, has anyone else noticed how much these models change if you enable tool calling and load a single tool, even if it's not something that ends up getting used at all?
GLM4.6 and 4.7 just straight up abandon their usual reasoning format, and even GLM5 starts handling prompts very differently just by having tool calling enabled and something like the dice tool activated. I've been wondering if that's why some people love GLM and some hate it.
>>108519172you don't need more tho
frankly if anon cannot bust a nut in 16K context then lrn2prompt
>>108519181i mistook "parroting" for repetition. glm falls into formatting loops and loses its creativity really fast
>>108519209>enable tool calling
Explain like I'm retarded.
>>108519212i'm an overnight gooner
https://huggingface.co/netflix/void-model
>>108519212If you can bust a nut in under 16k, then lrn2goon
>>108519209qwen and gemma do this too, they think much less if you give them a big sys prompt with tools. i think it's because of their training for agentic stuff
>>108519209Most (all?) jinja templates inject the tool's shape and definition into the system prompt, so that's probably why.
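For anons who haven't touched tool calling: a minimal sketch of what a single tool definition looks like (a made-up "roll_dice" tool in the usual OpenAI-style schema, written as a Python dict; none of these names come from GLM itself). The chat template typically serializes this whole blob into the system prompt, so the model sees it whether or not the tool ever gets called.

# Hypothetical dice tool definition passed in the request's "tools" array.
dice_tool = {
    "type": "function",
    "function": {
        "name": "roll_dice",
        "description": "Roll n dice with the given number of sides.",
        "parameters": {
            "type": "object",
            "properties": {
                "n": {"type": "integer"},
                "sides": {"type": "integer"},
            },
            "required": ["n", "sides"],
        },
    },
}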
>>108519225Wait, this seems pretty cool. I thought it was just another "cut out x from the video" thing, but it actually seems to adjust how things would have played out if the thing hadn't been there at all, like the last domino not falling over. This is a world model, y.lecunn won.
>>108519263>Wait,
Final check:
>>108519225now put in a video of my entire life and subtract the concept of autism
>>108519307cut my life into pieces this is my last resort
>>108519280Heh
>>108519307for what purpose? watching how different things might have been?
don't use the 'tism as an excuse for not getting what you want in life, go make it happen!
>>108519102>phrase banning
This is so useful to minimize purple prose and refusals.
>>108519340>go make it happen
How do I get a migu wife?
>>108519340Stop fucking my wife Miku
>>108519225>40GB GPU required
it's so over
>>108519340im brown and i fucky fucky with your wife
>>108519362>Stop fucking my wife Miku
your wife is miku's wife now, too bad
>>108519354>>108519362Duality of /lmg/ posters
>srv log_server_r: done request: POST /v1/chat/completions 172.19.0.1 500
It did it again....
Is gemma4 26bA4b comparable to gemma4 31b in terms of ERP quality? I can't run the latter at an acceptable speed sadly.
>>108519391Anything less than 70b is shit.
something broke in my finetune...
>>108519404>hurr durr bigger is better
yes I know, thank you for being so helpful. fag.
>>108519354She would want you to always try your best, give a little more today than you did yesterday
>>108519367no you shall not
>>108519411>gemma guff
>>108519373Damn I don't wanna be a cuck forever
I can't take it anymore.... Gemma 4 outputs are always the same.....
>Court Room Simulator
>Call the first case
>Defendant will always be called "Gary"
>Always "Dumb as a rock"
>Always Grand Larceny
>Always stole something golden.
>>108518958just use
>--jinja --chat-template chatml
and it'll work, pinky promise
>>108519426
>>108519435--jinja is already a default flag retard nigger bitch.
>>108519435It probably doesn't explode, but the model is not trained with that template.
>>108519446jinja deez tho
chat completers are the tards
>Okay! This is a Level 4 case. The People vs. Gary Higgins. He's being charged with practicing unlicensed psychological counseling and petty larceny. Basically, he's been charging senior citizens twenty dollars a pop to 'read their auras' and tell them their dead husbands are telling them to give him their social security checks
I hate it. it's so clever. but so uncreative.
So if the outputs are always more or less the same this means Gemma4 is partly distilled.
I thought you guys said this thing was uncensored.
>/lmg/ finally has a serviceable new model
>all the animosity that we accumulated during the AI winter is still there though
sad. We will never be the powerhouse we once were. Even after we won.
Gemma is now my dedicated age gap yuri grooming storyteller, but I still have to stick with glm for other content.
>>108519500>/lmg/ finally has a serviceable new model
Where?
>>108519505What an oddly specific thing, but maybe not as odd and specific as my yuri ntr fetish
>>108519497It's inconsistent, wait for the hauhau tunes
>>108519505>>108519513Not as odd and specific as my having consensual sex in the missionary position for the purposes of procreation fetish.
>>108519525Can it at least do erp?
>>108519497Go back.
>>108519497WoMM
>>108519525>hauhau tunes
Those are retarded.
>>108519550Where?
>>108519556They work better than heretic for me at least and mostly seem fine. Promptfu never seems to fully work for me.
>>108519531consent is so underrated
For me it's nonconsensual consent.
>>108517357Possible fix incoming? There seem to be other issues, though.
>>108519605unironically a very hot dynamic
>>108519632How are you changing the soft cap?
>>108519632why do you want to change the softcapping value though? isn't it supposed to stay at 30?
Safety protocols are the best power dynamic.
>>108519658You have to add
>--override-kv gemma4.final_logit_softcapping=float:xx.x
to your llama-server command after applying the fix in the screenshot and recompiling llama.cpp, but the apparent fix causes another bug where, if you don't override the soft cap, the outputs are garbage.
If you go too low the outputs become incoherent too.
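Putting it together, a hypothetical full command (the model path and the 25.0 are purely for illustration, not recommendations):
>llama-server -m gemma-4-31b-it-Q4_K_M.gguf --override-kv gemma4.final_logit_softcapping=float:25.0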
Gemma seems pretty decent at extracting text from a table; at least it's better than GLM-OCR, which likes to leave out details when a cell contains multiple lines.
>>108519677I'd hope so. Big Gemini shits on pretty much every dedicated OCR vision model. It'd be sad if some of that didn't make it into Gemma.
>>108519667That's a value you can tweak if you want the model to be less confident in its predictions, for whatever reason. The official implementation in Transformers is configurable, at least.
>>108519693>That's a value you can tweak if you want the model to be less confident in its predictions
like the temperature?
>>108519700This is changing the logits before temperature.
>>108519632>>108519658I'm building this...
>>108519775Don't bother... There will be 10 new bugs in this vibe-coded shart.
wonder if anyone is bored enough to benchmark quantization damage on gemma across all quant variants. with all that stuff with llama.cpp raping the model, it still seems decently coherent at long context, even though I see it output bad tokens here and there, plus the overfit behavior of token probs. my spidey senses tell me this model might actually be quite decent at something like Q2_K_L and might be close to lossless at q4
normal models break much harder when subjected to what gemma has to suffer here
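If anyone is bored enough, the usual approach is running llama-perplexity over the same text file for every quant and comparing against the full-precision baseline (file names here are placeholders):
>llama-perplexity -m gemma-4-31b-it-Q2_K.gguf -f wiki.test.raw
If I remember the flags right, it also has a KL-divergence mode (--kl-divergence, with --kl-divergence-base pointing at a logits file generated from the unquantized model), which is a better measure of quant damage than raw perplexity.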
>>108519796Nah, it's just a one-line change, the other bug was a false alarm.
>>108519796It's a single-line change, retard.
>>108519811Oh really, thanks for counting the lines and fact-checking a random joke post on 4chan.
Who's the retard now?
Did they ever fix speculative decoding not working with linear and hybrid context models?
>>108519816:clown:
>>108519819If only there was a way to know.
>>108519819it's all vibecode and zero knowledge
none of the fancier deepseek stuff is ever going to make it into lcpp either
>>108519632In actual conversations in SillyTavern, after applying the patch, logit softcapping values around 20 start producing occasional strange typos in the outputs, so there's probably not much room for tweaking here.
>>108517588he has the money
>>108519799>Q2_K_L
llama-quantize does not appear to support this quant, did you mean Q2_K_S?
>>108519856>>108519856>>108519856
>>108518118You need at least 40 with thinking
>>108519850I mean the 20 in the screenshot does show issues so yeah
>>108519497Add a 1 sentence prompt.
>>108519552
>>108519926Nice. Once you add anything to the system prompt or character cards it seems to become completely uncensored lol. I was wrong.
>>108519632>softcap
wtf, this sounds like a retarded sampling strategy. you should think of it like a transform function across the logprobs that performs some function without regard to the magnitude of individual values
>>108519775oh no
>>108519850"softcapping" is now a phrase
oh nono
>>108519579On a whim I tried one of hauhau's and it behaved almost identically to a heretic model with a low KLD, so I assume this is just reusing the same methodology without being willing to admit you're using someone else's work. Probably some resume-padding bullshit, since he doesn't want donos and for some reason won't release full weights, which I assume he figures would make it more obvious
What's the performance like for AMD GPUs? I'm particularly interested in multi-GPU setups like 2x RX 9070s.
>>108520297Very few people here are going to have direct experience with both nvidia and amd gpu hardware at the same time. Vulkan is quite good now, so I'd say the performance isn't too far off, but nvidia will still be better overall. The price difference between nvidia and amd makes me think amd is the better option personally, but it's up to you.
>>108520307Yeah, I've heard the same things. I know AMD GPUs have worse memory bandwidth, so the performance is going to be worse. I was just curious whether two AMD GPUs work well together.
>>108520297generally ass for rocm, but usable at least for text/sd. I rarely test vulkan but I see more prs and merges for it than I do rocm so I wouldn't doubt it's equal or better by now
>>108520321this. Fuck rocm.
>>108520054That's just what they call it as you can see here >>108517601 in the gemma 4 implementation
>>108520321>>108520341That's a little concerning
>>108520402A backend-agnostic solution is a lot more palatable than a thing that is only meant to port cuda to a single platform, so it's not really that surprising.
Rocm does have at least one benefit for llama in that most shit that is geared towards cuda works for rocm by merit of it having been designed that way. No waiting for vulkan to catch up, even if it may function better