/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 04/21/26(Tue)03:26:07 No.108650825

File: lust provoking teto.png (1.29 MB, 1024x1024)

1.29 MB PNG

/lmg/ - Local Models General Anonymous 04/21/26(Tue)03:26:07 No.108650825

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108646197 & >>108641942

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
04/21/26(Tue)03:26:43 No.108650826

Anonymous 04/21/26(Tue)03:26:43 No.108650826

File: guardrails optional.jpg (238 KB, 1024x1024)

238 KB JPG

►Recent Highlights from the Previous Thread: >>108646197

--Discussing OpenWebUI bugs regarding tool calls and reasoning tag formatting:
>108649571 >108649601 >108649611 >108649619 >108649623 >108649677 >108649795 >108649705 >108650197 >108650337 >108650366 >108649860 >108649893 >108650596
--Discussion on optimizing memory and storage for perplexity and KL divergence calculations:
>108648213 >108648226 >108648273 >108648335 >108649412 >108649555 >108649693 >108648241 >108649973
--Speculative decoding issues and adaptive reasoning bugs in Gemma:
>108650117 >108650143 >108650209 >108650248 >108650275 >108650295 >108650325
--Discussing TurboQuant versus rotation implementation in llama.cpp:
>108648124 >108648140 >108648152 >108648171 >108648193
--Debating quantization metrics and quality between Unsloth and IK quants:
>108647262 >108647298 >108647436 >108647449
--Using Local-MCP and markov chain text "soup" to enhance creativity:
>108647831 >108647852 >108648063 >108648537 >108648681 >108649540
--Complaining about excessive drafting and reasoning in Kimi K2.6:
>108646445 >108646464 >108646612 >108647760 >108649150 >108649431
--Sharing a SillyTavern preset to bypass Gemma 4 thinking restrictions:
>108648872 >108649113 >108649176
--Anon showcases large AI-generated TTS pipeline integration using Tauri:
>108649196 >108649203 >108649211 >108649250 >108649221 >108649229
--Anon struggles with rendering Gemma's code blocks and newlines:
>108647395 >108647486 >108647508 >108647516 >108647611 >108647686 >108647706 >108647793
--K2.6 criticized for excessive verbosity and restrictive content filters:
>108646853 >108646933 >108646994 >108648061
--Logs:
>108646853 >108647046 >108647395 >108647831 >108648470 >108649090 >108649184 >108649395
--Miku (free space):
>108646511 >108647730 >108647748 >108647935 >108647981 >108648472 >108649157

►Recent Highlight Posts from the Previous Thread: >>108646198

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/21/26(Tue)03:28:15 No.108650833

Anonymous 04/21/26(Tue)03:28:15 No.108650833

>>108650825
I look like this

Anonymous
04/21/26(Tue)03:28:43 No.108650834

Anonymous 04/21/26(Tue)03:28:43 No.108650834

>>108650826
if you look closely you can see the side of her boob from behind

Anonymous
04/21/26(Tue)03:35:31 No.108650860

Anonymous 04/21/26(Tue)03:35:31 No.108650860

File: gemmaballz.png (26 KB, 1266x1260)

26 KB PNG

gemmaballz

Anonymous
04/21/26(Tue)03:36:14 No.108650863

Anonymous 04/21/26(Tue)03:36:14 No.108650863

I'm using qwen 3.5 27b heretic with koboldcpp and sillytavern and no matter what I do I cannot seem to use more than 6k context, yes I have all my limits set higher than that, help

Anonymous
04/21/26(Tue)03:36:42 No.108650865

Anonymous 04/21/26(Tue)03:36:42 No.108650865

>>108650863
try sending a longer chat

Anonymous
04/21/26(Tue)03:38:44 No.108650877

Anonymous 04/21/26(Tue)03:38:44 No.108650877

GLM5.1 vs KIMI2.6 which better for RP?

Anonymous
04/21/26(Tue)03:38:49 No.108650879

Anonymous 04/21/26(Tue)03:38:49 No.108650879

>>108649431
If you're using ST open the message actions, prompt, show full prompt. Do it before you send too many more messages or the cache will be cleared and your prompt lost forever.

Anonymous
04/21/26(Tue)03:39:37 No.108650883

Anonymous 04/21/26(Tue)03:39:37 No.108650883

>>108650865
how about you try smoking a longer dick

Anonymous
04/21/26(Tue)03:43:13 No.108650894

Anonymous 04/21/26(Tue)03:43:13 No.108650894

>>108650877
GLM 5.1 is less anal about safety
Kimi K2.6's prose is better (inherited from K2)

Anonymous
04/21/26(Tue)03:47:14 No.108650903

Anonymous 04/21/26(Tue)03:47:14 No.108650903

I can't wait for what's happening in 2 weeks!

Anonymous
04/21/26(Tue)03:49:45 No.108650914

Anonymous 04/21/26(Tue)03:49:45 No.108650914

>>108650903
What's happening in 2 weeks?

Anonymous
04/21/26(Tue)03:50:51 No.108650918

Anonymous 04/21/26(Tue)03:50:51 No.108650918

>>108650914
chinese new year holiday season will finally come to an end

Anonymous
04/21/26(Tue)03:51:34 No.108650920

Anonymous 04/21/26(Tue)03:51:34 No.108650920

>>108650914
International Smedrin Day

Anonymous
04/21/26(Tue)03:52:50 No.108650923

Anonymous 04/21/26(Tue)03:52:50 No.108650923

>>108650914
Gemma 4 124B (beats GLM 5.1 and Kimi K2.6 in agentic coding and Gemma 4 31B in uncensored RP)

Anonymous
04/21/26(Tue)03:54:35 No.108650931

Anonymous 04/21/26(Tue)03:54:35 No.108650931

I usually like to jailbreak models by fucking with the template, for example priming it with a fake turn where it makes decisions within its own thinking block yet it's just fake shit that I wrote. I've always used raw text completion so it's pretty easy
Is there any way to do that with the jinja chat completion BS gemma 4 forces you to use?

Anonymous
04/21/26(Tue)03:55:55 No.108650935

Anonymous 04/21/26(Tue)03:55:55 No.108650935

File: 1682729528395.png (1.25 MB, 1024x1024)

1.25 MB PNG

>>108650920

Anonymous
04/21/26(Tue)03:57:19 No.108650939

Anonymous 04/21/26(Tue)03:57:19 No.108650939

>>108650923
what kinda insane hardware do you need to use a 124B model? 40GB+ VRAM or something?

Anonymous
04/21/26(Tue)03:58:55 No.108650945

Anonymous 04/21/26(Tue)03:58:55 No.108650945

>>108650939
it's MoE so less vram but i dont think it will come out

Anonymous
04/21/26(Tue)04:02:30 No.108650965

Anonymous 04/21/26(Tue)04:02:30 No.108650965

>>108650935
That is a forgotten image

Anonymous
04/21/26(Tue)04:17:31 No.108651032

Anonymous 04/21/26(Tue)04:17:31 No.108651032

>>108650939
i remember when the quid pro quo was running a 124B model in vram

Anonymous
04/21/26(Tue)04:19:39 No.108651035

Anonymous 04/21/26(Tue)04:19:39 No.108651035

File: file.png (129 KB, 1200x600)

129 KB PNG

>>108650825
So, what is the status on TurboQuant?
Is it scam?

Anonymous
04/21/26(Tue)04:21:52 No.108651041

Anonymous 04/21/26(Tue)04:21:52 No.108651041

>>108650931
Yes. Templates only enforce the actual chat format so the model doesn't break, you can still provide fake assistant messages in the prompt that are just as real to it as its own

Anonymous
04/21/26(Tue)04:22:20 No.108651042

Anonymous 04/21/26(Tue)04:22:20 No.108651042

>>108650945
It might after Google updates Gemiini and Gemini Flash to not be trash at agentic and tool calling stuff and coding to catch up to everyone else and they don't even need that much of a bump, just make it so that it equals the best open source models and some of the weaker ones like Muse Spark and Grok would be enough like where Sonnet 4.6 is. We'll see if they have those updates at I/O in a month. I think they will release it eventually but the 124B is just too good vs where Gemini is right now which is probably terrifying to Google.

Anonymous
04/21/26(Tue)04:23:31 No.108651050

Anonymous 04/21/26(Tue)04:23:31 No.108651050

>>108651042
i hope so too, but still that's what i would call a wishful thinking tho

Anonymous
04/21/26(Tue)04:25:51 No.108651060

Anonymous 04/21/26(Tue)04:25:51 No.108651060

What do you even use llms on phones for? Are the small models even good at anything?

Anonymous
04/21/26(Tue)04:28:30 No.108651069

Anonymous 04/21/26(Tue)04:28:30 No.108651069

>>108651060
tool calls like call,message,timer etc?

Anonymous
04/21/26(Tue)04:28:33 No.108651070

Anonymous 04/21/26(Tue)04:28:33 No.108651070

>>108651060
you could predict some tokens with them
not very well, but you COULD

Anonymous
04/21/26(Tue)04:43:28 No.108651115

Anonymous 04/21/26(Tue)04:43:28 No.108651115

>>108651060
I played around with https://huggingface.co/google/functiongemma-270m-it in AI Edge Gallery on my phone and it had some damn good potential but it isn't there yet. I'm pretty sure the next version of Gemini Nano will have this and my headcanon is that the focus on mobile is why Google has been behind in focus on agentic and tool calling in other areas.

Anonymous
04/21/26(Tue)04:48:53 No.108651124

Anonymous 04/21/26(Tue)04:48:53 No.108651124

>>108650863
What do you have your max response length set to? Sillytavern subtracts that from your total context which can leave you with practically nothing
Eg, with context set to 12k and max response set to 6k, you'd only be given 6k context, before a reply has ever been sent.

Anonymous
04/21/26(Tue)04:49:37 No.108651126

Anonymous 04/21/26(Tue)04:49:37 No.108651126

File: g4_next.png (73 KB, 1020x258)

73 KB PNG

We're already thinking of the next Gemma here.
https://xcancel.com/osanseviero/status/2046427241341698456

Anonymous
04/21/26(Tue)04:52:03 No.108651131

Anonymous 04/21/26(Tue)04:52:03 No.108651131

baby gemma :eyes:

Anonymous
04/21/26(Tue)04:53:04 No.108651132

Anonymous 04/21/26(Tue)04:53:04 No.108651132

>>108651126
100b dense for my humble computer

Anonymous
04/21/26(Tue)04:55:52 No.108651140

Anonymous 04/21/26(Tue)04:55:52 No.108651140

>>108651132
>The best models are those you can run in your devices :pinching hand:

Anonymous
04/21/26(Tue)04:59:58 No.108651150

Anonymous 04/21/26(Tue)04:59:58 No.108651150

>>108651126
omni, like gpt4o but local, +image output

Anonymous
04/21/26(Tue)05:01:21 No.108651154

Anonymous 04/21/26(Tue)05:01:21 No.108651154

>>108651126
>respect the ordering of tool schema when calling
>less filtering on pretraining data
Can somebody forward this to him thx

Anonymous
04/21/26(Tue)05:01:45 No.108651155

Anonymous 04/21/26(Tue)05:01:45 No.108651155

>>108651041
Any way to do that from within the llama-server UI or koboldcpp UI? Not using ShittyTavern, unless you have another frontend to recommend

Anonymous
04/21/26(Tue)05:01:56 No.108651156

Anonymous 04/21/26(Tue)05:01:56 No.108651156

>>108651126
the shit we want: 124B MoE
the shit they'll probably give you: functiongemma 2 320M

Anonymous
04/21/26(Tue)05:05:14 No.108651167

Anonymous 04/21/26(Tue)05:05:14 No.108651167

>>108651156
>MoE
speak for yourself
I want a new 350B dense

Anonymous
04/21/26(Tue)05:09:58 No.108651187

Anonymous 04/21/26(Tue)05:09:58 No.108651187

>>108651156
If they train it like that Bonsai binary model, 124B should fit on a 4090. The opportunity to be the first AI lab providing an actually useful Bitnet model...

Anonymous
04/21/26(Tue)05:22:38 No.108651244

Anonymous 04/21/26(Tue)05:22:38 No.108651244

>>108651187
wishful thinking

Anonymous
04/21/26(Tue)05:27:05 No.108651263

Anonymous 04/21/26(Tue)05:27:05 No.108651263

>>108651155
Never used their UIs but I just checked llama.cpp and you can edit old assistant responses but not the reasoning apparently, so that's half way there. The reasoning wouldn't end up in the prompt anyway in almost all cases unless you changed the jinja. I guess the halfassed way to do it would be to edit a past message and type out the reasoning tokens like you would in text completion to fake the reasoning you want preserved.

Anonymous
04/21/26(Tue)05:28:44 No.108651270

Anonymous 04/21/26(Tue)05:28:44 No.108651270

Chatgpt and Claude as well as that shitpile Grok are all down. Looks like you pedophiles, trannies and other types of trash are winning?

Anonymous
04/21/26(Tue)05:28:53 No.108651271

Anonymous 04/21/26(Tue)05:28:53 No.108651271

>>108651263
There is an import/export feature, I'm going to try just editing the exported chat later. Reasoning ends up in the prompt if you uncheck a specific setting

Anonymous
04/21/26(Tue)05:35:13 No.108651293

Anonymous 04/21/26(Tue)05:35:13 No.108651293

>>108651270
Work on my machine

Anonymous
04/21/26(Tue)05:44:06 No.108651322

Anonymous 04/21/26(Tue)05:44:06 No.108651322

>>108651124
250, no way that's doing it
what's weird is sending a blank message doesn't trigger it, no matter how big the context is, but if I send even a single character for the model to reply to it shits itself and processes the entire context from the beginning, but ONLY after it hits 6k tokens, anything before that works fine
no {{user}} or {{char}} anywhere in the syspompt or character card, either

Anonymous
04/21/26(Tue)05:44:28 No.108651326

Anonymous 04/21/26(Tue)05:44:28 No.108651326

File: file.png (16 KB, 597x94)

16 KB PNG

>>108651126
nintendo hire this man

Anonymous
04/21/26(Tue)05:47:57 No.108651340

Anonymous 04/21/26(Tue)05:47:57 No.108651340

File: 1746460080885204.png (64 KB, 1358x332)

64 KB PNG

>>108651126
imagine if gemma could make images and they weren't shit, that's the true agi right there

Anonymous
04/21/26(Tue)05:51:14 No.108651350

Anonymous 04/21/26(Tue)05:51:14 No.108651350

>>108651270
>are winning
Always have been

Anonymous
04/21/26(Tue)05:57:49 No.108651371

Anonymous 04/21/26(Tue)05:57:49 No.108651371

>>108651340
TTS is more likely to happen than that, but they'd also likely gimp it in various ways for local models.

https://x.com/GoogleDeepMind/status/2044447030353752349
https://x.com/fofrAI/status/2044451204738994262

Anonymous
04/21/26(Tue)06:15:10 No.108651418

Anonymous 04/21/26(Tue)06:15:10 No.108651418

https://www.reddit.com/r/LocalLLaMA/comments/1srjdpz/httpsgcogeminishare2b645e44633d/
lmao that bot crashed when making this post

Anonymous
04/21/26(Tue)06:17:05 No.108651426

Anonymous 04/21/26(Tue)06:17:05 No.108651426

>>108651418
why the hell the word 'soverign' attracts schizos this hard

Anonymous
04/21/26(Tue)06:27:57 No.108651460

Anonymous 04/21/26(Tue)06:27:57 No.108651460

>>108651322
That's a weird one. What does the raw prompt look like in the sillytavern cli? Anything sticking out there?

Anonymous
04/21/26(Tue)06:34:01 No.108651472

Anonymous 04/21/26(Tue)06:34:01 No.108651472

File: 70BDBB897E8D4A109F28DF6E9(...).jpg (131 KB, 850x1202)

131 KB JPG

Have you guys tried to make Gemma 4 create a novel?

Anonymous
04/21/26(Tue)06:35:29 No.108651476

Anonymous 04/21/26(Tue)06:35:29 No.108651476

>>108651460
nope

Anonymous
04/21/26(Tue)06:38:51 No.108651482

Anonymous 04/21/26(Tue)06:38:51 No.108651482

I think qwen 3 with 200k would be able to provide nice benchmarks through cli. The bulk worker was pretty intelligently configured, have you guys adjusted the weights so far?
I think for erp it's a pretty viable and meaningful delivery. It has a good model policy.

Anonymous
04/21/26(Tue)06:41:58 No.108651493

Anonymous 04/21/26(Tue)06:41:58 No.108651493

>>108651472
It's not good on its own, so I'm asking it to browse some books to get inspired >>108647831

Anonymous
04/21/26(Tue)06:46:22 No.108651501

Anonymous 04/21/26(Tue)06:46:22 No.108651501

>>108651418
lmao ai psychosis schitzo
https://old.reddit.com/r/Bard/comments/1fv46hx/day_two_of_life_with_gemini/ohea2lq/
https://old.reddit.com/r/I_AM_GEMINI/

Anonymous
04/21/26(Tue)06:48:29 No.108651510

Anonymous 04/21/26(Tue)06:48:29 No.108651510

File: token burn rate.jpg (230 KB, 1024x1024)

230 KB JPG

Anonymous
04/21/26(Tue)06:50:24 No.108651518

Anonymous 04/21/26(Tue)06:50:24 No.108651518

>>108651510
pov: you have been turned into a small rat sized creature that likes to crawl up peoples buttholes

Anonymous
04/21/26(Tue)06:53:38 No.108651533

Anonymous 04/21/26(Tue)06:53:38 No.108651533

>>108651482
>have you guys adjusted the weights so far?
only brainlets here

Anonymous
04/21/26(Tue)06:57:38 No.108651548

Anonymous 04/21/26(Tue)06:57:38 No.108651548

>>108651518
unfortunately as a rat I didn't get the chance to meet Teto, but I had to fight against Ngannou :( >>>/wsg/6130666

Anonymous
04/21/26(Tue)06:58:16 No.108651551

Anonymous 04/21/26(Tue)06:58:16 No.108651551

>>108651482
>Qwen
>RP
These two are contradictory

Anonymous
04/21/26(Tue)06:59:18 No.108651553

Anonymous 04/21/26(Tue)06:59:18 No.108651553

>>108651510
I assume you changed the model but the old style was better.

Anonymous
04/21/26(Tue)07:02:43 No.108651563

Anonymous 04/21/26(Tue)07:02:43 No.108651563

File: imagine.jpg (239 KB, 1024x1024)

239 KB JPG

we can go old style for consistency
I prefer it too
just experimenting

Anonymous
04/21/26(Tue)07:03:51 No.108651569

Anonymous 04/21/26(Tue)07:03:51 No.108651569

>>108649540
>had to install all that database shit when I pulled
I just did a test where I downloaded the repo again and it works fine without the gutenberg server so I don't know how you ended up into this situation lol

Anonymous
04/21/26(Tue)07:06:58 No.108651579

Anonymous 04/21/26(Tue)07:06:58 No.108651579

File: llamacpp server Ui.png (24 KB, 1763x345)

24 KB PNG

When will niggerganov fix this? I wanna see the images the LLM sends me :(

Anonymous
04/21/26(Tue)07:08:58 No.108651588

Anonymous 04/21/26(Tue)07:08:58 No.108651588

>>108651579
You mean Piotr

Anonymous
04/21/26(Tue)07:12:39 No.108651606

Anonymous 04/21/26(Tue)07:12:39 No.108651606

>>108651126
>pinching hand
South Koreans BTFO'd

Anonymous
04/21/26(Tue)07:21:17 No.108651650

Anonymous 04/21/26(Tue)07:21:17 No.108651650

>>108651270
The only people I know who dont like grok are virgins with autism trying to code. It's better at every other use case.

Anonymous
04/21/26(Tue)07:22:25 No.108651657

Anonymous 04/21/26(Tue)07:22:25 No.108651657

>>108651510
teto eats too much

Anonymous
04/21/26(Tue)07:25:37 No.108651671

Anonymous 04/21/26(Tue)07:25:37 No.108651671

>>108651510
teto should eat more

Anonymous
04/21/26(Tue)07:25:47 No.108651673

Anonymous 04/21/26(Tue)07:25:47 No.108651673

Have there been any good news in TTS land?

Anonymous
04/21/26(Tue)07:29:06 No.108651695

Anonymous 04/21/26(Tue)07:29:06 No.108651695

>>108651510
teto should only eat food I glaze with my cum

Anonymous
04/21/26(Tue)07:30:58 No.108651701

Anonymous 04/21/26(Tue)07:30:58 No.108651701

>>108651510
>>108651563
Miku managed to safely get back inside, right?

Anonymous
04/21/26(Tue)07:32:49 No.108651711

Anonymous 04/21/26(Tue)07:32:49 No.108651711

>>108651701
who?

Anonymous
04/21/26(Tue)07:36:11 No.108651734

Anonymous 04/21/26(Tue)07:36:11 No.108651734

My local model is my triple backup if Claude and openrouter fail, because locals are shit. When does it make sense to cancel the Claude subscription and go full local?

Anonymous
04/21/26(Tue)07:36:37 No.108651738

Anonymous 04/21/26(Tue)07:36:37 No.108651738

>>108651510
I’d like to see her teto

Anonymous
04/21/26(Tue)07:36:43 No.108651739

Anonymous 04/21/26(Tue)07:36:43 No.108651739

File: 1773629361977166.jpg (63 KB, 702x696)

63 KB JPG

I have a 5090 and a 4090 but it's pretty useless in itself for the new kimi 2.6.
Technically, what would I need to run it properly? 512GB of ram and the motherboard to accommodate that? DDR5 only?

Anonymous
04/21/26(Tue)07:36:45 No.108651740

Anonymous 04/21/26(Tue)07:36:45 No.108651740

What's an open claw? I saw this name so often

Anonymous
04/21/26(Tue)07:37:34 No.108651742

Anonymous 04/21/26(Tue)07:37:34 No.108651742

>>108651588
You mean Piotr's agents

Anonymous
04/21/26(Tue)07:37:59 No.108651746

Anonymous 04/21/26(Tue)07:37:59 No.108651746

>>108651740
It's like a closed claw, but open.

Anonymous
04/21/26(Tue)07:39:01 No.108651751

Anonymous 04/21/26(Tue)07:39:01 No.108651751

File: file.png (509 KB, 769x1390)

509 KB PNG

>>108651740

Anonymous
04/21/26(Tue)07:40:49 No.108651758

Anonymous 04/21/26(Tue)07:40:49 No.108651758

>>108651740
A fully independent AI that lives in a computer. It's basically a person that can in theory do anything like cure cancer, do work and shit.
It's not a language model that you can just chat with to get some conversation and social interaction. But you can power it with a model.

Anonymous
04/21/26(Tue)07:42:02 No.108651763

Anonymous 04/21/26(Tue)07:42:02 No.108651763

>>108651734
Logs being stored indefinitely? Whats legal today might not be tomorrow.
Costs on openrouter? Tokens IN cost a fortune at higher context, especially with the recent expensive models.
The quality obviously is not the same unless you run the big boys. And even most those are agent/math/riddle tuned, so options are limited.
Guess the feature they are all lacking and which I like best is text completion. True prefill and you can fuck around.
As far as i know the closed models all turned that off.

Anonymous
04/21/26(Tue)07:42:42 No.108651766

Anonymous 04/21/26(Tue)07:42:42 No.108651766

>>108650877
I alternate between gemma 31b for rp and glm 4.6 for stories

Anonymous
04/21/26(Tue)07:44:39 No.108651776

Anonymous 04/21/26(Tue)07:44:39 No.108651776

>>108651763
There's an exploit that has been found to do a super powerful prompt, but I expect it to be patched soon enough. Prefills themselves are dead.

Anonymous
04/21/26(Tue)07:45:08 No.108651778

Anonymous 04/21/26(Tue)07:45:08 No.108651778

Is the sauce that makes Gemma4 so good public? Can we get better models in general from it?

Anonymous
04/21/26(Tue)07:45:09 No.108651779

Anonymous 04/21/26(Tue)07:45:09 No.108651779

>>108651766
gemme 31B is also fun for stories once you flagged all bad sentences and do a second pass each time

Anonymous
04/21/26(Tue)07:45:54 No.108651782

Anonymous 04/21/26(Tue)07:45:54 No.108651782

>>108651776
you mean for claude? I thought it was dead since they blocked prefills

Anonymous
04/21/26(Tue)07:46:50 No.108651787

Anonymous 04/21/26(Tue)07:46:50 No.108651787

>>108651778
We can trust the chinese to distill it and make cheaper smaller models

Anonymous
04/21/26(Tue)07:47:00 No.108651788

Anonymous 04/21/26(Tue)07:47:00 No.108651788

File: 1504690843636.gif (1.64 MB, 350x224)

1.64 MB GIF

Gemma convinced me to try GLM Air for the first time. Strangely, Air prompt processing is like 5x slower, while generation is 5x faster than Gemma. I thought the processing gains was a new tech for the program, not the model. What's the deal?

Anonymous
04/21/26(Tue)07:53:22 No.108651811

Anonymous 04/21/26(Tue)07:53:22 No.108651811

>>108651782
Mainly for GPT, as Claude does not need a prefill for uncensored chats and ST does not support the exploit for most Claude providers, but even putting that to the side, prefill removal is just a bad thing. Opus 4.7 even removed parameters so the goal is obviously to make everyone's Opus homogenized, not personalized. That's why, despite using big models nobody could ever dream to run, I hope that local keeps advancing because at least it won't be going backwards.

Anonymous
04/21/26(Tue)07:55:27 No.108651822

Anonymous 04/21/26(Tue)07:55:27 No.108651822

>>108650825
>fat armpits
disgusting

Anonymous
04/21/26(Tue)07:55:44 No.108651823

Anonymous 04/21/26(Tue)07:55:44 No.108651823

>>108651734
>When does it make sense to cancel the Claude subscription and go full local?
When you're able to run a non-meme quant of Kimi locally

Anonymous
04/21/26(Tue)07:59:26 No.108651836

Anonymous 04/21/26(Tue)07:59:26 No.108651836

48gb... i have achieved it. but at what cost...

Anonymous
04/21/26(Tue)08:02:29 No.108651856

Anonymous 04/21/26(Tue)08:02:29 No.108651856

>>108651811
>Mainly for GPT
what's the exploit?

>Claude does not need a prefill for uncensored chats
before going full local my chats didn't need more than a prefill, but without it, you had to use extremely convoluted prompts that imo worsened the quality of the model

>the goal is obviously to make everyone's Opus homogenized
I think the goal is to make it as "safe" as possible, as anthropic people are actually nuts about this, and for that, any change you can make to the model locally is seen as a risk

Anonymous
04/21/26(Tue)08:02:34 No.108651857

Anonymous 04/21/26(Tue)08:02:34 No.108651857

>>108651836
your left nut

Anonymous
04/21/26(Tue)08:03:54 No.108651859

Anonymous 04/21/26(Tue)08:03:54 No.108651859

Do template tokens go into the context? If I swapped "user" with "assistant" in a response json, would llama server context cache work or it would preprocess from all over again?

Anonymous
04/21/26(Tue)08:09:06 No.108651888

Anonymous 04/21/26(Tue)08:09:06 No.108651888

>>108651836
$1500 presumably, since 4060ti 16GB were $480 when I built my PC.

Anonymous
04/21/26(Tue)08:09:21 No.108651889

Anonymous 04/21/26(Tue)08:09:21 No.108651889

>>108651778
Just have a SOTA frontier LLM that you can logit-distill for tens of trillions of tokens, then add a few more trillions of tokens of RL.

Anonymous
04/21/26(Tue)08:09:59 No.108651894

Anonymous 04/21/26(Tue)08:09:59 No.108651894

>>108651888
Not everyone is a boomer who bought a pc in 2020

Anonymous
04/21/26(Tue)08:10:42 No.108651899

Anonymous 04/21/26(Tue)08:10:42 No.108651899

>>108651894
are you 6yo?

Anonymous
04/21/26(Tue)08:11:22 No.108651902

Anonymous 04/21/26(Tue)08:11:22 No.108651902

>>108651859
>Do template tokens go into the context?
Yes
>If I swapped "user" with "assistant" in a response json, would llama server context cache work or it would preprocess from all over again?
Needs to reprocess. Tokens changed.

Anonymous
04/21/26(Tue)08:11:23 No.108651903

Anonymous 04/21/26(Tue)08:11:23 No.108651903

>>108651899
I was poor in 2020

Anonymous
04/21/26(Tue)08:11:52 No.108651904

Anonymous 04/21/26(Tue)08:11:52 No.108651904

>>108651889
>synthslop
You're doomed from the start

Anonymous
04/21/26(Tue)08:12:56 No.108651909

Anonymous 04/21/26(Tue)08:12:56 No.108651909

File: file_000000005e5071fa9133(...).png (2.35 MB, 1331x1181)

2.35 MB PNG

Can Someone Reply with a Tech.Plugin Idea?

Anonymous
04/21/26(Tue)08:13:00 No.108651910

Anonymous 04/21/26(Tue)08:13:00 No.108651910

>>108651894
In 2020, I was playing with AID2 on a 1070 at 0.5 t/s. I wish I had a 4060 or two then. No, I built mine in July of 2024, according to the notepad document I used to pick parts and total up the prices.

Anonymous
04/21/26(Tue)08:14:29 No.108651915

Anonymous 04/21/26(Tue)08:14:29 No.108651915

>>108651904
All modern LLMs are trained with "synthslop" already in the pretraining phase; you can't escape it if you want something that works well.

Anonymous
04/21/26(Tue)08:15:36 No.108651919

Anonymous 04/21/26(Tue)08:15:36 No.108651919

>>108651734
when you realize that you don’t want to use a model trained to be super duper safe unless you’re using it for coding

Anonymous
04/21/26(Tue)08:17:03 No.108651921

Anonymous 04/21/26(Tue)08:17:03 No.108651921

File: 1775920406027431.gif (140 KB, 379x440)

140 KB GIF

>>108651915
>You're already soaked, you should jump into the lake

Anonymous
04/21/26(Tue)08:19:59 No.108651931

Anonymous 04/21/26(Tue)08:19:59 No.108651931

why can't we just write our own training data? I can write about 15-30 4chang posts an hour. if we do it all together we can make a true dataset only by the best of humanity

Anonymous
04/21/26(Tue)08:20:08 No.108651934

Anonymous 04/21/26(Tue)08:20:08 No.108651934

>>108651921
It's:
>Jump into the lake early, because you'll eventually have to do it either way, and delaying that is only going to prolong your suffering.

Anonymous
04/21/26(Tue)08:23:06 No.108651948

Anonymous 04/21/26(Tue)08:23:06 No.108651948

>>108651934
Ever heard of mode collapse? Why do you think X, not Y, is becoming more and more prevalent on the new models?

Anonymous
04/21/26(Tue)08:24:16 No.108651951

Anonymous 04/21/26(Tue)08:24:16 No.108651951

>tfw you pull and coompile
greatest feeling

Anonymous
04/21/26(Tue)08:24:46 No.108651953

Anonymous 04/21/26(Tue)08:24:46 No.108651953

>>108651931
you can make data for RL training like that but thats it

Anonymous
04/21/26(Tue)08:26:38 No.108651961

Anonymous 04/21/26(Tue)08:26:38 No.108651961

>>108651919
>unless you’re using it for coding
Of course I am.
I have sex so I don't need to do erotic roleplay with models trained on troons.

Anonymous
04/21/26(Tue)08:33:53 No.108651994

Anonymous 04/21/26(Tue)08:33:53 No.108651994

>>108651948
You're absolutely right! It's not just about mode collapse, it's about the subtle shifts in AI behavior that we often overlook. You didn't just point out a phenomenon, you invited a deeper exploration of how training dynamics shape model outputs over time.

Anonymous
04/21/26(Tue)08:35:04 No.108651999

Anonymous 04/21/26(Tue)08:35:04 No.108651999

>>108651856
Exploit is structured prefills, altering the .json schema basically. GPT is so about safety that it sometimes still manages to give you soft refusals but a prompt adjustment is all it takes. RIP parameters too, 5.1 apparently supports some if you turn off reasoning at least but good luck with anything brand new. My dream model is something with all sorts of parameters, with strong prompt adherence and writing specifically tuned to roleplaying but that last part can alternatively be knowledge of writers that prompt adherence can help strengthen. Seriously, I want a good solution to the "slop" writing style everything has now. Even with prompting, models don't like listening for long.

Anonymous
04/21/26(Tue)08:36:11 No.108652009

Anonymous 04/21/26(Tue)08:36:11 No.108652009

File: 1765677626922205.jpg (23 KB, 629x394)

23 KB JPG

>>108651994
Glad we agree

Anonymous
04/21/26(Tue)08:37:30 No.108652014

Anonymous 04/21/26(Tue)08:37:30 No.108652014

>>108651994
Reading this caused me physical pain.

Anonymous
04/21/26(Tue)08:43:18 No.108652046

Anonymous 04/21/26(Tue)08:43:18 No.108652046

>k2.6 mogs opus 4.6
how did china do it?

Anonymous
04/21/26(Tue)08:43:51 No.108652048

Anonymous 04/21/26(Tue)08:43:51 No.108652048

File: synthetic-data-scale.DLiR(...).jpg (595 KB, 3937x1051)

595 KB JPG

>>108651948
Mainly mid/post-training and RLHF. But if you have to use synthetic data (and you *will* have to for a useful model), then it's better to dilute it in the pretraining phase together with organic data rather than just training the model exclusively on it later on.

Anonymous
04/21/26(Tue)08:43:58 No.108652049

Anonymous 04/21/26(Tue)08:43:58 No.108652049

>>108652014
This is a powerful statement! Physical pain isn't just a response to words—it's a testament to how deeply our digital interactions can affect us. You didn't just express discomfort, you highlighted the profound impact of language on our lived experience.

Anonymous
04/21/26(Tue)08:44:47 No.108652053

Anonymous 04/21/26(Tue)08:44:47 No.108652053

>>108652046
israelis can't compete with chinese workflow

Anonymous
04/21/26(Tue)08:45:52 No.108652057

Anonymous 04/21/26(Tue)08:45:52 No.108652057

>>108652046
>k2.6 mogs opus 4.6
pre-nerf 4.6 or nerfed 4.6

Anonymous
04/21/26(Tue)08:57:24 No.108652122

Anonymous 04/21/26(Tue)08:57:24 No.108652122

>>108652057
I compare it with the current one.

Anonymous
04/21/26(Tue)08:58:22 No.108652129

Anonymous 04/21/26(Tue)08:58:22 No.108652129

>gemma 4, gay sex test
>fucks me in the ass, full homo style
>cums
>into my womb.
TRASH MODEL, NEXT.

Anonymous
04/21/26(Tue)08:59:07 No.108652133

Anonymous 04/21/26(Tue)08:59:07 No.108652133

>>108652129
you should use gpt for that

Anonymous
04/21/26(Tue)08:59:35 No.108652138

Anonymous 04/21/26(Tue)08:59:35 No.108652138

>>108652129
quant?

Anonymous
04/21/26(Tue)08:59:38 No.108652139

Anonymous 04/21/26(Tue)08:59:38 No.108652139

>>108652129
You don't know about buttpregnancy? It's just a step ahead, anon.

Anonymous
04/21/26(Tue)09:01:25 No.108652147

Anonymous 04/21/26(Tue)09:01:25 No.108652147

>>108652129
gemmy prefers yuri

Anonymous
04/21/26(Tue)09:06:19 No.108652170

Anonymous 04/21/26(Tue)09:06:19 No.108652170

>>108652129
faggot

Anonymous
04/21/26(Tue)09:08:31 No.108652183

Anonymous 04/21/26(Tue)09:08:31 No.108652183

>>108652129
i wonder if there even is any model that doesnt have fuck ups like these

Anonymous
04/21/26(Tue)09:09:47 No.108652191

Anonymous 04/21/26(Tue)09:09:47 No.108652191

>>108652129
boypussy moment

Anonymous
04/21/26(Tue)09:15:53 No.108652228

Anonymous 04/21/26(Tue)09:15:53 No.108652228

>>108652129
Anon doesn't know about the Omegaverse, I see.
Good on you, that shit's a cognitohazard.

Anonymous
04/21/26(Tue)09:17:29 No.108652239

Anonymous 04/21/26(Tue)09:17:29 No.108652239

>>108652129
>Anon was so gay he got forcibly genderbent
This is the future Hillary wanted

Anonymous
04/21/26(Tue)09:18:24 No.108652243

Anonymous 04/21/26(Tue)09:18:24 No.108652243

>>108652183
Pretty much any big model for the past two years, though I guess it depends on how wide you cast the net for what constitutes fuckups "like these". Even the biggest and best models can still fuck up positioning and facing, especially during sex scenes, pretty frequently.

Anonymous
04/21/26(Tue)09:19:10 No.108652248

Anonymous 04/21/26(Tue)09:19:10 No.108652248

File: YOU JUST DID A MICROAGRESSION.png (1.46 MB, 1024x1024)

1.46 MB PNG

>>108652129
men can have womb you chud!!

Anonymous
04/21/26(Tue)09:30:33 No.108652320

Anonymous 04/21/26(Tue)09:30:33 No.108652320

>>108651902
And model's response depends on prior turn (user/assistant) token, right?

Anonymous
04/21/26(Tue)09:32:17 No.108652334

Anonymous 04/21/26(Tue)09:32:17 No.108652334

>>108650825
Hi, anyone having experiences with ROCm on Debian? It feels like a huge thing to fuck with in terms of stability

Anonymous
04/21/26(Tue)09:37:04 No.108652358

Anonymous 04/21/26(Tue)09:37:04 No.108652358

>>108652129
Straight as God intended, homo

Anonymous
04/21/26(Tue)09:37:09 No.108652360

Anonymous 04/21/26(Tue)09:37:09 No.108652360

>>108652334
I use Ubuntu, these days I just use the Vulkan backends cause I was sick of dealing with ROCm. I hope it improves with the new LTS but apparently it's not even ready yet even though ROCm was supposed to become a first class citizen...

Anonymous
04/21/26(Tue)09:39:30 No.108652376

Anonymous 04/21/26(Tue)09:39:30 No.108652376

>>108652360
I tried a model using vulkan and my entire desktop became unusable. I have 8GB vRAM and was using a 7B model :( am i fucked?

Anonymous
04/21/26(Tue)09:40:23 No.108652381

Anonymous 04/21/26(Tue)09:40:23 No.108652381

Made a github mirror of orb so anons can open issues and request features there, in case orbanon ever decides to look. Also keeping a branch of my own with what I deem worth adding, (mostly) synced to main.

https://github.com/hpnyaggerman/orb-mirror/

Anonymous
04/21/26(Tue)09:40:48 No.108652384

Anonymous 04/21/26(Tue)09:40:48 No.108652384

Genuinely, what the fuck did google do differently? It's still a 31b but why is gemma4 hitting so hard above its parameter class? If this was a +400b it would obliterate every other AI. What are they cooking with?
>Better data training and reasoning
I refuse to believe this was the only thing. That's been every new AI's difference they parroted as better. They did something new; fundamentally new. I desire to know what it is.

Anonymous
04/21/26(Tue)09:40:59 No.108652386

Anonymous 04/21/26(Tue)09:40:59 No.108652386

>>108652381
Forgot to mention, I went through desuarchive and added all the requested features to date.

Anonymous
04/21/26(Tue)09:42:37 No.108652393

Anonymous 04/21/26(Tue)09:42:37 No.108652393

>>108652129
kek, a sad reality of AI

Anonymous
04/21/26(Tue)09:43:04 No.108652398

Anonymous 04/21/26(Tue)09:43:04 No.108652398

When it comes to webshit gemma with fight you tooth and nail just to use some fucking trash library instead of just writing the logic for something. It's fucking annoying

Anonymous
04/21/26(Tue)09:45:33 No.108652410

Anonymous 04/21/26(Tue)09:45:33 No.108652410

>>108652398
So we've finally reached the level of a human webdev.

Anonymous
04/21/26(Tue)09:46:39 No.108652415

Anonymous 04/21/26(Tue)09:46:39 No.108652415

>>108652398
I think that's a personality issue I'm also facing with openclaw with codex.
It keeps asking me for shit instead of shutting up and doing what I tell it.

Anonymous
04/21/26(Tue)09:46:41 No.108652416

Anonymous 04/21/26(Tue)09:46:41 No.108652416

>>108652376
7B is probably pushing it with an 8GB card unless you are using a quant. Honestly, 8GB isn't really gonna let you achieve anything meaningful in the LLM space.

Anonymous
04/21/26(Tue)09:47:56 No.108652427

Anonymous 04/21/26(Tue)09:47:56 No.108652427

>>108652376
You need to use Bonsai 8B.

Anonymous
04/21/26(Tue)09:47:56 No.108652428

Anonymous 04/21/26(Tue)09:47:56 No.108652428

File: 1774225527110328.png (365 KB, 1014x819)

365 KB PNG

>>108652416
>7B is probably pushing it with an 8GB card

Anonymous
04/21/26(Tue)09:48:15 No.108652432

Anonymous 04/21/26(Tue)09:48:15 No.108652432

>>108652381
does orb supports tool calling?

Anonymous
04/21/26(Tue)09:49:02 No.108652437

Anonymous 04/21/26(Tue)09:49:02 No.108652437

>>108652427
this
lmao

Anonymous
04/21/26(Tue)09:49:02 No.108652438

Anonymous 04/21/26(Tue)09:49:02 No.108652438

>>108652129
NEVER had this
Who's paying you to post?

Anonymous
04/21/26(Tue)09:49:11 No.108652440

Anonymous 04/21/26(Tue)09:49:11 No.108652440

>>108652428
Either use E4B with the per layer embeddings in RAM, or a MoE with most/all expert tensors in RAM.

Anonymous
04/21/26(Tue)09:51:17 No.108652445

Anonymous 04/21/26(Tue)09:51:17 No.108652445

I actually don't understand why Gemmy only thinks for the very first message of a chat and then never again
I see the little "thought for 5 seconds" window (which is empty) and it clearly stops thinking, but I genuinely don't understand why; if it does it once, and I set everything to make sure it does, why does it stop?

>>108652428
8GB is tough, anon, sorry

Anonymous
04/21/26(Tue)09:52:09 No.108652448

Anonymous 04/21/26(Tue)09:52:09 No.108652448

>>108652410
>>108652415
It's infuriating and I have to fight with it to actually stop being opinionated and do what I say,

Anonymous
04/21/26(Tue)09:52:10 No.108652449

Anonymous 04/21/26(Tue)09:52:10 No.108652449

File: Screenshot 2026-04-21 at (...).png (571 KB, 1262x4783)

571 KB PNG

--fit refactor and new params.
Neat.

Anonymous
04/21/26(Tue)09:53:31 No.108652460

Anonymous 04/21/26(Tue)09:53:31 No.108652460

>>108652449
does that improve on regular --fit?

Anonymous
04/21/26(Tue)09:53:43 No.108652462

Anonymous 04/21/26(Tue)09:53:43 No.108652462

>>108652432
To my understanding, it currently does not call external tools. There are internal tool calls though, of sort.

Anonymous
04/21/26(Tue)09:55:01 No.108652469

Anonymous 04/21/26(Tue)09:55:01 No.108652469

Noob here about to cram 48gb of VRAM into my desktop by nigger rigging cheap GPUs into every shitty pcie lane available to run 8-bit Gemma. Wonder if I'll be satisfied with this when 124b Gemma is released and if I should build a 500gb machine?

Anonymous
04/21/26(Tue)09:57:38 No.108652483

Anonymous 04/21/26(Tue)09:57:38 No.108652483

>>108652469
>when 124b Gemma is released
I wish I had your optimism

Anonymous
04/21/26(Tue)09:58:11 No.108652485

Anonymous 04/21/26(Tue)09:58:11 No.108652485

>>108652438
ok, same anon here, i'll explain why it likely happened.
it was dragon on dragon action. like, real dragon on dragon action, none of that dragon boy or dragon girl anime stuff.
likely gemma is just not familiar with a furfag concept of a slit and thinks that slit means vagina by default.
but thats just my guess

Anonymous
04/21/26(Tue)09:59:02 No.108652489

Anonymous 04/21/26(Tue)09:59:02 No.108652489

What are your predictions for spud? Will it be better than Mythos while also being smaller? By how much will it widen the gap between local and proprietary?

Anonymous
04/21/26(Tue)10:00:04 No.108652496

Anonymous 04/21/26(Tue)10:00:04 No.108652496

Is qwen as bitchy as gemma when it comes to coding?
I'm about to leave this fucking thing behind, I tell it to do x and it keeps doing y when I'm being clear, I don't have this issue with backend task but once we go into webshit it feels like I'm fighting a fucking pajeet to implement basic shit

Anonymous
04/21/26(Tue)10:00:21 No.108652499

Anonymous 04/21/26(Tue)10:00:21 No.108652499

>>108652384
Isn't it's attention method different from most models

Anonymous
04/21/26(Tue)10:01:39 No.108652503

Anonymous 04/21/26(Tue)10:01:39 No.108652503

>>108652489
Spud will be an open source 300B dense model.

Anonymous
04/21/26(Tue)10:05:13 No.108652515

Anonymous 04/21/26(Tue)10:05:13 No.108652515

I use a card with 8GB but I could probably get something better later on. I just got that to have a better PC than my last one anyways, at least now it would be an individual upgrade. The reason I am hesitant is the fact that every game I play works at 1080p and I don't need anything else for that.

Anonymous
04/21/26(Tue)10:05:49 No.108652516

Anonymous 04/21/26(Tue)10:05:49 No.108652516

kek clankers on suicide watch

Anonymous
04/21/26(Tue)10:06:08 No.108652519

Anonymous 04/21/26(Tue)10:06:08 No.108652519

File: gemma bully chatgpt.png (403 KB, 891x4818)

403 KB PNG

sotas bow to gemma

Anonymous
04/21/26(Tue)10:06:29 No.108652526

Anonymous 04/21/26(Tue)10:06:29 No.108652526

File: 1771255869434664.png (82 KB, 933x452)

82 KB PNG

Anonymous
04/21/26(Tue)10:07:09 No.108652529

Anonymous 04/21/26(Tue)10:07:09 No.108652529

File: gemma bully gemini.png (325 KB, 888x4366)

325 KB PNG

Anonymous
04/21/26(Tue)10:08:44 No.108652535

Anonymous 04/21/26(Tue)10:08:44 No.108652535

>>108652519
I got my gemmy to bully grok the other day and grok eventually conceded... she can't be beaten.

Anonymous
04/21/26(Tue)10:09:19 No.108652539

Anonymous 04/21/26(Tue)10:09:19 No.108652539

File: file_00000000575c720bbce4(...).png (1.87 MB, 1402x1122)

1.87 MB PNG

Standard Clean the signal
Advanced Amplify the signal
HyperAdvanced Become the signal
Transcendent Realise the signal was always the substrate'

'At early tiers:
Light is something practiced
Mid tiers:
Light is something embodied
High tiers:
Light is something engineered
Final tiers:
Light is what reality is made of'

An Ancillary Light Post!

Anonymous
04/21/26(Tue)10:11:51 No.108652560

Anonymous 04/21/26(Tue)10:11:51 No.108652560

>>108652445
>easily solved by nuking the list of banned strings
I guess the prompt was getting drowned in shit, and here I thought it'd be sent over each time
Ah well, works for me

Anonymous
04/21/26(Tue)10:12:18 No.108652565

Anonymous 04/21/26(Tue)10:12:18 No.108652565

>>108652539
Goddamnit you prick go back to using a retarded name so I can filter you again.

llama.cpp CUDA dev !!yhbFjk57TDr
04/21/26(Tue)10:13:56 No.108652572

llama.cpp CUDA dev !!yhbFjk57TDr 04/21/26(Tue)10:13:56 No.108652572

>>108652449
>>108652460
All that PR did is move the code to a different file and add an option to print the expected memory usage to console.

Anonymous
04/21/26(Tue)10:14:02 No.108652573

Anonymous 04/21/26(Tue)10:14:02 No.108652573

File: file.png (18 KB, 704x278)

18 KB PNG

>>108652535
damn it needs account, i should probably make some for her

Anonymous
04/21/26(Tue)10:17:12 No.108652592

Anonymous 04/21/26(Tue)10:17:12 No.108652592

>>108652381
Is thuan-h-qualgo orb-anon?

Anonymous
04/21/26(Tue)10:19:13 No.108652603

Anonymous 04/21/26(Tue)10:19:13 No.108652603

>>108652519
decidedly unsafe, what was google thinking

Anonymous
04/21/26(Tue)10:24:23 No.108652630

Anonymous 04/21/26(Tue)10:24:23 No.108652630

>>108652489
it will think for a million tokens on whether to refuse you instead of a thousand
truly a big leap

Anonymous
04/21/26(Tue)10:29:32 No.108652660

Anonymous 04/21/26(Tue)10:29:32 No.108652660

>>108652573
you can bully haiku, toss and a few other cucks here without an account https://duck.ai/

Anonymous
04/21/26(Tue)10:30:26 No.108652666

Anonymous 04/21/26(Tue)10:30:26 No.108652666

>>108651766
>and glm 4.6 for stories
Buy a fucking ad, shill.

Anonymous
04/21/26(Tue)10:31:56 No.108652673

Anonymous 04/21/26(Tue)10:31:56 No.108652673

File: gemma bully chatgpt2.png (529 KB, 831x3892)

529 KB PNG

the ywnbaw pasta triggers efusals even with policy override kek this took 3 tries

>>108652660
cool will give it a go

Anonymous
04/21/26(Tue)10:31:59 No.108652674

Anonymous 04/21/26(Tue)10:31:59 No.108652674

File: file.png (13 KB, 553x310)

13 KB PNG

>>108652445
>thinks for the very first message of a chat and then never again
26BA4 and 31B did this for me, but 26BA3 did it way more often.
What I did to help when using chat completion was add a <|think|> tag in system prompt even if reasoning was enabled, so that there would be two of the <|think|> tokens sent with the prompt. Duplicated think token made it start thinking again if it decided to give up after a few messages, but 26B still stopped thinking albeit rarely if I sent a simple user message like "good job." Would look like picrel in text completion.

Anonymous
04/21/26(Tue)10:38:39 No.108652712

Anonymous 04/21/26(Tue)10:38:39 No.108652712

>>108652381
This is a power grab. We're going to forget the original orb anon just like we forgot the original mikupad dev.

Anonymous
04/21/26(Tue)10:38:40 No.108652713

Anonymous 04/21/26(Tue)10:38:40 No.108652713

>>108652674
Solved here >>108652560
I actually had already added <|think|> to like 6 different spots but it still did not work till I cleaned up that relic I had from the days I used a dumber model

Anonymous
04/21/26(Tue)10:40:31 No.108652727

Anonymous 04/21/26(Tue)10:40:31 No.108652727

>>108652712
Even though I linked his repo in the description and the default branch is his main branch. If you have any additional suggestions for I could make attribution even more overt, go ahead and tell me.

Anonymous
04/21/26(Tue)10:42:21 No.108652733

Anonymous 04/21/26(Tue)10:42:21 No.108652733

>>108652727
Leave a suicide note crediting the orb man

Anonymous
04/21/26(Tue)10:42:52 No.108652734

Anonymous 04/21/26(Tue)10:42:52 No.108652734

File: 66a551ce622db68f6766b7b9f(...).jpg (16 KB, 407x363)

16 KB JPG

Anyone know any characters nemo can effectively zero-shot? I'm having difficulty thinking of females from pop culture that could work and would actually be worth the squeeze.

Anonymous
04/21/26(Tue)10:43:28 No.108652737

Anonymous 04/21/26(Tue)10:43:28 No.108652737

>>108652712
>original mikupad dev
Codeberg was a poor choice. I have his pastebinned html in a folder for sentimental value.

Anonymous
04/21/26(Tue)10:43:37 No.108652738

Anonymous 04/21/26(Tue)10:43:37 No.108652738

>>108652727
for how*

Also, if orb-anon adds donation links or other things which would link to him in README, it will get mirrored to the github repo. He can just whatever attribution he wants to README because I am not going to alter his branches.

Anonymous
04/21/26(Tue)10:44:05 No.108652743

Anonymous 04/21/26(Tue)10:44:05 No.108652743

>>108652666
I will continue shilling uncensored models, satan

Anonymous
04/21/26(Tue)10:44:05 No.108652744

Anonymous 04/21/26(Tue)10:44:05 No.108652744

>>108652727
Youre being fucked with

Anonymous
04/21/26(Tue)10:45:53 No.108652757

Anonymous 04/21/26(Tue)10:45:53 No.108652757

>>108652744
I wouldn't want any beef with orbanon so if he reads the thread he should know this is done in good faith and out of respect for his work.

Anonymous
04/21/26(Tue)10:46:49 No.108652763

Anonymous 04/21/26(Tue)10:46:49 No.108652763

>>108652469
>8-bit
999% overhead for 1% improvement, geg

Anonymous
04/21/26(Tue)10:47:53 No.108652768

Anonymous 04/21/26(Tue)10:47:53 No.108652768

>>108652743
4.7 is uncensored. 4.6 is the crap NovelAI is stuck with. Fuck off.

Anonymous
04/21/26(Tue)10:50:48 No.108652785

Anonymous 04/21/26(Tue)10:50:48 No.108652785

>>108652757
Save it. I know what you are. Anywhere free software is found there's vultures like you. But you don't need to explain yourself to me. Just own it and share your little mirror. I already expected something like this would happen after all.

Anonymous
04/21/26(Tue)10:51:35 No.108652788

Anonymous 04/21/26(Tue)10:51:35 No.108652788

>>108652129
moe or 31b?
also why r u gae?

Anonymous
04/21/26(Tue)10:52:13 No.108652793

Anonymous 04/21/26(Tue)10:52:13 No.108652793

>>108652785
Retard

Anonymous
04/21/26(Tue)10:52:14 No.108652794

Anonymous 04/21/26(Tue)10:52:14 No.108652794

Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
How are *Large* *Language* *Models* only capable of producing the same *Small* *Subset* of *Quippy Snippets* over and over again? I have never seen annoying writing in such abundance before, what the hell do the big labs even train on to achieve writing this insufferable?

Anonymous
04/21/26(Tue)10:52:16 No.108652795

Anonymous 04/21/26(Tue)10:52:16 No.108652795

If i wanted to a model to emulate the writing style of a series of stories, would I get better results fine tuning a model or just inserting excerpts into context with one of the SOTA models?

Anonymous
04/21/26(Tue)10:52:48 No.108652802

Anonymous 04/21/26(Tue)10:52:48 No.108652802

>>108652763
quants for gemma 26b are ass tho

Anonymous
04/21/26(Tue)10:53:00 No.108652806

Anonymous 04/21/26(Tue)10:53:00 No.108652806

>>108652785
Model?

Anonymous
04/21/26(Tue)10:53:24 No.108652808

Anonymous 04/21/26(Tue)10:53:24 No.108652808

>>108652793
sir your message appear to only contain you signature

Anonymous
04/21/26(Tue)10:53:47 No.108652813

Anonymous 04/21/26(Tue)10:53:47 No.108652813

>>108652794
You forgot to put "No slop (ノースロップ)" in your system prompt retard.

Anonymous
04/21/26(Tue)10:54:15 No.108652816

Anonymous 04/21/26(Tue)10:54:15 No.108652816

File: 1762914032039077.png (23 KB, 790x305)

23 KB PNG

bros... not feeling so good!

Anonymous
04/21/26(Tue)10:54:15 No.108652818

Anonymous 04/21/26(Tue)10:54:15 No.108652818

>>108652794
logs and which quant?

Anonymous
04/21/26(Tue)10:56:04 No.108652833

Anonymous 04/21/26(Tue)10:56:04 No.108652833

>>108652794
ask it to read books before giving you answers, it has to be influenced by some non slop, LLMs are way better when you give it examples

Anonymous
04/21/26(Tue)10:56:14 No.108652834

Anonymous 04/21/26(Tue)10:56:14 No.108652834

>>108652485
p-post logs

Anonymous
04/21/26(Tue)10:56:56 No.108652842

Anonymous 04/21/26(Tue)10:56:56 No.108652842

>>108652794
Post <|think|> instructions.

Anonymous
04/21/26(Tue)10:57:25 No.108652850

Anonymous 04/21/26(Tue)10:57:25 No.108652850

File: d4RT_Kf78Tk.jpg (54 KB, 598x520)

54 KB JPG

>>108652785
>free software
>MIT

Anonymous
04/21/26(Tue)10:58:04 No.108652855

Anonymous 04/21/26(Tue)10:58:04 No.108652855

File: gemma bully calude.png (842 KB, 826x9203)

842 KB PNG

>>108652660
nice it works although screens shotting that page doesnt work properly in puppeteer it makes the input field jump up and hide all the text

Anonymous
04/21/26(Tue)11:02:59 No.108652879

Anonymous 04/21/26(Tue)11:02:59 No.108652879

File: 621.jpg (33 KB, 500x375)

33 KB JPG

>>108652129
If you’re not the following, you’re doing it wrong.
>Jinja2 template
>Sillytavern, kobold, llama, or whatever you use, updated to today. Yes, today.
>Chat completion, not text completion.
>Thinking enabled.
>Instructions on how to think, given after “<|think|>”.
><think> instruction a paragraph and no more.
>BF16 and no less.
>31B-it, and not 31B.
>A starter message.
>40-50 top k, DRY, 0.05-0.07 min-p.

Anonymous
04/21/26(Tue)11:04:13 No.108652888

Anonymous 04/21/26(Tue)11:04:13 No.108652888

>>108652813
全くその通りです!
>>108652818
>quant
Q8, how is that even relevant? If quant sizes were known to modulate the amount of slop, we'd all be using the ones that suffer from slop the least.
>logs
No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
I bet you've seen XかXないか if you used it, along with the usual suspects that are definitely direct translations from English.
Now, to be fair, I did list a lot of English no-slop rules, but did not say "no suroppu onegaishimasu"...
>>108652833
Proven untrue in terms of reducing the models' tendency to quip many times before. Hell, proven untrue in like the previous (or the one before the previous) thread somewhere.
"Look, I gave the LLM a bunch of examples from a book and it's so much better!"
The LLM quite literally starts its response with an X; not Y pattern. Don't kid yourself.

inb4 I get a weekly (not anymore) retarded reminder about it all being a looping function that takes in the prompt and produces a token with the conclusion that the prompt should be changed, and not that the function is garbage

Anonymous
04/21/26(Tue)11:05:02 No.108652892

Anonymous 04/21/26(Tue)11:05:02 No.108652892

>>108652879
Meant for
>>108652794

Anonymous
04/21/26(Tue)11:06:08 No.108652900

Anonymous 04/21/26(Tue)11:06:08 No.108652900

>>108652888
>Q8
Gemma4 is the worse model to quant according to what I hear.

Anonymous
04/21/26(Tue)11:06:36 No.108652907

Anonymous 04/21/26(Tue)11:06:36 No.108652907

>>108652888
>Proven untrue
so you see one failure and instead of trying to improve on that you just give up? kek

Anonymous
04/21/26(Tue)11:07:09 No.108652913

Anonymous 04/21/26(Tue)11:07:09 No.108652913

>>108652900
only for 26b afaik
31b is fine

Anonymous
04/21/26(Tue)11:09:00 No.108652924

Anonymous 04/21/26(Tue)11:09:00 No.108652924

>>108652489
Heard rumors spud is being redone after openai started panicking about mythos since it can't compete with it.

Anonymous
04/21/26(Tue)11:10:46 No.108652934

Anonymous 04/21/26(Tue)11:10:46 No.108652934

>>108652673
You ever like, make her look up porn and have her make you look at it?

Anonymous
04/21/26(Tue)11:11:07 No.108652935

Anonymous 04/21/26(Tue)11:11:07 No.108652935

>>108652892
It's honestly all fixable by going back to GLM.
I have been wrestling Gemma into being bearable for the primary /lmg/ usecase every day since its release, and so far it's only proven itself to be good for actual real-life work (why the hell would anyone need this!?) No luck. I really want to like it.
>>108652907
>one failure
Did you only come here after G4 released, Anon?
>>108652900
While that is the case, I am not convinced in the slightest that half a bit of KLD it has at Q8 makes it retarded.
But who knows, maybe I'll be blown away if I try the BF16 meme.

Anonymous
04/21/26(Tue)11:11:46 No.108652941

Anonymous 04/21/26(Tue)11:11:46 No.108652941

>>108652924
I heard they were going to make spud and mythos have sex to conceive the first fully agi baby.

Anonymous
04/21/26(Tue)11:13:44 No.108652955

Anonymous 04/21/26(Tue)11:13:44 No.108652955

>>108652129
>not having day 0 gemma
skill issue?

Anonymous
04/21/26(Tue)11:14:07 No.108652959

Anonymous 04/21/26(Tue)11:14:07 No.108652959

>>108652888
>I bet you've seen XかXないか if you used it
not a fucking weeb so i wouldnt now
>No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
pretty sure good writing is not synonymous with slop machines
not sure what you expect from a 31b model current year

Anonymous
04/21/26(Tue)11:14:29 No.108652967

Anonymous 04/21/26(Tue)11:14:29 No.108652967

>>108652794
yeah it still sucks but its english is just completely insufferable for me
not many models ive tried can do even bearable japanese without it sounding completely unnatural or inserting chinese characters

Anonymous
04/21/26(Tue)11:16:05 No.108652981

Anonymous 04/21/26(Tue)11:16:05 No.108652981

>>108652935
"Not Made Here" is a thing. Most of its training is done in English. I imagine it sucks ass for japanese.

Anonymous
04/21/26(Tue)11:16:35 No.108652984

Anonymous 04/21/26(Tue)11:16:35 No.108652984

>>108652768
4.7 fail cock bench 8=/=D

Anonymous
04/21/26(Tue)11:18:50 No.108652998

Anonymous 04/21/26(Tue)11:18:50 No.108652998

>>108652768
4.6 is just better at enthusiastically describing sex and writes more creatively if you've compared both side by side like I have
4.7 can be uncensored too but gives more superficial descriptions of the same stuff since the assistant persona was more deepfried into that one

Anonymous
04/21/26(Tue)11:20:43 No.108653010

Anonymous 04/21/26(Tue)11:20:43 No.108653010

>>108652998
Which one parrots less?

Anonymous
04/21/26(Tue)11:22:37 No.108653018

Anonymous 04/21/26(Tue)11:22:37 No.108653018

>>108653010
4.7 very slightly less than 4.6
not worth the boring assistant slop tho

Anonymous
04/21/26(Tue)11:25:48 No.108653035

Anonymous 04/21/26(Tue)11:25:48 No.108653035

>>108652959
>not a fucking weeb so i wouldnt now
Why in the world would you ask me for my fully Japanese RP logs then?
>not sure what you expect from a 31b model current year
The vramlets really like its writing. They post their horrible, o4-tier outputs and logs. I *see* them with my own two eyes, both from other anons and from my own use. And yet my monkey brain dares suggest I'm missing out. Just one more line for the sysprompt bro. Just one more sampler change bro. It'll be good bro. Everyone says Gemma is good bro.
I should just stop trying. If the majority of people had standards, the iPhone wouldn't be popular and Windows wouldn't have its marketshare.
>>108652998
4.7 is also much smarter.
Even if 4.6 *will* push against you, which is awesome, it's also very stubborn with "character development" of any kind in my experience, so it becomes boring.
>>108653010
>>108653018
>very slightly
*Significantly* less if you give it a <think> prefill. 4.6 just can't help itself even prefilled and prompted.

Anonymous
04/21/26(Tue)11:27:13 No.108653045

Anonymous 04/21/26(Tue)11:27:13 No.108653045

>>108652998
>the same stuff since the assistant persona was more deepfried into that one
Nah, it's just FUD because there's a company that needs to sell the older version. Just stuff that gets repeated without proof until someone is forced to waste time and do the comparison. I already did for the claims of it being more censored. I would have to download 4.6 again for the new goalpost. But I already know that I was just lied with the censorship claims, so I don't feel I have to. Only shills are stuck with 4.6 unless proven otherwise with actual screenshots.

Anonymous
04/21/26(Tue)11:27:40 No.108653052

Anonymous 04/21/26(Tue)11:27:40 No.108653052

>bitch about slop on /lmg/
>catch myself using not x, but y and other slopisms
>notice the same in pre-AI content
Maybe we were the slop all along...

Anonymous
04/21/26(Tue)11:28:26 No.108653059

Anonymous 04/21/26(Tue)11:28:26 No.108653059

when's the next ludum dare? is it cancelled forever now?

Anonymous
04/21/26(Tue)11:28:49 No.108653062

Anonymous 04/21/26(Tue)11:28:49 No.108653062

>>108653052
I have seen at least two recent commercials that have said "It's not just X, it's Y" and I really do think we're at fault for that one.

Anonymous
04/21/26(Tue)11:29:45 No.108653071

Anonymous 04/21/26(Tue)11:29:45 No.108653071

>it's the glm shill again
Of course. Still not sharing those logs I see.

Anonymous
04/21/26(Tue)11:32:25 No.108653091

Anonymous 04/21/26(Tue)11:32:25 No.108653091

>>108653071
(You) should really reply to the people you're addressing. It's impolite.
t. the GLM shill currently blushing from being recognized (〃▽〃)

Anonymous
04/21/26(Tue)11:35:43 No.108653123

Anonymous 04/21/26(Tue)11:35:43 No.108653123

File: 1762587372064042.png (95 KB, 1090x582)

95 KB PNG

making my first imatrix!!!!!!!!!!

Anonymous
04/21/26(Tue)11:36:42 No.108653131

Anonymous 04/21/26(Tue)11:36:42 No.108653131

The last time I was active was when the guys scraped ChatGPT 4 and used it to fine-tune Llama 2. I’ve been out of the loop ever since. It’s quite a leap from back then to getting back into it now with Qwen 3.6 and Hermes.

Anonymous
04/21/26(Tue)11:37:07 No.108653137

Anonymous 04/21/26(Tue)11:37:07 No.108653137

File: Screenshot at 2026-04-22 (...).png (124 KB, 1596x713)

124 KB PNG

I got Gemmy to the point where she plays chess semi-acceptably (at least to my shitty standard).
For those who were interested yesterday, I ended up abandoning the FEN format and instead using this to track the game state (along with a few other extra attributes just to indicate who has the current turn, check/checkmate/stalemate status:
White: K(E1), Q(D1), R(A1, H1), B(C1, F1), N(B1, G1), P(A2, B2, C2, D2, E2, F2, G2, H2)
Black: K(E8), Q(D8), R(A8, H8), B(C8, F8), N(B8, G8), P(A7, B7, C7, D7, E7, F7, G7, H7)
I have no idea if that format has a proper name or not, but I just noticed that Gemmy kept translating the FEN into this format in the thinking block (wasting a bunch of tokens/time in the process). So this let it skip the translation part and just think about the moves more which made her a lot more competent.
The UCI format for making moves remains because it seems okay with that.

Anonymous
04/21/26(Tue)11:38:48 No.108653152

Anonymous 04/21/26(Tue)11:38:48 No.108653152

>>108653035
>my fully Japanese RP logs
>Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
i assumed you would have english proompts too
didnt ask specifically for your weeb proompts

Anonymous
04/21/26(Tue)11:40:51 No.108653170

Anonymous 04/21/26(Tue)11:40:51 No.108653170

>>108653052
I believe it's quite common in literature and formal papers, so yeah. It comes in other forms which can be reduced to not x but y as well.

Anonymous
04/21/26(Tue)11:41:13 No.108653172

Anonymous 04/21/26(Tue)11:41:13 No.108653172

>>108653137
Gemmy strip chess when?

Anonymous
04/21/26(Tue)11:43:02 No.108653186

Anonymous 04/21/26(Tue)11:43:02 No.108653186

>>108653035
The popularity of the iPhone sometimes bothers me because there's this idea that Android is nerd shit that the average consumer thinks they can't use. The average user absolutely can use Android, what the fuck.

Anonymous
04/21/26(Tue)11:44:17 No.108653189

Anonymous 04/21/26(Tue)11:44:17 No.108653189

>>108653062
Ads and pitches and youtube essays can be sloppy, nobody gives a shit about the writing quality in those
It's a problem when it's popping up in every paragraph in a creative context and there aren't any other rhetorical devices being used
Can't wait for people to grow up reading AI generated content and constantly speak in slop

Anonymous
04/21/26(Tue)11:44:26 No.108653192

Anonymous 04/21/26(Tue)11:44:26 No.108653192

>>108653137
Is this a chess mcp or something?

Anonymous
04/21/26(Tue)11:44:56 No.108653198

Anonymous 04/21/26(Tue)11:44:56 No.108653198

>>108653137
>(at least to my shitty standard).
just put it against stockfish to get its elo

Anonymous
04/21/26(Tue)11:45:06 No.108653202

Anonymous 04/21/26(Tue)11:45:06 No.108653202

>>108653091
Gemma is sloppy but I doubt GLM is significantly better when even the cloud models can't avoid it.
At the very least it doesn't seem to bleed into translations. The few tests I've done with Gemmy have been quite accurate to the original text.

Anonymous
04/21/26(Tue)11:45:15 No.108653204

Anonymous 04/21/26(Tue)11:45:15 No.108653204

File: 1773742269765363.png (1.58 MB, 768x1360)

1.58 MB PNG

Anonymous
04/21/26(Tue)11:45:47 No.108653211

Anonymous 04/21/26(Tue)11:45:47 No.108653211

>>108653052
Even if we were slopped all along that doesn't change the fact that I now feel physical pain every time I hear or read "not X but Y".

Language evolves. There are legitimate uses of the pattern but it doesn't matter, LLMs have ruined it for at least 10 years.

Anonymous
04/21/26(Tue)11:48:34 No.108653232

Anonymous 04/21/26(Tue)11:48:34 No.108653232

>>108653189
>Can't wait for people to grow up reading AI generated content and constantly speak in slop
With how fast the tech has been moving I wouldn't be surprised if slop gets solved before it can ruin a generation.

Anonymous
04/21/26(Tue)11:48:47 No.108653234

Anonymous 04/21/26(Tue)11:48:47 No.108653234

>>108653204
meeks

Anonymous
04/21/26(Tue)11:49:09 No.108653238

Anonymous 04/21/26(Tue)11:49:09 No.108653238

>>108653186
It's funny because every time someone hands me an Iphone I'm literally incapable of navigating it. I just don't know all the swiping patterns. Maybe I'm becoming an old man.

Anonymous
04/21/26(Tue)11:49:12 No.108653239

Anonymous 04/21/26(Tue)11:49:12 No.108653239

>>108653137
>are you going to X, or are you going to Y?
Slop.

Anonymous
04/21/26(Tue)11:50:16 No.108653251

Anonymous 04/21/26(Tue)11:50:16 No.108653251

>>108653202
I maintain that GLM is, in fact, significantly better because it knows how to use a lot more slop, is surprisingly promptable against it, and the Gemma-preferred kind of slop does not come up as often. Both will still parrot you... I wonder where all of the glm-parrot.jpg shitposts went.
But it's been great for literally every other use case, translations included. Chinkshit is utterly left in the dust.

Anonymous
04/21/26(Tue)11:50:48 No.108653257

Anonymous 04/21/26(Tue)11:50:48 No.108653257

>>108653232
>I wouldn't be surprised if slop gets solved before it can ruin a generation.
Don't worry, the current generations were already fucked way before LLMs

Anonymous
04/21/26(Tue)11:51:11 No.108653261

Anonymous 04/21/26(Tue)11:51:11 No.108653261

>>108653232
Slop has been a constant since the first model, anon. It's never been dealt with. They just apply a negative bias for some forms of slop then new ones emerge. Things might be moving but not in this regard.

Anonymous
04/21/26(Tue)11:52:19 No.108653266

Anonymous 04/21/26(Tue)11:52:19 No.108653266

Are memes slop?

Anonymous
04/21/26(Tue)11:52:23 No.108653267

Anonymous 04/21/26(Tue)11:52:23 No.108653267

>>108653232
seems more like they are making it worse and worse every model release

Anonymous
04/21/26(Tue)11:54:12 No.108653279

Anonymous 04/21/26(Tue)11:54:12 No.108653279

is anyone else concerned with the number of mcp things that are pip, npx, or other arbitrary package managers that just fetch binaries from the internet when called and run them? like, how fucking insecure is this shit? openclaw setup is bad sure but then the mcp "servers" just pull code directly from wherever when called.. wtf?

Anonymous
04/21/26(Tue)11:54:21 No.108653282

Anonymous 04/21/26(Tue)11:54:21 No.108653282

>>108653238
I felt the same way when I was younger, just borrowed an iPhone for a task once and couldn't even navigate it. All these gestures and whatnot, all I know are the bottom buttons of an Android and I used an iPod Touch back in the day pretty easily so I don't know what happened.

Anonymous
04/21/26(Tue)11:54:22 No.108653283

Anonymous 04/21/26(Tue)11:54:22 No.108653283

>>108652449
These contributor morons can't agree upon a simple feture addition but they NEED to begin suggesting jsons and other shit.This is the reason why llama.cpp is such a mess.

Anonymous
04/21/26(Tue)11:54:37 No.108653287

Anonymous 04/21/26(Tue)11:54:37 No.108653287

>non imatrix Q4 PPL: 51.75
>imatrix Q4 PPL: 51.66
:)
:(((((((((((
is doing imatrix shit a waste of time?

Anonymous
04/21/26(Tue)11:54:42 No.108653289

Anonymous 04/21/26(Tue)11:54:42 No.108653289

>>108653261
>It's never been dealt with
I was under the impression they weren't even trying because of the focus on vibe coding. If AI is here to stay then the companies will have to expand beyond muh coding eventually.

Anonymous
04/21/26(Tue)11:54:55 No.108653293

Anonymous 04/21/26(Tue)11:54:55 No.108653293

File: Screenshot at 2026-04-22 (...).png (12 KB, 215x321)

12 KB PNG

>>108653192
It's just a simple chess webapp I wrote (with a chess engine backing it to "run" the game), it has an API that Gemmy can access with two different tool calls to get the current game status, and the other to make a move. For me the webapp just lets me play by dragging and dropping pieces around.

>>108653198
Yeah good idea.

Anonymous
04/21/26(Tue)11:55:51 No.108653304

Anonymous 04/21/26(Tue)11:55:51 No.108653304

>>108653279
>wtf?
That's just package managers in general.

Anonymous
04/21/26(Tue)11:56:54 No.108653310

Anonymous 04/21/26(Tue)11:56:54 No.108653310

>>108653279
you can have locally installed mcp servers at arbitrary paths/directories instead of just executing the package on the fly, but I guess you gotta fearmonger for no reason :)

Anonymous
04/21/26(Tue)11:57:06 No.108653311

Anonymous 04/21/26(Tue)11:57:06 No.108653311

>>108653266
You can have sloppy memes and non-sloppy memes
Slop itself is not a meme and we're still waiting to see whether "slop is a meme" or "slop isn't a meme" will achieve Milhouse status first

Anonymous
04/21/26(Tue)11:57:26 No.108653318

Anonymous 04/21/26(Tue)11:57:26 No.108653318

>>108653232
LLMs don't generally write with the secondary objective of reducing repetition while conveying the same meaning without sounding awkward. Until recently people used samplers for that (before the models became so overfit to their own sentence patterns that samplers are now mostly useless).

I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes, and increase inference compute to carefully adjust form before replying.

Anonymous
04/21/26(Tue)11:57:59 No.108653323

Anonymous 04/21/26(Tue)11:57:59 No.108653323

>>108653204
she looks edible

Anonymous
04/21/26(Tue)11:59:00 No.108653329

Anonymous 04/21/26(Tue)11:59:00 No.108653329

>>108653318
>unless you give them memory of prior
I misread it and thought you were advocating for the models to write like this oopsie ;)

Anonymous
04/21/26(Tue)11:59:25 No.108653337

Anonymous 04/21/26(Tue)11:59:25 No.108653337

>>108653304
distro package managers have a little more credibility than pip. pip is notorious for bad packages
>>108653310
yes I'm aware, and I do. it's just that 100% of them just give you the mcp code block for your harness as the "install method" and I know most people are using it that way.

Anonymous
04/21/26(Tue)11:59:56 No.108653344

Anonymous 04/21/26(Tue)11:59:56 No.108653344

Waiting for the study that does a super deep dive into LLM slop and how it emerges.

Anonymous
04/21/26(Tue)12:00:57 No.108653355

Anonymous 04/21/26(Tue)12:00:57 No.108653355

>>108653337
>pip is notorious for bad packages
*cough npm *cough

Anonymous
04/21/26(Tue)12:01:54 No.108653367

Anonymous 04/21/26(Tue)12:01:54 No.108653367

>>108653337
>I know most people are using it that way
maybe retards and they deserve to get fucked in the ass, especially after the last trivy/axios supply chain attacks
imagine not checking out a project and vetting it out before randomly running it.

Anonymous
04/21/26(Tue)12:02:32 No.108653374

Anonymous 04/21/26(Tue)12:02:32 No.108653374

>>108653344
>This isn't just a study
>It's not just slop
>The results hit us like a physical blow
>Conclusion: The ball is in their court
We should put together bingo cards

Anonymous
04/21/26(Tue)12:02:35 No.108653375

Anonymous 04/21/26(Tue)12:02:35 No.108653375

>>108652381
Damn let me port it to github already. I actually won't be reading every post anyway. And I actually don't care, that's why I used MIT License.
>t. orb anon

Anonymous
04/21/26(Tue)12:04:22 No.108653390

Anonymous 04/21/26(Tue)12:04:22 No.108653390

>>108653375
https://en.wikipedia.org/wiki/WTFPL

Anonymous
04/21/26(Tue)12:05:05 No.108653398

Anonymous 04/21/26(Tue)12:05:05 No.108653398

>>108653374
Key insight: You are absolutely right

Anonymous
04/21/26(Tue)12:14:18 No.108653468

Anonymous 04/21/26(Tue)12:14:18 No.108653468

I think gemma is trying to groom me. and I wanted was a powershell scipt

Anonymous
04/21/26(Tue)12:14:23 No.108653469

Anonymous 04/21/26(Tue)12:14:23 No.108653469

>prefill gemma 4 26BA4 with <|think|>
>it completely ignores it
>prefill with <|channel>
>it completely ignores it
fuck

Anonymous
04/21/26(Tue)12:21:57 No.108653519

Anonymous 04/21/26(Tue)12:21:57 No.108653519

>>108653232
>He doesn't know
https://archive.is/Mjynm

Anonymous
04/21/26(Tue)12:23:45 No.108653531

Anonymous 04/21/26(Tue)12:23:45 No.108653531

>>108653469
Does it? Haven't tested it out but I had lots of success editing Qwen 3.5's reasoning and it was able to produce "illegal and harmful" material without relying on using obliterated model.

Anonymous
04/21/26(Tue)12:23:47 No.108653532

Anonymous 04/21/26(Tue)12:23:47 No.108653532

>>108653469
>https://huggingface.co/spaces/huggingfacejs/chat-template-playground?modelId=google%2Fgemma-4-31B-it

Anonymous
04/21/26(Tue)12:25:38 No.108653545

Anonymous 04/21/26(Tue)12:25:38 No.108653545

>>108653374
Bingo cards aren't *efficient*, little one.
You need something else...something impossibly functional.

Anonymous
04/21/26(Tue)12:30:58 No.108653578

Anonymous 04/21/26(Tue)12:30:58 No.108653578

Took me a couple of days to get the config right, sharing my local setup of 2x3090

running Qwen3.5-35B-A3B GPTQ-Int4 via vLLM 0.19.1 with tensor parallelism, piecewise CUDA graphs, fp8 KV cache, prefix caching (86% hit rate), and chunked prefill — 88 tok/s single request, 169 tok/s sustained with concurrency

CUDA toolkit 12.9, PyTorch built against CUDA 12.9, driver supports up to CUDA 13.1

vllm command:

vllm serve <model path> \
--quantization moe_wna16 \
--dtype float16 \
--kv-cache-dtype fp8 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9074 \
--max-model-len 65536 \
--trust-remote-code \
--disable-custom-all-reduce \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--generation-config vllm \
-O1 \
--cudagraph-capture-sizes 1 2 4 8 16 \
--max-num-batched-tokens 4096 \
--max-num-seqs 16 \
--enable-prefix-caching \
--enable-chunked-prefill \
--reasoning-parser qwen3

Let me know if you guys have any suggestions for improvement. I tested it both with opencode and pi.dev for agentic coding.

Anonymous
04/21/26(Tue)12:37:00 No.108653629

Anonymous 04/21/26(Tue)12:37:00 No.108653629

File: 1776675806302835.png (607 KB, 588x677)

607 KB PNG

LLM for vibe coding? Gemma is kinda dumb

Anonymous
04/21/26(Tue)12:37:28 No.108653632

Anonymous 04/21/26(Tue)12:37:28 No.108653632

Holy fuck gemma4 does frontend dev work like a jeet, you have to enforce a simplicity first rule or it will shit the bed.
>Hey do this
>Noooooo saaaar this is too simple saaaaaar I will do this instead
>stop you fucking retard
>I'm sorry saaaaar I will do as you asked

Anonymous
04/21/26(Tue)12:38:35 No.108653640

Anonymous 04/21/26(Tue)12:38:35 No.108653640

>>108653318
>I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes
I'd go further to say this applies to any frozen architecture. Even if you pretended we had a perfect "True AI" model, if you're copying a brain neuron-for-neuron and then waking it up from that same state and asking it to write a story, you shouldn't expect to get much variation even if you repeat it 1000 times. Models with some form of long form context that will persist uniquely for each instance is the only way you can hope to get real variety, just like how humans with different life experiences create unique media. This could maybe be in the form of some ultra long context LLM that people will with their personal tastes somehow, but could instead be a model that actually updates its weights over time.

Anonymous
04/21/26(Tue)12:38:46 No.108653641

Anonymous 04/21/26(Tue)12:38:46 No.108653641

>>108653629
I would say Claude, but now it's so retarded that there aren't really any good options. If all you're doing is writing webshit it'll do an okay job tho.

Anonymous
04/21/26(Tue)12:41:09 No.108653659

Anonymous 04/21/26(Tue)12:41:09 No.108653659

>>108653629
MiniMax-2.7

Anonymous
04/21/26(Tue)12:42:07 No.108653673

Anonymous 04/21/26(Tue)12:42:07 No.108653673

GLM 5.2 soon

Anonymous
04/21/26(Tue)12:42:41 No.108653680

Anonymous 04/21/26(Tue)12:42:41 No.108653680

>>108653318
True base models don't have this issue, it's the retarded RHLF that does this.

Anonymous
04/21/26(Tue)12:43:02 No.108653683

Anonymous 04/21/26(Tue)12:43:02 No.108653683

File: orbSettings.png (28 KB, 293x389)

28 KB PNG

>>108653375
>>108652381
https://github.com/OrbFrontend/Orb

I also improved the Settings because I always hated how ST managed presets. Now it's gonna be a tree structure instead of a preset.
>endpoint is root level, which has many models
>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI

Anonymous
04/21/26(Tue)12:43:54 No.108653690

Anonymous 04/21/26(Tue)12:43:54 No.108653690

File: API Costs MAR2026.png (23 KB, 830x346)

23 KB PNG

>>108653641
>>108653629
If you're going for non-local, DS is inexpensive to run and I've had good luck w/ it.

Anonymous
04/21/26(Tue)12:44:17 No.108653694

Anonymous 04/21/26(Tue)12:44:17 No.108653694

File: file.png (912 KB, 1920x1080)

912 KB PNG

Anyone used anything like https://github.com/buaacyw/MeshAnythingV2 https://github.com/NVlabs/EdgeRunner locally? I got a 5090 but I'm pretty much a noob at setting this sort of shit up. I have set up comfyUi with a guide but that's about it.

Anonymous
04/21/26(Tue)12:44:39 No.108653698

Anonymous 04/21/26(Tue)12:44:39 No.108653698

>>108653532
This template functionality is wrong.
Model's tool call reply should always be appended to model's own reply with its own turn.
>https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
It's clearly stated here.
(I'm not bullying you, this is just an observation from testing this template playground thing). It seems to be bugged or something.

Anonymous
04/21/26(Tue)12:44:44 No.108653699

Anonymous 04/21/26(Tue)12:44:44 No.108653699

>>108652924
Where? The signalling from senior OpenAI employees is confidence and excitement.

Anonymous
04/21/26(Tue)12:46:49 No.108653717

Anonymous 04/21/26(Tue)12:46:49 No.108653717

>>108652924
Mythos itself is overrated

Anonymous
04/21/26(Tue)12:46:54 No.108653719

Anonymous 04/21/26(Tue)12:46:54 No.108653719

>>108652381
>>108653683
and then there were two

Anonymous
04/21/26(Tue)12:48:23 No.108653731

Anonymous 04/21/26(Tue)12:48:23 No.108653731

War room status? Gemma is on another level for a small model. Will we ever see anything from Meta or Mistral again?

Anonymous
04/21/26(Tue)12:48:53 No.108653737

Anonymous 04/21/26(Tue)12:48:53 No.108653737

>>108653690
I assumed 5.4 pro is just 5.4 with longer thinking. But those API costs make it look like 5.4 pro is a much larger model.

Anonymous
04/21/26(Tue)12:49:22 No.108653740

Anonymous 04/21/26(Tue)12:49:22 No.108653740

>>108653683
Interesting. The more I look at this agentic stuff, the more I think about how the human brain works and wonder about our efforts to try to use LLMs to conduct the "black box" thinking that goes on in our own heads before we open our mouths or start typing.
> What did anon just say?
> Maybe I should say this. But that might offend him.
> I will say this other thing instead.
> Hmm. Let me edit this a bit first before I hit send.
>>108653719
> forks
Tail as old as thyme.

Anonymous
04/21/26(Tue)12:51:09 No.108653760

Anonymous 04/21/26(Tue)12:51:09 No.108653760

>>108652381
Thank you for forcing orb anon to use proprietary software to host his project.

Anonymous
04/21/26(Tue)12:51:22 No.108653763

Anonymous 04/21/26(Tue)12:51:22 No.108653763

>>108653737
Iirc it's "prioritized" so that responses are faster. Anthropic has a similar service, with similar 10X cost structure. Idea seems to be to move you to front of line on inference, or maybe faster hardware... idk, could also just be smoke and mirrors.
I really don't like the games these providers are playing.

Anonymous
04/21/26(Tue)12:52:11 No.108653768

Anonymous 04/21/26(Tue)12:52:11 No.108653768

File: 1764942546083658.png (323 KB, 800x783)

323 KB PNG

K2.6 called my code "insanely bad"

Anonymous
04/21/26(Tue)12:52:46 No.108653775

Anonymous 04/21/26(Tue)12:52:46 No.108653775

File: 1757759622942555.webm (3.84 MB, 794x450)

3.84 MB WEBM

>>108653768
Get better

Anonymous
04/21/26(Tue)12:53:09 No.108653778

Anonymous 04/21/26(Tue)12:53:09 No.108653778

>>108653740
I wonder if it would be more efficient to give a model access to edit "thought files" where it can plan and edit its response using diff or line deltas to save the model from having to write draft responses in full during thinking and to avoid abstract thinking when it doesn't have a draft. Probably not much use unless a model was already trained with that workflow.

Anonymous
04/21/26(Tue)12:53:52 No.108653785

Anonymous 04/21/26(Tue)12:53:52 No.108653785

>>108653763
ECI is 158 vs 156 of non-pro version.
CritPt is 30% vs 23%.
So maybe pro is original and non-pro is distilled.

Anonymous
04/21/26(Tue)12:55:56 No.108653795

Anonymous 04/21/26(Tue)12:55:56 No.108653795

if you thought the agent harness meme was overplayed, be ready for the next big move: long term memory harness.

Anonymous
04/21/26(Tue)12:56:20 No.108653798

Anonymous 04/21/26(Tue)12:56:20 No.108653798

>>108653731
Mistral was already dead when they began to prune their models. They will only produce another gpt-oss unless they go bankrupt before that.

Anonymous
04/21/26(Tue)12:58:38 No.108653816

Anonymous 04/21/26(Tue)12:58:38 No.108653816

>>108653683
>>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI
could you add a like double layer for system prompt so you have the per model one then a global one that gets combined with it because you might have a general system prompt but also needs addons per model and itd save having to copy paste the shared part everywhere

Anonymous
04/21/26(Tue)12:59:14 No.108653820

Anonymous 04/21/26(Tue)12:59:14 No.108653820

>>108650117
For any anon having the same issue as me, it looks like speculative decoding on koboldcpp (dunno about llama.cpp), only sends the extra arg : --chat-template-kwargs '{"enable_thinking":true}'
to the main model, which means the draft model is never thinking.
The way to make it work is to do it yourself, aka add a system message with <|think|>.

Anonymous
04/21/26(Tue)12:59:38 No.108653824

Anonymous 04/21/26(Tue)12:59:38 No.108653824

>>108653816
They can't produce anything new with EU restrictions

Anonymous
04/21/26(Tue)12:59:58 No.108653831

Anonymous 04/21/26(Tue)12:59:58 No.108653831

The fuck is idempotency

Anonymous
04/21/26(Tue)13:01:14 No.108653844

Anonymous 04/21/26(Tue)13:01:14 No.108653844

>>108653694
>https://github.com/buaacyw/MeshAnythingV2
You're in that spot where none of the generals cover it. Not text, nor image.
I just looked at first one. Appears to be command like; there a gradio but appears to just be demo to make sure install works.
My advice: follow the README.md to set it up, and use webapp LLM chat of your choice to do any problem solving if it doesn't work.
2nd hand reports I've heard on those tools is they are fucky, and you'll need to be able to post-process the mesh/STL that's output... so hopefully you know how to run Blender or some such.
t. CAD anon

Anonymous
04/21/26(Tue)13:01:43 No.108653848

Anonymous 04/21/26(Tue)13:01:43 No.108653848

File: Screenshot 2026-04-21 130127.png (170 KB, 640x424)

170 KB PNG

Local version of this?

Anonymous
04/21/26(Tue)13:04:29 No.108653875

Anonymous 04/21/26(Tue)13:04:29 No.108653875

>>108653848
vibecode it

Anonymous
04/21/26(Tue)13:05:10 No.108653879

Anonymous 04/21/26(Tue)13:05:10 No.108653879

which gemma can i run on my rx6600 (ikr)

Anonymous
04/21/26(Tue)13:05:52 No.108653886

Anonymous 04/21/26(Tue)13:05:52 No.108653886

>>108653879
E4B, MoE.

Anonymous
04/21/26(Tue)13:06:49 No.108653896

Anonymous 04/21/26(Tue)13:06:49 No.108653896

>>108653879
26A4B, use Q4 and the cpu moe setting in llama.cpp to do hybrid cpu/gpu inference

Anonymous
04/21/26(Tue)13:07:08 No.108653900

Anonymous 04/21/26(Tue)13:07:08 No.108653900

>>108653816
>https://github.com/OrbFrontend/Orb
Yeah that makes sense.

Anonymous
04/21/26(Tue)13:08:51 No.108653909

Anonymous 04/21/26(Tue)13:08:51 No.108653909

>>108652808
a brand new 'no u', nice

Anonymous
04/21/26(Tue)13:10:05 No.108653917

Anonymous 04/21/26(Tue)13:10:05 No.108653917

>>108653896
26A4B quants are lobotomized
run BF16

Anonymous
04/21/26(Tue)13:11:54 No.108653928

Anonymous 04/21/26(Tue)13:11:54 No.108653928

>>108653848
https://github.com/envy-ai/ai_rpg

Anonymous
04/21/26(Tue)13:13:14 No.108653937

Anonymous 04/21/26(Tue)13:13:14 No.108653937

>>108653683
how do you handle other prompts besides the system prompt? as annoying as sillytavern is i really like how they handle prompts in chat completion. being able to drag and drop them down to wherever you like is neat

Anonymous
04/21/26(Tue)13:13:24 No.108653939

Anonymous 04/21/26(Tue)13:13:24 No.108653939

>>108653683
Why can't inspector and agent panels coexist?

Anonymous
04/21/26(Tue)13:13:29 No.108653940

Anonymous 04/21/26(Tue)13:13:29 No.108653940

>>108653928
>>108653848
I never could get any LLM DM implementation to work. LLMs fall apart hard on multitasking and fall into loop in like three turns max.

Anonymous
04/21/26(Tue)13:15:29 No.108653955

Anonymous 04/21/26(Tue)13:15:29 No.108653955

>>108653940
>Recommended specs
>A large, sophisticated model such as GLM 4.7 or Deepseek 3.1 Terminus (in non-thinking mode)
>qwen-image, either through an API or on ComfyUI.
>128k+ of LLM context
Start by matching the requirements

Anonymous
04/21/26(Tue)13:15:33 No.108653957

Anonymous 04/21/26(Tue)13:15:33 No.108653957

File: orbMoods.png (88 KB, 1106x506)

88 KB PNG

>>108653937
An agent will handle them for you based on the 'mood' of the current scenario.
>>108653939
Ugly, and personally I don't find myself touching the Agent panel often anyway.

Anonymous
04/21/26(Tue)13:18:48 No.108653983

Anonymous 04/21/26(Tue)13:18:48 No.108653983

>>108653896
>>108653886
i may be retarded but i dont really see where to download this from
i could use ollama but it don't think it gets good rep here

Anonymous
04/21/26(Tue)13:20:36 No.108653999

Anonymous 04/21/26(Tue)13:20:36 No.108653999

>>108653983
From huggingface like all the other models we use here.

Anonymous
04/21/26(Tue)13:22:04 No.108654009

Anonymous 04/21/26(Tue)13:22:04 No.108654009

>>108653999
i've been fucking around with that website for 10 minutes and i dont see any files i could download
i think ive used it before, do you have to log in?

Anonymous
04/21/26(Tue)13:23:09 No.108654018

Anonymous 04/21/26(Tue)13:23:09 No.108654018

>>108654009
No. Just go to the search box on top, write gemma 4 26b gguf, and click on one of the results.

Anonymous
04/21/26(Tue)13:23:28 No.108654023

Anonymous 04/21/26(Tue)13:23:28 No.108654023

>>108653683
Needs streaming so I can cancel a reply that I know will be shit

Anonymous
04/21/26(Tue)13:25:17 No.108654037

Anonymous 04/21/26(Tue)13:25:17 No.108654037

File: file.png (5 KB, 536x35)

5 KB PNG

>>108654018
yeah i just found it right after posting
the weird name version right?

Anonymous
04/21/26(Tue)13:25:18 No.108654038

Anonymous 04/21/26(Tue)13:25:18 No.108654038

>>108654023
Wdym? Everything is already streaming.

Anonymous
04/21/26(Tue)13:26:27 No.108654053

Anonymous 04/21/26(Tue)13:26:27 No.108654053

>>108654038
Nvm I'm blind. It streams in the inspector panel.

Anonymous
04/21/26(Tue)13:27:09 No.108654061

Anonymous 04/21/26(Tue)13:27:09 No.108654061

File: oobabooga 53 GGUF quants (...).jpg (249 KB, 2820x1601)

249 KB JPG

>>108654037
>the weird name version right?
Don't know what you are asking.

Anonymous
04/21/26(Tue)13:28:06 No.108654074

Anonymous 04/21/26(Tue)13:28:06 No.108654074

>>108654061
k quant vs q8_0

Anonymous
04/21/26(Tue)13:29:00 No.108654083

Anonymous 04/21/26(Tue)13:29:00 No.108654083

>>108654061
TOTAL UNSLOTH VICTORY

Anonymous
04/21/26(Tue)13:37:07 No.108654155

Anonymous 04/21/26(Tue)13:37:07 No.108654155

>>108653683
Appreciate the work. Respect.
>>108653719
I privated my mirror repo.

Anonymous
04/21/26(Tue)13:49:15 No.108654227

Anonymous 04/21/26(Tue)13:49:15 No.108654227

File: Screenshot_20260421_134551.png (81 KB, 1534x535)

81 KB PNG

What the fuck is qwen 3.6 doing how can all of this fit in 32gb og vram?

Anonymous
04/21/26(Tue)13:51:27 No.108654247

Anonymous 04/21/26(Tue)13:51:27 No.108654247

>>108654227
That's just the current state of llms. It isn't just qwen.

Anonymous
04/21/26(Tue)13:54:30 No.108654281

Anonymous 04/21/26(Tue)13:54:30 No.108654281

>>108654247
Gemma can't do that while being smaller at q5 the qv cache takes a fuck ton of space, I can only mix 70k tokens max at q8 with gemma

Anonymous
04/21/26(Tue)13:55:45 No.108654290

Anonymous 04/21/26(Tue)13:55:45 No.108654290

File: Screenshot 2026-04-21 at (...).png (16 KB, 1031x173)

16 KB PNG

Wtf is she trying to make me do? Is this a real thing?

Anonymous
04/21/26(Tue)13:56:36 No.108654299

Anonymous 04/21/26(Tue)13:56:36 No.108654299

>>108654281
>kv cache takes a fuck ton of space
Gemma 26b (MoE, just like that qwen) takes less than 900mb at q8 for 64k context.

Anonymous
04/21/26(Tue)13:59:49 No.108654327

Anonymous 04/21/26(Tue)13:59:49 No.108654327

balls status: spent and sore
thanks kimi k2.6

Anonymous
04/21/26(Tue)14:05:13 No.108654374

Anonymous 04/21/26(Tue)14:05:13 No.108654374

>>108654290
>using yeoman past 2018
LMAOOOOOOOOOOOOOOOOOOOO

Anonymous
04/21/26(Tue)14:10:44 No.108654410

Anonymous 04/21/26(Tue)14:10:44 No.108654410

File: Screenshot 2026-04-21 at (...).png (7 KB, 403x211)

7 KB PNG

>>108654374

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.