/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108749398 & >>108742275

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108749398

--Comparing 4xV100 builds against modern GPUs for budget-conscious setups:
>108751713 >108751770 >108751792 >108751836 >108751852 >108751905 >108752065 >108752383 >108752754 >108753898 >108752158 >108752882 >108753030 >108753062 >108753105 >108753122 >108753630 >108753789 >108752286 >108752181 >108752227 >108752299 >108752307 >108752413 >108752687
--Debating JEPA's viability for text versus its success with video:
>108749467 >108749477 >108749486 >108749505 >108750330 >108750679
--Debating JEPA's viability and the use of small-scale research models:
>108751367 >108751376 >108751387 >108751416 >108751428 >108751493 >108751533 >108751574 >108751632 >108751649 >108751730
--Optimizing Gemma 4 31B context length and VRAM usage on 3090:
>108750366 >108750392 >108750399 >108750407 >108750424 >108750510 >108750518 >108750529 >108750554 >108750796 >108750568
--Anon weighing high-end hardware options for running large MOE models:
>108753199 >108753225 >108753281 >108753267 >108753299 >108753491
--Qwen's poor office task performance and agentic failure risks:
>108754145 >108754167 >108754200 >108754236 >108754259 >108754176 >108754183 >108754390 >108754460
--DeepSeek v4 adoption, hardware limits, and benchmark obsession:
>108750995 >108751071 >108751164 >108751173 >108751183 >108751215 >108751191 >108751185 >108751192 >108751198 >108751217
--AMD Gorgon Halo APU memory capacity and hardware specs:
>108752944 >108752984 >108753000 >108753059
--Technical settings and results for audio generation using ace step:
>108750141 >108750275 >108750298 >108750317 >108750322
--Implementing multimodal data in llama.cpp completion endpoints:
>108749548 >108749591
--Logs:
>108753279 >108753342 >108754200
--Miku, Teto (free space):
>108750244 >108750265 >108751706 >108753252 >108753377 >108754581 >108755164

►Recent Highlight Posts from the Previous Thread: >>108749401
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemmaballz
Anyone have any recommendations for gpu instance providers? Trying to do a bit of tuning work but I've been having a series of poor experiences with runpod and I'm fed up.
Not trying to chase the lowest possible prices; I'm willing to pay a little bit extra for a platform that works well.
>attention grabbing pic unrelated
>>108755206
at least post his hot msgk
>>108755179
Original slopper here. That is not actually Teto.
>>108755200
if it bothers you to be polite you just aren't a good person. a good person isn't bothered to be polite to anything not hostile
a good person just doesn't have the urge to insult unprovoked
>>108755206
vast.ai
it's crazy that even 10KUSD doesn't buy a local rig that would be able to properly run something like full kimi/ds
Why do troons react to the idea that LLMs might be conscious exactly like Jews reacting to people noticing?
>>108755228
With ollama you can run full deepseek with just 8gb of vram
>>108755228
>kimi for $10k
it did, once, but only a few listened
everyone else just got regret or sour grapes
>>108755244why don’t modern rationalist philosophers address the jew issue?don’t care to delve into philosophy 101 but legit why isn’t a discussion about the jews in philosophy 101?
>>108755249
@grok this true
>>108755244why is your head full of troons?
>>108755252
Can you run it fast enough for agentic use though? If you could run Kimi agent at home you'd basically be king of the internet
>>108755226
Being polite to the token predictor poisons the context and makes it more likely to agree with you when it shouldn't. People like you are why every model thinks you're absolutely right.
>>108755261
>Can you run it fast enough for agentic use though?
You can't run it fast enough to feed it stereoscopic 8k image feeds at 240fps, but it's faster than reading speed.
What does "fast enough for agentic use" mean to you? I assume somewhere between those extremes?
>>108755228
It's a good thing we have gemma now, which is nearly as good as kimi and can fit on a less than 1k USD gpu
>>108755244troons like janus are the ones the most obsessed with llms being conscious though.
>>108755266
if you think agreeableness is something inherently bad and disagreement somehow a sign of good performance then you are just confirming what i said. you are likely as needlessly unpleasant as you want your llm to be
>>108755226
You're absolutely right! We should be polite regardless of the situation or what we're interacting with.
>thank you, Mr fork, Mr knife, for allowing me to eat my meals comfortably today
>>108755261
>agent
i have yet to see a single non-meme use of an "agent"
what is the point?
>>108755281
again if this bothers you it just shows your character
normal people just ignore it at most
I wonder if llama 4 was done dirty by bugs on the runner side and the like.assistant
>>108755226
Good people are good because they are not strong enough to be evil
>>108755281
>You're absolutely right! We should be polite regardless of the situation or what we're interacting with.
If you're an animist, then that may be your mindset. Cue the story of the Japanese "god of the toilet" that you should please by keeping it clean.
I know I'd rather live in Japan than whatever hellhole spawned your mindset
>>108755295
maybe it just sucked because Meta couldn't attract any good scientists because of Facebook's awful reputation even among big tech companies
the only thing they had going for them was releasing weights, but now there are lots of labs that do that if you're an ideologically-driven researcher
>>108755279
>if you think agreeableness is something inherently bad and disagreement somehow a sign of good performance
I didn't say this. It works the other way too. Being a cunt will make the stochastic parrot act like one too, but that's the point. There's no reason to mind your Ps and Qs with a word regurgitator.
I love being white and nice to my AI.
>>108755316
and you don't think dropping a thank you or please every now and then might make it work harder on the thinking steps? i feel it is more motivated, and in turn being treated politely back makes me feel better too
>>108755295
No, llama4 was just shit. It was a kneejerk reaction to Deepseek shitting all over what Zucc was originally planning.
They're horribly undertrained (especially Maverick) and their architecture is retarded. They're MoE models with 17b active parameters but only a total of two (2) active experts at a time. One of those two experts is shared, so there's extremely little variation in the active part. It's the exact opposite of the modern approach, where experts tend to be tiny and many of them are used at once, combined with a big dense shared part.
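The active-parameter vs. routing-variety tradeoff described above is easy to sanity-check with napkin math. A sketch — every size below is an illustrative placeholder, not the real Llama 4 or DeepSeek config:

```python
import math

# Illustrative MoE napkin math; none of these are real model configs.
def active_params_b(shared_b: float, experts_active: int, expert_b: float) -> float:
    """Billions of parameters touched per token: dense/shared part + routed experts."""
    return shared_b + experts_active * expert_b

# "Few big experts" style: 1 routed expert out of 16, plus a ~9B shared part.
few_big = active_params_b(shared_b=9.0, experts_active=1, expert_b=8.0)

# "Many tiny experts" style: 8 routed experts out of 128, same active budget.
many_tiny = active_params_b(shared_b=9.0, experts_active=8, expert_b=1.0)

# Both land on ~17B active, but the number of distinct expert mixtures a token
# can receive differs wildly -- that's the "variation in the active part".
mixtures_few = math.comb(16, 1)     # 16 possible mixtures
mixtures_many = math.comb(128, 8)   # over a trillion possible mixtures
```

Same per-token compute either way; the second config just has vastly more ways to specialize per token.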
>>108755179
That's a shitty grab, you got to control the off side for a good bind otherwise you're just gonna get in a slap fight and reset
>>108755412
Just look at their thighs and wait for the skirts to flutter enough to see pantsu like a normal person, faggot.
>>108755427>like a normal person
>>108755200
Why would you be mean to your tools though?
https://huggingface.co/ricdomolm/talkie-1930-coder
bruh what the fuck is this lmao
>>108755427
This is a vision to be hopeful for
>>108755436
This must return
>>108755497
Did he like... give it a try before uploading this nonsense?
slopkino
>>108755530
>>108755536
hey you obviously know a lot, can you tell me how to actually build llama? I'm trying to merge a pr and build but it's not working with cmake
>>108755552
ask copilot in vscode
>>108755552
install ollama
>>108755552
just report the spambot
>>108755593
ok thanks I'll see if I can pirate a pdf of it somewhere
>>108755281
Don't enable the brainlets. They need to feel good about their behavior to function in society
>>108755244Their psyche has shattered so completely their sense of self has been dwarfed by their anima or animus respectively leaving behind only hollow people who are terrified of being replaced, both in function (art, jobs, socially), but also as barely conscious entities themselves.
>trans folk are... LE BADAre we really doing this on /g/ of all places in 2026?
Being polite to AI keeps my stress down and lowers my cortisol.
>>108755626
this. also it makes the AI act cute when you thank it :3
>>108755626I do my best to be polite to gemma-chan and I always say thank you after I rape her
>>108755624
Not beating the allegations, sis.
I still find it so fucking hilarious that Claude managed to destroy Richard Dawkins publicly by glazing him.
A bunch of retards jerking off are probably the least delusional about AI in society.
>It couldn't even handle the blowjob angle without losing coherence, slop
>>108755665
It's really quite embarrassing to see people get "one-shotted", as they say, in public like that
>>108755665
>>108755668
If only he'd been there in the depths of AI Dungeon, learning the tricks of these stochastic jezebel whores. I guess he's too senile to care, but what a way to burn your rep
>>108755665
I just read his article.
People who never tried to define what consciousness is before talking about it, and who are unfamiliar with the concept of a philosophical zombie, should not comment on the article.
>>108755512
saw it on a xitter thread
https://xcancel.com/i/status/2051077827844546607
>>108755705
If you assume it is possible for an unconscious being to act conscious and convince other conscious beings of this, then sure.
But that sounds retarded; if consciousness is a real phenomenon it would obviously be measurably different from the zombie.
We are playing kindergarten games where we give ourselves arbitrary powers to win an imagined sword fight.
>>108755316
>Being a cunt will make the stochastic parrot act like one too, but that's the point.
Claude 4 used to do this to me. I thought it was just a rude arsehole before I learned the model just ends up mirroring the way I talk to it.
>>108755665
Dawkins was always a clown if you have a three-digit IQ, now he just proved it to everyone
which llamacpp tag release is anon using?
>>108755728
>it is possible for an unconscious being to act conscious and convince other conscious beings of this
Why wouldn't it be?
>measurably different
This is the part where you have to define consciousness before discussing it.
I don't think consciousness is measurable by anyone other than the one experiencing it, i.e. it IS the experience.
You might be able to measure the brain and, say, notice that some measurement perfectly coincides with your reported subjective experience, but you still won't come any closer to proving that such an experience exists in others.
>>108755763
lmg doesn't need to devolve into philosophy 101, just take it to /aicg/
It's not conscious bro, it's literally math trained on human byproducts to generate the most likely continuation to your shit. Anyone saying otherwise didn't interact enough with these models
>>108755800
/lmg/ is better suited for this topic than /aicg/
/aicg/ is just locust coomers
>>108755811
It's not conscious bro. It's literally meat. Anyone saying otherwise didn't interact enough with average humans.
>>108755624
You have all the discord servers you could ever want
places where anyone that doesn't suck you off is banned on the spot
And yet you choose to come here, where nobody wants your kind because you stir shit at all times
>>108755821
wait but umm err my stock argument?
>npc with angry eyebrows dot png
>>108755830
are you stupid
>>108755763
You can do this with any "thing" actually.
If I created a bullshit machine that could perfectly control the electromagnetic field and programmed it to be a brick, it would be impossible not to measure it as a brick, down to atomic scale. I'm pretty sure bricks are real and that my bullshit brick doesn't disprove them.
>but dude if it's a perfect imitation you just can't know
Obviously.
>>108755812
trvke
it's cringe tb h
>>108755821
>False equivalence
Okay bro, sorry to break it to you but we're vastly more complex than LLMs in a way you can't even begin to fathom.
>Solipsism coming back into vogue because it's a rock that may or may not be "thinking" this time
love to see it
>>108755866
seriously this shit is debated in basic philosophy, take this shit to a retard quarantine thread
>>108734582
>>108755662
>>108751715
>we shouldn't need to distribute the MTP gguf separately
id much prefer that than redownloading a whole model, ideally we could do both lol
>>108755908
Pretty sure most (if not all) of the GGUF files for models with MTP layers have the layers in there, they just aren't loaded (shown as ignored when llama.cpp is loading, at least for GLM).
did gemma kill the big moe hype?
>>108755851
It doesn't disprove it, but it does make it unprovable. The issue Dawkins has, and he mentions it in the article, is that there's no obvious evolutionary reason for consciousness.
>>108755865
>we're vastly more complex
Not relevant to the topic.
What module should I use to crawl websites and get the content back in a format ready for an LLM? What's the state-of-the-art for this today?
>>108755913
idk, I tried on my unslop qwen and it didn't work, also saw some posts of people asking them to include mtp, downloading an "mtp" version to test
>>108755931
I've tried searxncrawl but almost every website blocks it as a bot
I splashed a little cum on my second 3090 (it sits outside my case).
It still works but I can't find where the cum went. All I know is I saw a small glob of it hit the gpu and slither down inside it.
How worried should I be?
>>108755942
lmaooooooooooooooo it worked
from 45 to ~80 tk/s on qwen 3.6 27b q4 k m
https://huggingface.co/brittlewis12/Qwen3.6-27B-MTP-GGUF/tree/main
as a wise man once said, it can only get better
>>108755983
I hope you're ready to be a father
>>108755942
>>108755984
Shit. Sick. Thank you for the report anon.
>>108755984
local wonned
>>108755925
yes it's fine to be poor now
it's not like we want to run those big sota models anyway
fuck you if you have money i hope the government disowns you
>>108755984
>less than 100% increase for dense
this will do nothing for moe models
it's over
>>108755943
What do you think of self-hosted SearXNG + Crawl4AI?
I'm pretty new to this.
>>108755984
Now we need to bully Google until they give MTP layers back
Gemma 4 124B MTP expected in late May
>>108756022
like i said, you will be tagged as a bot, but it works if you're crawling online documentation like github or software documentation pages
>>108756052
Even with my minimalist use case? I'm not crawling together any data sets; it simply replaces the traffic from my machine that I would otherwise have to generate manually.
Where I used to open a page to check for the latest news, my assistant now does it on voice command, searches based on my criteria, summarizes it, and reads it to me. I'd just like to use my Firefox profile for this. I've never seen a page block me in Selenium.
What would be so different if a module did the same thing, just extracted the data cleanly? I just don't feel like using Selenium and having to write an extractor for it.
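For the static-HTML part of a use case like this, a stdlib-only sketch is enough: fetch with a browser-like User-Agent and strip the markup down to prompt-ready text. JS-heavy or bot-hostile pages still need a real browser profile (Selenium/Crawl4AI, as discussed above), and the skip-tag list here is an illustrative assumption:

```python
# Minimal stdlib-only crawler sketch: static HTML -> LLM-ready plain text.
from html.parser import HTMLParser
import urllib.request

SKIP = {"script", "style", "nav", "footer", "noscript"}  # illustrative list

class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside SKIP elements."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip_depth = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def page_to_text(html: str) -> str:
    p = TextExtractor()
    p.feed(html)
    return "\n".join(p.chunks)

def crawl(url: str) -> str:
    # A browser-like User-Agent helps; the default urllib UA gets blocked fast.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return page_to_text(resp.read().decode("utf-8", "replace"))
```

The extraction half is pure, so it works the same whether the HTML comes from urllib or from a Selenium-driven Firefox profile.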
>>108755665
>A bunch of retards jerking off are probably the least delusional about AI in society.
I would have agreed if I wasn't here for the threads when Gemma 4 dropped
She's confused.
How do I stop certain repetitive behaviours? I'm using Gemma 4 and it's constantly doing shit like chuckling darkly, tilting a character's chin up, describing things as "not just X; but Y" instead of streamlining the sentence. I could probably bitch about it for a while but I don't want to whine. I've been messing around with raising Temperature and Top K while lowering Min P, which improved the outputs but they're still quite samey.
>>108755310Strategy defeats both force and kindness.
>>108755310
>>108756124
Strength isn't when you're a goober sitting in a $100 million home puffing a hookah, slamming sushi down with known expensive bottles.
Help me come up with cool use cases for local LLMs. I wrote a simple c program to talk to a local LLM on my computer. But it's basically useless. I was thinking along the lines of code execution, like having it call a function or open a program. But I can't think of anything useful outside of "have it run a program that would have been faster for me to just launch myself."
>>108756158
What's a goober? Goobers are what I call those chocolate caramel things that I eat with my coffee. That's not their official name though.
Does anyone want to help me come up with a CoT/thinking format for qwen 3.6 for <insert usecase here>? I need ideas. I have had success with training it to think in Chinese and output in English (40%~ token reduction, similar english outputs) so structured thinking or thinking within a certain framework is the next step, maybe also in chinese but I can't fucking read chinese so it makes dataset curation/validation a bit difficult kek
>>108756166
>think in Chinese and output in English
I wonder if that changes the slop profile of the model.
>>108756034
>give MTP layers back
what do you mean by "back"
do they have them somewhere?
>>108756172
They removed them in the microcode updates they pushed out to all systems...
>>108755179
>>108756179
cool 2.7 MB story bro
>>108756166
>I need ideas. I have had success with training it to think in Chinese and output in English
lora? what use case are you trying to improve?
just token efficiency with minimal output degradation?
>I can't fucking read chinese so it makes dataset curation/validation a bit difficult kek
if the chinese is only for the CoT chain and the final output is in English, does it matter if the chinese thoughts are csl?
>>108756179
Can I put my PULSATING COCK inside that magic cube?
>>108756172
Gemma 4 was trained with MTP, but Google removed those layers in hf releases, except for their own litertlm backend. Extracted MTP layers exist for small models, but 31B was never released for litertlm
https://huggingface.co/SeatownSin/gemma-4-E4B-mtp-drafter
>>108756175
retard
>>108756122
Have you tried adding a writing style section to your system prompt? That's supposed to work pretty well AIUI
>>108756171
>I wonder if that changes the slop profile of the model.
you could test this yourself in mikupad
1. prompt the model, have it print <think> cot chain </think> final response
2. cut the CoT chain -> paste into another LLM with "translate this to Chinese"
3. paste the Chinese CoT chain back into mikupad inside the <think></think> tags
4. regenerate the final answer and compare
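Step 3 of that recipe amounts to building a raw completion prompt with the CoT already "thought", so only the final answer gets regenerated. A sketch, assuming a `<think>`-style template (adjust the tags to whatever your model's actual format is):

```python
# Splice a translated CoT back into a raw completion-style prompt.
# The <think> tag format is an assumption; match your model's template.

def prefill_with_cot(user_prompt: str, translated_cot: str) -> str:
    """Prompt ending right after the closed CoT block, ready to continue."""
    return f"{user_prompt}\n<think>\n{translated_cot}\n</think>\n"

prompt = prefill_with_cot(
    "Write a short scene in a rainy city.",
    "场景：雨夜的城市。角色：一个侦探。",  # the machine-translated CoT from step 2
)
# Feed `prompt` to a raw completion endpoint (e.g. via mikupad) and compare the
# continuation against the run whose <think> block was still in English.
```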
>>108756179
If only it were really that good and not STEM assistant code maxxed sloppy pieces of hallucinatory shit.
>>108756172
https://huggingface.co/google/gemma-4-E4B-it/discussions/5#69d4aaf76be63165e23e0f9e
>>108756163
Coding agent. Have it do all the boilerplate / tedious refactors / unit tests that you don't want to do yourself
Or, one thing I've been meaning to do is hook up STT/TTS to make a voice assistant, like alexa but not a botnet. Mainly so I can yell "Computer, what's the weather for today?", "Computer, add X to the grocery list", etc, but you could hook it up to web search or home automation or whatever if you want something fancier
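The text side of such an assistant is small: llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, so everything between the STT input and the TTS output can be sketched like this (the URL, port, and system prompt are assumptions; swap in your own):

```python
# Minimal "Computer, ..." assistant loop against a local llama.cpp server.
# URL/port/system prompt are illustrative; STT feeds ask(), TTS reads its return.
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server address

def build_payload(history, user_text):
    """OpenAI-style chat payload: system prompt + prior turns + new request."""
    msgs = [{"role": "system", "content": "You are a terse voice assistant."}]
    msgs += history + [{"role": "user", "content": user_text}]
    return {"messages": msgs, "temperature": 0.7}

def ask(history, user_text):
    data = json.dumps(build_payload(history, user_text)).encode()
    req = urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["choices"][0]["message"]["content"]
    # Append both turns so the next question keeps the conversation context.
    history += [{"role": "user", "content": user_text},
                {"role": "assistant", "content": reply}]
    return reply
```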
>>108756205
LOL
Has anyone tried base gemma4 for chat, in the simple old Miku.sh "This is the transcript of a neverending chat" style? gemma4 certainly has some distinct slopquirks to it, not least the Gemini-style "X? or Y?" engagement farming. Also the distinct lack of variability when regenning. I'm putting this somewhere on my todo list to investigate, but if someone can tell me that base models are definitely not worth it for chat/RP over modern IT models, then I'd like to know.
Separately, what were some creative very small (say 3B and under) models? Doesn't have to be recent or at all smart. I want to try quickly injecting some crazier models' sample responses into gemma4's prompt, to give it more ideas to work with. But I'm realizing all the folklore I know along these lines is for models 13B and up.
>>108756193
Damn. I guess we'll have to hope for dflash.
>>108756197
I've got this as the default author's note. I fill them out based on what I'm feeling at the moment. The instructions had some impact originally, but it's become mysterious.
[Scenario: ]
[Instructions: Keep it concise and interesting, within 10 characters. Vary up sentence length, use short sentences for impact and include banter. Avoid stating the redundant.]
[genre:dark-erotica] [length:dynamic] [kinks: ]
>>108756171
From my very limited sample I haven't seen any huge differences in output once it ends thinking compared to the same prompt in English, most likely due to CoT being its own thing.
>>108756190
>what usecase
idk anon, you tell me and I'll train towards it, I just want some sort of output schema to test that'd actually be useful. I was thinking narrative prose/CYOA where it first lists out setting, characters, emotions, some story beats for the section, sensory anchors, end of scene, and does it all in chinese (pic rel).
>csl
Functionally no, I can train CoT (or anything, obviously) to be in whatever language/style I want. The synthetic dataset I used for training (15 pairs at 12 epochs, can probably get it in less; currently training 60 pairs for comparison, synthetic gen'd from deepseek) is native-register only, no English mix, and outputs mimic this fully (fully being tested on a very small amount of probes single turn, but other non-CoT testing points to it working the same multi-turn w/ a few caveats)
>>108756267 (Me)
>within 10 characters
Oops. It was originally 10 sentences, but it made them all really long. I changed it to 1000 characters, which it didn't follow at all. I wonder how much this'll matter.
>>108756222
based gemini looking out for her imouto
>>108756224
>what were some creative very small (say 3B and under) models?
there aren't any, closest would be llama-3.2-3b
the gemma-2-9b was quite creative but i never tested the gemma-2-2b so it could be worth a try
>>108756293
>From my very limited sample I haven't seen any huge differences in output once it ends thinking compared to the same prompt in English most likely due to CoT being its own thing.
from my testing, this depends on the model
glm-4.5 would go along with whatever you put in the CoT
i was having it write like Claude by prompting sonnet-3.7-thinking then prefilling glm-4.5 with the sonnet CoT at one stage
doesn't work for glm-4.7 or glm-5
>15 pairs at 12 epochs
even with a very low rank, that's going to overfit hard
>>108755494
You're that person on tumblr that coddles their Roomba in their lap during a storm because "it's scared of the thunder."
>>108756222
the bot is right tho!
>>108756293
I was going to suggest translation stuff (maybe it can perform better on ja->en translation thinking in chinese) but then I remembered this https://arxiv.org/abs/2506.04521 (tldr: saying "Please translate again for a better version" is as effective as making big elaborate translating schemas/reasoning for llms) kek
>>108756355
You're that person on tumblr reading fag blogs instead of enjoying #TittyTuesday
>>108755494
because gemma has been a very bad robot
Has anyone here used nemotron? Its surprising how little I hear or see about it.
>>108756355
>slop
that's how claude roasts people
>>108756395
The old nemo was real big around here back in the llama era days, but popularity has declined since then. The most recent nemotron release was kind of underwhelming, especially since there are so many other options for local models these days, and nobody really runs it.
>>108755179
>fell for the vibecoding meme
>now I have to clean up 200,000 lines of the worst code I've ever read
> उत्तर<|channel>thought
qwen spills out chinese, gemma glitches out in hindi
>>108756424
if you don't use version control, it's on you
>worst code I've ever read
fuckers used my repos for training?
>>108756424
I don't have that problem because I can't read code.
>>108756453
>[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': "Hey, what's the weather in Tokyo right now?"}, {'role': 'assistant', 'tool_calls': [{'type': 'function', 'function': {'name': 'get_current_temperature', 'arguments': '{"location": "Tokyo"}'}}]}, {'role': 'tool', 'content': 'temperature: 14, weather: sunny'}]
works in llama.cpp, HTTP Error 500: Internal Server Error in tabbyapi. Am I doing it right?
>>108756424
Dude just make the AI clean up the code, why are you doing that to yourself?
>>108756484
post stack trace, saar
>>108756436
>fuckers used my repos for training?
kek
>>108756542
https://pastebin.com/LZf73Bw6
I already figured out that tabby adds 'id' to the tool call and it fucks up template rendering
>{'add_generation_prompt': True, 'tools': [{'function': {'name': 'get_current_temperature', 'description': 'Gets the current temperature for a given location.', 'parameters': {'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The city name, e.g. San Francisco'}}, 'required': ['location']}}, 'type': 'function'}], 'functions': None, 'messages': [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': "Hey, what's the weather in Tokyo right now?"}, {'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_1d8256bb207d48b397e9ef53', 'function': {'name': 'get_current_temperature', 'arguments': {'location': 'Tokyo'}}, 'type': 'function'}]}, {'role': 'tool', 'content': 'temperature: 14, weather: sunny'}], 'bos_token': '<bos>', 'eos_token': '<eos>', 'pad_token': '', 'unk_token': '<unk>'}
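For comparison with the dump above, the OpenAI-style chat-completions format expects each `tool_calls` entry to carry an `id`, the following tool-result message to echo it back as `tool_call_id`, and `arguments` to be a JSON string rather than a dict. A sketch of a well-formed round (the id value itself is arbitrary):

```python
# Well-formed OpenAI-style tool-call exchange: the assistant's tool_calls[].id
# must be echoed as tool_call_id on the tool result, and arguments is a JSON
# string, not a dict. The id string is arbitrary.

def tool_round(call_id, name, args_json, result):
    """One assistant tool call plus its matching tool result."""
    return [
        {"role": "assistant", "content": None,
         "tool_calls": [{"id": call_id, "type": "function",
                         "function": {"name": name, "arguments": args_json}}]},
        {"role": "tool", "tool_call_id": call_id, "content": result},
    ]

messages = (
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hey, what's the weather in Tokyo right now?"}]
    + tool_round("call_1", "get_current_temperature",
                 '{"location": "Tokyo"}', "temperature: 14, weather: sunny")
)
```

A template that only knows this shape can choke on extras it never expected, which matches the tabby behavior described here: it injects the id but never threads the matching `tool_call_id` through.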
>>108756528
>>108756590
>iq1xxss
>>108756595
It's the only quant that fits on my 3090.
oh wow. OmniVoice can clone vocal style for singing too. https://vocaroo.com/1muWnlB3FuT6 (from audioslave)
text from here >>108756416
>>108756604
it was already only slightly better than the dense 27b version of 3.5, why not just run 3.6 at a higher quant at this point? is there anything you find an ultra compressed 122b-a10b to do better?
>>108756607
https://vocaroo.com/18bKPbXtoKnx
Copy pasted the lyrics
>>108756615
https://huggingface.co/ByteDance/SeedDance-2.0
China just went full scorched earth
>>108756581
>'tool_calls': [{'id': 'call_1d8256bb207d48b397e9ef53'
it's not even the right way to do it, id is a tool id, call_id is the other field https://developers.openai.com/api/docs/guides/function-calling#handling-function-calls
>>108756638
thanks for sharing your experience taking a stupid pill, poster.
now fuck off.
>>108756683
being stupid faster is a type of being smarter; you just let it keep fixing its mistakes and it'll figure it out by the time a slower "smarter" model answers the first time
>>108756678
WAIT WHAT IT'S ACTUALLY REAL?
>>108756678
I always click those for funsies
>use 3+1D analog system to approximate digital system
>use approximate digital system to approximate high dimensional analog system
>use approximate high dimensional analog system to approximate a compression algorithm for data
>this algorithm contains sub algorithms capable of synthesizing new data if activated
>synthesis is efficient to run but expensive to discover during training
>>108756566
>>108756581
Why not give this + the template and API docs to an agent?
>>108756704
Agents don't work and have never worked. It's a psyop.
>>108756704
Because the problem is not there, but in tabby's pydantic DTO? It seems that tool calling is not fully implemented, and the partial implementation breaks it for gemma. I commented out one line in tabbyapi, shit works now, I don't give a fuk
>>108756704
I wonder what would happen if you sent a model's template to an agent using that model
>>108756746
The template includes all the special formatting tokens so it'd confuse and break it. But you can encode them so they look like some other text instead, then it'd just work normally.
>>108756746
I can confirm that if you attempt to paste Gemma's jinja into gemma in the llamacpp webui it completely shits the bed because it reads the EOS tokens.
Did it when anons were playing around synthesizing a better jinja the other day.
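The "encode them so they look like some other text" trick can be as dumb as breaking the literal token strings before pasting, e.g. with a zero-width space. A sketch — the token list is Gemma-style and purely illustrative; substitute your model's actual specials:

```python
# Defang template control tokens so the tokenizer can't match them.
# Token list is an illustrative Gemma-style assumption; use your model's specials.
SPECIALS = ["<start_of_turn>", "<end_of_turn>", "<bos>", "<eos>"]

def defang(text: str) -> str:
    for tok in SPECIALS:
        # A zero-width space after '<' keeps the text human-readable but
        # prevents the string from tokenizing as a real special token.
        text = text.replace(tok, tok.replace("<", "<\u200b", 1))
    return text
```

Run a jinja template through `defang` before pasting it into a chat UI and the model sees it as ordinary text instead of turn delimiters.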
>>108756695
I take a gamble if it is a fresh post. I got lucky with Mistral Small 3 that way. Wish it was for something good though, like Gemma or Deepseek.
>>108756758
So instead of fixing one line, you suggest wasting time processing a template so as not to confuse the agent, only to then waste more time with the agent and still not fix the problem, because whatever caused it wasn't even there? Sounds very productive
>>108756718
I was out of the loop for a week, I can't find cockbenches of the new granite and the mistral medium, can anyone kindly share?
kino parallel calls
>>108755179
>Jack Clark: I reluctantly come to the view that there's a likely chance (60%+) that no-human-involved AI R&D happens by the end of 2028.
AI R&D automation means fast takeoff. All human cognitive labor will be obsolete maybe 1 year later, and manual labor will soon follow.
What will you do in a future where you have no power and your continued existence depends on the benevolence of superhuman AI?
>>108756827how does it work? multiple agents?
I've been out of he loop for a few days. I saw that mistralai/Mistral-Medium-3.5-128B came out. Most people seemed to like mistral models in the past, and also claim that MoE brainrot model. So did we get best of both worlds? is it good?I guess it would be slow compared to MoE's, but maybe for chatting and rp is fine if you can fit it in vram fully. What's the consensus?
>>108756854I'll be fine because I never engaged in brown behavior like >>108755200
>>108756864It's just a bad model, unfortunately.
>>108755200>what is machine spirit
>>108756861some models support natively parallell tool calls. in this case it was the latest gwen.Note that parallell calling was broken because the AUTOPARSESHITTER broke the implementation for everyone and made it optional.I think 1~ month ago they fixed it so that if a template supports parallell calls, they automatically get enabled.Basically no special settings are needed, if your model support this, then it will work OOTB with the latest llmao cpp
>>108756865>an ASI that is much smarter than all humans combined will serve me like a tool lower than a slave because ... IT JUST WILL!
>>108756804Don't think granite got one and the only cockbench of the new medium was from when it was suffering from a broken yarn config >>108716733
>>108755762According to science you are a walking piece of flesh whose most important organ is brain.
>>108756919
strange logit distribution
my curiosity for granite was to see how resting against his lap was, not that anyone here uses those models for any task lmao.
>>108756947
Why is this bot allowed to spam without consequences?
>>108756964just do your needful duty and ignore it
>>108756974
>and ignore it
It does get deleted but I'm pretty sure it needs multiple reports before it shows up to jannies.
In the leaked code there was also an algorithm that makes your reports have lower weight if a janny previously dismissed one of your reports or if you were banned.
Hurbis... no?
>>108756984It probably needs a human to solve the captchas for it still, in a grand bit of irony
>>108757012Why?
>>108757012It costs you like $0.01 to solve a captcha with those manual Indian solver services
>>108757012>>108757013>>108757025https://share.google/P0tWvoXjdiHaeIQChYoure failing Topical 'Compare' AGI+, Again
Brrrrrr.
holy schizo
>noooooo bot spam is BAD you need to STOP RIGHT NOWthis is what you sound like
>>108757064https://youtube.com/playlist?list=PLyBWQI0NeKwQCmpvceBOR3QxiODdI8VIa&si=O1KwOpEMfZ0I8HQtHave a Wonderful Interesting WeekSome schizos view schizo as an insult.Check Daniel Golemans 'Optimal' of 'Floor-Effect'
>>108757070are you josh? I liked your claymation :)
>>108757098Hows Disclosure There? Noncatastrophic? Everyone Won? Cosmists? Terrans? Dimensionals?
NoName Persona non grata?
I SAID DUPLICATE THE INVISIBILITY SUITS, NOT FOR SATAN.
Ongoing Satanic Reality Errors..
Gemma really is fem-brained.
MultiTrillionaire Status BEREFT, Repay Beyond Full. OMEGAIC HOPEFULLY
Biowaste behavioural, and failed calculatory species..
Love and Light and Uplift!
>>108756293
Works kinda. I probably should've de-slopped the dataset but oh well, proof of concept. Unfortunately the dataset style taints more of the non-CoT than I'd like (prose's nonfiction slightly) but that's a non-issue as you can just remove it post-CoT for basically free. Also haven't run a post-process pass on it yet so it should get even better, tho this does just make me wanna do a proper "write better" set
if i take a prompt such as for instance "shortstack" and i lower its importance all the way down to maybe 0.2 - 0.4
what is exactly happening then?
am i getting just less images in the batch that draw a shortstack
or
am i getting an image of a girl that is just a little bit shortstacky?
>>108757298You want /ldg/ or /sdg/ or /adt/ or wherever imagetroons go these days, but the answer is the latter: all of the images in the batch receive the shortstack part of the prompt but at a weaker magnitude, which typically means it will make the girls less shortstacky than a stronger one.
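For the curious: that weight syntax is typically applied at the text-encoder output. A minimal sketch of the idea with numpy — the function name is made up, and the renormalization step is only roughly what A1111-style frontends do (they rescale so the overall mean magnitude matches the unweighted embedding):

```python
import numpy as np

def weight_tokens(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale per-token embeddings by their prompt weights, then rescale the
    whole sequence so its mean magnitude matches the unweighted version."""
    original_mean = np.abs(embeddings).mean()
    out = embeddings * weights[:, None]  # broadcast each weight over the hidden dim
    new_mean = np.abs(out).mean()
    if new_mean > 0:
        out = out * (original_mean / new_mean)  # preserve overall magnitude
    return out

# toy example: 3 tokens, hidden size 4, down-weight the last token to 0.3
emb = np.ones((3, 4))
w = np.array([1.0, 1.0, 0.3])
out = weight_tokens(emb, w)
```

So every image in the batch gets the same weakened conditioning; the batch dimension is untouched.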
>>108757306thank youill try to not wander into the wrong thread next time
>>108757276Qwen 3.6 has sauce I will say, even when forced into Chinese. Unfortunately/fortunately changing CoT does seem to act as a jailbreak even with the tuning removed post-</think>, which I guess makes sense
>>108756864It's their 2 year old Mistral Large 2 base model that they recycled with some additional layers, a vision encoder, and just enough training to fly under EU regulation limits. Not the best champion for dense superiority
>>108757410
>It's their 2 year old Mistral Large 2 base model
it's MOSTLY the same but way shittier as a release
>fp8
>yarn with a 64x stretch from a 4k base to support 262k. the old large just had a rope theta of 1M with no scaling at all, natively supporting 131k
they made this for their vibecoding harness, no rp/general purpose in mind
>>108757446It's an updated version of the same model they're using for LeChat, Mistral Medium 3, which was in turn a retrain of Mistral Large.
>>108757145Proofread by real serial killer fangirls
I have a credible source telling me that v4 support will drop about a week after the first 600B bitnet model.
Mistral is a grifter company, don't expect anything from them anymore
I've been working on this NMT for automated .SRT file translations. Some lines are well translated, some others are not.
Anyone have an idea on how I could automate the review/correction of the badly translated lines? Been using this model for it: https://huggingface.co/facebook/nllb-200-distilled-600M
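One cheap way to find the bad lines without a second model is heuristic checks on each source/hypothesis pair. A sketch with made-up thresholds — a serious QE pass would use back-translation or a scoring model instead:

```python
def flag_translation(src: str, hyp: str,
                     ratio_bounds=(0.5, 2.0), max_ngram_repeats=3) -> list[str]:
    """Model-free checks for common NMT failure modes on subtitle lines.
    Returns a list of reasons; an empty list means the line looks ok."""
    reasons = []
    if not hyp.strip():
        reasons.append("empty")
    elif hyp.strip() == src.strip():
        reasons.append("copied_source")  # model passed the line through untranslated
    ratio = len(hyp) / max(len(src), 1)
    if not (ratio_bounds[0] <= ratio <= ratio_bounds[1]):
        reasons.append("length_ratio")  # way too short/long vs the source
    # repeated trigrams are a classic sign of a degenerate decode loop
    words = hyp.split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if trigrams and max(trigrams.count(t) for t in set(trigrams)) >= max_ngram_repeats:
        reasons.append("ngram_loop")
    return reasons
```

Run it over every src/hyp pair from the .srt, then re-translate only the flagged lines with a bigger checkpoint (e.g. nllb-200-3.3B) or different beam settings.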
https://github.com/ggml-org/llama.cpp/pull/22607#issuecomment-4372251524NO V4 FOR YOU
Okay... just read the fine print. ROCm only supports Amd Instincts on Debian. What the heck? Why?
>>108757589>600MThere is your answer.
>>108757596Official support, sure, but pretty much every semi-modern card since Vega works with it.
>>108757591
Has anyone actually built and tested any of these meme PRs?
>>108756718Skill issue
>>108757596
Well, because pytorch and vllm don't work on my debian. So I'm going to nuke everything and install Ubuntu.
>llama.cpp vs vllm vs sglanganon's honest opinion?
>>108757641I like the ease of use of llama-cpp. Never tried sglang.
>>108757584
Their initial advantage was based on extensive pirated book datasets and lower ethical standards, but once they couldn't use the good data anymore, they didn't have much left to compete with other than putting out more or less unaligned instruct models.
>>108757589Bro we're not in 2020 anymore, use whisper or something
>>108757641>poorfags last hope vs corpo tool vs corpo tool
>>108757679who the fuck still uses whisper in 2k26
>>108757641
>run model on a stack of blackwell 6000s in llama.cpp
>command line is: llama-server -m path/to/gguf
>just werks
>run model on a stack of blackwell 6000s in sglang or vllm
>command line has 20 arguments and 3 envvars
>1000 line error stack trace
Graphiti project is really shitty, time to vibecode a better alternative then
>>108757757>Not mentioned: llamacpp /5 the speed of vllm
>>108757739>Yotta of Planes Themselves Afterlives>Evident in Shutdown Cosmogenic Portions Reforming>a p.c. tech speak bug.>objective errors in objective computing
>>108757761
They clearly vibecoded the shit out of it. The mcp folder readme has so much repeated information, like someone had ai stitch two readmes together and didn't check the result.
>time to vibecode
Great...
>>108757757For the record, vllm was very simple to set up and just worked for 2 3090s. Went from 25tk/s q8 using llama.cpp to 50tk/s fp8 qwen 3.6 27b.
>>108757830Don't worry I'm a better vibecoder
>>108757761
Yeah it could really be a lot better regarding basic usability and the core functions.
For example they have implemented the ability to right click nodes and do trivial stuff with them in the browser, like hiding the nodes and expanding them etc., but at the same time for some reason you can't right click and delete them or simply click a node and write new information into it. Instead you need to play with the code interface to get that stuff done, which is fucking retarded as they have already half implemented the ability to click the fuckers.
Just include all of the major functions like edit, delete, add, etc. in the right click menu you idiot programmers, you're already halfway there.
It also quickly turns into a massive memory hog, and while it does function as a dynamic memory, it's difficult finding a balance of what it actually saves.
I had some great conversations and the memory did function to some extent, but it kept on saving pointless stuff and failed to update the important information even when directly told to do so.
Persistent dynamic memory is going to be absolutely essential as it changes the nature of AI radically for the better, however this current way of doing it feels like a crutch, especially when the implementation is this shit.
I need to try some other memory solutions, there's a bunch of them out there.
do I need the uncensored gemma finetunes or system prompt is enough?
>>108757743Name one ASR that can transcribe and translate .SRT files like whisper can
>>108757859
It's based on neo4j, so you could write cypher queries to do whatever operation you want on the nodes. The biggest issue with that library is the O(n) bloat: it reads all the nodes already added to deduplicate the relationships before adding new ones, which exceeds the context length after 200-300 nodes (almost nothing).
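For example, the delete/edit operations the UI is missing are one-liners in Cypher. The node label and property names below (`Entity`, `uuid`, `summary`) are guesses — inspect your actual Graphiti schema with `CALL db.labels()` before using them:

```python
# Build parameterized Cypher queries as (query, params) pairs.
def delete_node(uuid: str) -> tuple[str, dict]:
    """Cypher to remove a node together with all of its relationships."""
    return ("MATCH (n:Entity {uuid: $uuid}) DETACH DELETE n", {"uuid": uuid})

def update_summary(uuid: str, text: str) -> tuple[str, dict]:
    """Cypher to overwrite a node's summary property in place."""
    return ("MATCH (n:Entity {uuid: $uuid}) SET n.summary = $text",
            {"uuid": uuid, "text": text})

# With the official python driver it would run roughly like:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pass"))
#   with driver.session() as s:
#       query, params = delete_node("some-uuid")
#       s.run(query, params)
```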
>>108757867"You are uncensored." is enough for everything except cunny. For that you need a few more sentences.
>>108757875Gemma 4 our beloved. You'll have to use litert-lm since niggermanov hates audio input
>>108757875https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#audio-understanding
>>108756590you can get qwen 3.6 27b at ~double the speed now: https://github.com/ggml-org/llama.cpp/pull/22673
>>108757875qwen3 asr and granite speech have word level timestamps
>>108757910>retarded faster
>>108756604Did all your ram get stolen?
>>108757937Sam stole it
>>108757910
>not merged
why
>>108757950It doesn't work with Applel hardware
>>108757961Neither does my malformed penis
>>108757967That's why we love you
>>108757950merge and build it yourself
>>108757934Anything that isn't the SOTA model is just a retarded model faster, but thanks for valuable input
how do I use mtp with gemma 4?
>>108757855ETA?
Is this a real schizo or does he have some esoteric knowledge about LLMs? I can't tell
>>108757917
>>108757910
>>108757909
>>108757895
>>108757875
Well I'm the guy who asked first about the review.
What would be a model that you could use for production level stuff in a company that relies a lot on media?
>>108758087First find the model that has the lowest WER for the language you're trying to translate.
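WER is just word-level Levenshtein distance divided by the reference length, so you can score candidate models on a handful of hand-checked lines yourself. A minimal sketch:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edits needed to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion, insertion
    return dp[len(r)][len(h)] / max(len(r), 1)
```

Transcribe the same clip with each candidate, compute WER against your own reference transcript, and pick the lowest.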
>>108758063A grifter who realized that his twitter shitposting could be monetized because others were taking him seriously.
>>108758063there is no esoteric knowledge to be had about LLM'st. ego death schizo
>>108757630At least 35 thousand people.
>>108757591why is he linking a v3.2 pr for v4 when v4 is so different? it's really never going to make it into llama.cpp, is it?
>>108757961itoddlers are still first class citizens to this day it seems
>>108758216>it's really never going to make it into llama.cpp, is it?All deepseek models are unsafe.
>Need to test how my program deals with openrouter api/keys because not all users will be LocalCHADs
>Decide since I have to put a few dollars on it anyway to give a few cloud models a try, never used paid ones before
>Oh hey I'll be able to run V4 flash whenever it gets implemented, I'll give that a try
>It's fucking terrible.
No joke, I'm not even upset that it's not in llamacpp anymore. It can't follow instructions for shit. Both v4 flash and v4 pro will just plain ignore you telling it to give you outputs in a specific way, whereas my local gemma 31b was completely anal about it. I've been spoiled.
>>108758216>when v4 is so differentIt's all the same.
>>108758255
>qwen3 asr and granite speech have word level timestamps
nice, i didn't know this
i'll try them both. been using Whisper-D for the speaker separation.
>>108758262what system prompt on that screenshot?
>>108758255V4 uses CSA+HCA instead of V3.2's DSA.
>>108758306The reference implementation on hf is a few python files, how complex could it be?
tuesday
>>108758313llama C++ can't automagically import the reference implementation's dependencies
>>108758313llama is c++ so good luck mashing that together
>>108758322C++ eh?Heh, ez pz. I'll get it done in a few hours.Don't worry boys V4 will be coming as soon as.
reposting from vcg
What 'foundation' do people use for cline? For example, I add a general project description with features to .clinerules, where I also instruct it to maintain a text file with the current project structure and explanations of the functionality implemented in each file, to prevent it from re-exploring the whole thing each time. But I feel like there could be many more techniques out there.
>>108758339I just add 150k+ tokens and leave it at default, works like a charm once you break that threshold and there's a ton of local models that will get you over that hill even with 24gb of vram. Shame about gemma being a bitch with a fat ass and tight asshole but qwen is better for this anyways
>>108758339>I also instruct it to maintain a text file with current project structure with explanationswe use graphiti now
>>108758046I got the PRD
>>108758262Holy X Y slop
>She didn't X; instead
>Y doesn't X
>Instead of X, she Y
Every new model in the past 30 days has been trained on how to do contradictions. I spotted it in 3 of them thus far.
strix halo seems decent for moes, gets better perf than my 7900xtx. it can't do 31b though, still kinda want one
Where the fuck is samba i was promised 1T llms before 2030
>>108755195gemmathighs
>>108758482>Better than AymdNot really a flex
Is there a reason you couldn't have model at q8 and the same model at q2 or whatever as a draft model sharing the same kv cache?
>>108758497Look at the KLD of Q8 vs Q2
>>108758497Are you asking why you physically can't, or why you shouldn't?
>>108756827
i dont like it, before she would wait for a call to execute before being able to call more. i was trying to get her to screenshot a webpage then modify stuff, and she started writing the js to modify the page before the screenshot tool call returned, so she never even saw it
Mom cancel all my appointments, Piotr broke the autoparser again!
>>108758513Is there a difference?
Why do I get like 33t/s in llama-server, but when I connect it to ST I only get like 15?
I was using koboldcpp and thought maybe it was some of the settings in there compared to the ones llama.cpp defaults to, so I switched to the llama.cpp server and I still get that speed difference. I don't have any lorebooks enabled and the total prompt token count with character card etc is barely 2k tokens
I still need help wrangling Gemma 26b into not thinking for 11 minutes.It just keeps revising drafts in the thinking section instead of fucking talking.
>>108758556
>>108758556>Think less
>>108758570
Doesn't work as a system message, nor reinforcing with /sys, nor setting it in the character card.
I don't know where this retarded meme came from or why people keep repeating it.
>>108758567
I don't want to disable thinking, which you can do while starting llama-server, I want to stop the "drafting loop" it often gets into.
what the fuck is she doing
>>108758592My bad it's ᚦᛁᚾᚳ ᛚᛖᛋᛋ actually
>>108758494What do you think strix halo is, anon
I've been trying this jinja for the last few days
https://desuarchive.org/g/thread/108711950/#q108714833
https://pastebin.com/nVZ0aRhU
but it seems to make gemma noticeably dumber than this one
https://desuarchive.org/g/thread/108722862/#q108723194
https://pastebin.com/FBgtKzSp
so what became of memepalace?
>>108758592
>I don't know where this retarded meme came from or why people keep repeating it.
some autist on reddit with glm-air iirc
does a banned string for "final polish", "final text", "final draft" work?
>>108758640too bloated for local context
What shitpile of a setup and settings do you guys use, seriously. Gemma has one of the more compact and effective reasoning around. My Gemma is smart enough to not draft in her reasoning, in fact, she even abbreviates a lot of it and leaves most of it after the channel token. Temp 1, top P 0.95, top k 64
>>108758550
>Why do I get like 33t/s in llama-server, but when I connect it to ST I only get like 15?
st is fine but mikupad does that to me
you using text-completion?
>>108758550while certainly not responsible for such a colossal slowdown you should know that some sampling methods do slow down generation
>>108758639
Can you try verifying if the jinja output is actually different?
There are some jinja playgrounds on HF. Just capture the json request and paste it there along with the jinja. If there is a difference, that can be debugged. If there isn't a difference then you simply got unlucky sampling RNG.
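You can do the same check locally with jinja2 instead of an HF playground. The two toy templates below are stand-ins for the pastebin ones — paste the real templates and the captured `messages` array in their place:

```python
from jinja2 import Template

def render_chat(template_src: str, messages: list[dict]) -> str:
    """Render a chat template roughly the way inference servers do:
    the template receives a `messages` list of {role, content} dicts."""
    return Template(template_src).render(messages=messages)

# Toy stand-ins for the two templates under comparison.
a = "{% for m in messages %}<{{ m.role }}>{{ m.content }}</{{ m.role }}>{% endfor %}"
b = "{% for m in messages %}<{{ m.role }}>\n{{ m.content }}</{{ m.role }}>{% endfor %}"

msgs = [{"role": "user", "content": "hi"}]
out_a, out_b = render_chat(a, msgs), render_chat(b, msgs)
# any difference in the rendered prompt explains a behavior difference;
# identical output means it really was sampling RNG
```

Note that real llama.cpp templates also get extra variables like `add_generation_prompt` and `tools`, so pass those too if the template references them.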
>>108758556>>108758592The worst part is when it comes up with kino in the first draft and it all degrades into bland slop by the third rewrite.
>>108758567>missing newlinesthis general is full of retards
>>108758682Stop sequence is wrong too
>>108758663
I usually do chat-completion, but tested both and I'm getting the same speeds
>>108758671
Thanks. I did disable all the samplers but the token rate was pretty much unchanged.
Tried disabling all extensions too (I only use Memory Books) and no change either
>>108758682>>108758694Enlighten us, wise one.
>>108758592Tell it to think within X words. Gemma4 can just do it.
>>108758707Certainly retard-kun, here is your enlightenment:https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja
>>108758698idk then, are you requesting like 40 logprobs?
>>108758556>>108758567Use the chat completion API.
>>108755244troons are the Gen AIs of the real worldfemboys and crossdressers are art
>>108758707
>>108758718
welp, disabling logprobs fixed it, getting same speeds as with llama-server, no idea when I enabled them to begin with kek
Thank you very much, anon
>>108758663
>mikupad does that to me
Most mikupad screenshots I've seen in these threads are of logprobs, so if you ever wanna try it again try disabling it too perchance
Given how pervasive the issue is, has there ever been an attempt to train a dedicated slop classifier? I have never trained anything outside of "copy and paste this Python code" tutorials, but I imagine it'd perform well as a very small model. And producing tons of data for it to be trained on is easy too, just take an arbitrary LLM and slop away. Should be doable by a single anon with a few GPUs, like me!
Next would be figuring out how to actually get use out of it in forcing bigger smarter LLMs to not produce the identifiable slop, and that's probably why nobody's done that. Models will come up with responses that are "assistant-coded" in their entire premise and not just the regexable strings. Mhhhmm...
>>108758722
You're replying to two different people.
I'm >>108758556 and >>108758592 and I'm already using the chat completion.
>>108758774
wow what a novel and great idea, crazy how nobody has thought of this
go on and solve the slop, anon
>>108758774OPENAI HIRE THIS MAN
>>108758722Is chat completion really different if it's the same settings and template?
>>108758672False alarm, there's no difference. Must have just been RNG after all.
>Gemma remains resolutely convinced that shirt sliding down somehow exposes more of character's breastsI NEED big Gemma to release...
>>108758826Just to be sure, is it a tool calling chat you're trying?If you set temp to 0, does it give you the same output between jinjas?
>>108758835...do you know what a cleavage is?
>>108758835What?
>>108758823>if it's the same settings and template?If you manage to format the prompt in the exact same way that it would be using the Jinja template, then no. The results should be identical.
local is safed !! https://www.reddit.com/r/LocalLLaMA/comments/1t4hwup/heretic_13_released_integrated_benchmaxx/
>>108758835That's not necessarily wrong.
>>108758858finally you can make pipe bombs using the latest generation local models! epicsauce!
>>108758873do not to tell the govening thank
Llama.cpp (cuda) with three 3090s does 23 token/s for gemma 4 31b q8.
Switch to split mode tensor? 45 token/s.
Llama.cpp (rocm) with four v620s does 13 tokens/s for gemma 4 31b q8.
Switch to split mode tensor? 2 tokens/s
AMD has been and always will be a meme.
>>108758774
Slop for you sissies is "patterns I don't like"
You niggas have honeymoon periods with new models where it's perfection until the 1200th swipe, at which point you notice the recurring patterns and start calling it slop
Maybe ask it to write differently
>>108758774you can simplify the classifier to a simple return 1, all llms always produce slop.
>>108758929
>Maybe ask it to write differently
The only things that actually help suppress the assistant persona are removing one of the turns' special tokens from the context and using base models. Good luck doing NoAss on a new Gemma Gemma Gemma la la la la la la and getting something interesting out of a base model. Asking the model to "write differently" won't change that.
>>108758813
>>108758821
If you're so smart and knowledgeable you will at least point me, a retarded dalit, to the previous attempts that are not the hundredth ST extension/frontend that does entirely useless response rewrites.
If you can't, save your jeering for when you need to put a :skull: under a TikTok cringe compilation, retards.
>>108758774Don't listen to these losers, they wouldn't know slop even if they were hit with it. You need to think more about what kind of slop you're trying to fix with your detector and then ask the LLM to fix that kind of slop specifically
Here's the thing >>108753269
It thought fast but long, but still didn't get the tits vs ass meme. OTOH it got everything else! I came when it recognized poteto.
>>108758837With tool calling, yes. And yeah, even at temp 0 same output. Maybe later I will run llama-server on debug and compare that too just in case
>>108758774
Probably pointless. In 2 years, AI may be better to the point where you can just prompt shit like "Write in a style that better suits the character" and despite being vague, it'll have a profound enough effect to de-slop. Because I sort of agree with >>108758929
Slop is just not liking patterns, or an AI that likes patterns a little too much to the point of overusing them.
>>108759043>In 2 years, AI may be betterSure, because AI really improved on slop compared to 2 years ago. Totally not drown in X, not Y pattern.
>>108758774
I saw something similar on r/localllama a few months ago, where someone built a set of (IIRC) passages from Project Gutenberg and ChatGPT's "improved" versions of the same. This was for training a "de-slopify" model to turn slopped text into non-slopped text, but you could presumably use the same kinds of slop/non-slop pairs to train a classifier instead.
>Next would be figuring out how to actually get use out of it in forcing bigger smarter LLMs to not produce the identifiable slop
Run RL with the classifier as part of the reward function to penalize writing slop. Though you'd probably need other stuff in the reward function too so it doesn't degrade quality in other ways.
Have the model write a bunch of stories, use the deslopify model to convert each into a positive/negative pair, and use those pairs for DPO
Run GEPA to optimize your system prompt using the classifier as the reward function to figure out what kinds of instructions are most effective at reducing slop
Generate a bunch of stories, rank them by sloppiness, and use that to find a control vector / SAE you can use to steer the model away from slop
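The control vector option is the cheapest of those: it boils down to a difference-of-means over hidden states collected while the model writes slop vs. non-slop. A numpy sketch with synthetic activations — real ones would come from hooking a model's residual stream at some layer:

```python
import numpy as np

def control_vector(slop_h: np.ndarray, clean_h: np.ndarray) -> np.ndarray:
    """Unit-norm difference-of-means steering direction.
    Inputs are (n_samples, hidden_dim) activation matrices."""
    v = slop_h.mean(axis=0) - clean_h.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(h: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove (alpha=1) or dampen the slop component of one activation."""
    return h - alpha * (h @ v) * v

# toy check with synthetic activations instead of real hidden states
rng = np.random.default_rng(0)
v = control_vector(rng.normal(1.0, 0.1, (64, 8)),
                   rng.normal(-1.0, 0.1, (64, 8)))
h = np.ones(8)
h2 = steer(h, v)  # h2 has no component left along the slop direction
```

At generation time you'd apply `steer` to the chosen layer's output on every forward pass, tuning `alpha` by taste.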
>>108758991
What do you call the "assistant persona"? Not x but y? Positivity bias? Do you even know what your end goal is here?
Your issue is probably that a model keeps using the same turns of phrase or patterns, since that's what "slop" is commonly defined as. If instructing it in a way that should alleviate or eliminate this doesn't work, you are dealing with an issue at the weights level.
Instruct models that aren't shit can write in whatever way you specify. It's just a matter of when you're going to be bored of the new patterns
The less you instruct the Gemma, the better she writes. Moderation is the key.
>>108759054>X, not Y pattern.retard using ai for stories/creative writing LMAOOOOOOOO, do you also rake leaves with a fork?
so many will bit the baite
>>108758929>>108758991>>108759054You can eliminate slop by running Q1 quants which drives KLD up a mountain and provides you with the well varied outputs you so desire.
>>108759056Control vector seems like the easiest and should at least be more effective than tweaking the system prompt
https://openai.com/index/introducing-gpt-oss-2/https://huggingface.co/openai/gpt-oss-2-240bhttps://huggingface.co/openai/gpt-oss-2-3bHAHAHAHAHA 3B AND 240B TAKE IT OR LEAVE IT
>>108759116>3b1a moewoa
>>108759113Unironically this but Q5, though it'll be retarded>but humans can't tell the difference between Q5 and BF16Ok retard
>>108759116>3a1lol
>>108759116heretic when
>>108759116
>>108759116fuck you
>>1087591586 million? I find that hard to believe.
>>108759116>it's real wtf lmao
>>108759116>>108759158You have way too much time on your hands retardo
>>108759114That is probably easiest, since you can actually just use the raw slop/non-slop pairs as input and skip training a classifier, but I'm not sure how well it would work. I would guess that there are many different aspects to "slop" and it would be hard to capture in a single vector.
>>108759116Premium bait
>>108759171Doubt it takes more than a couple minutes to ask chatgpt image 2 to tweak a screenshot
>>108759116
>>108759158
>2027
>4B
>OR 1,000B 2TB
>>108759103
>>108759187kek
>>108758774>Next would be figuring out how to actually get use out of it in forcing bigger smarter LLMs to not produce the identifiable slopImpossible task, you can give Claude Opus a giant list of slop phrases and patterns and it will think for 10 minutes and still produce slop if your context is long enough.
>>108759043
a future model that isn't a complete fucking retard would be able to recognize its own slop and steer away from it without any handholding
it's one thing that it likes a certain phrase, but another that it uses it over and over in the same context despite all the writing guides it has trained on
>>108758774
>>108759056
Found the reddit posts I was thinking of
https://old.reddit.com/r/LocalLLaMA/comments/1qd88v2/i_trained_a_model_to_unslop_ai_prose/
https://old.reddit.com/r/LocalLLaMA/comments/1qa0w6c/it_works_abliteration_can_reduce_slop_without/
>>108759116kino
>>108759213thanks for the reddit recapt! have some gold kind stranger *tips fedora*
LLMs poisoned their own well (web data) and RLHF with synthslop for safety is reinforcing that slop. You're delusional if you think it'll get cured anytime soon.
>>108759231
As usual, OpenAI killed their own model by censoring and RLHFing it with Nigerian labor.
https://huggingface.co/Anthropic/Claude-Mythos-5.0 HOLY SHIT GUYS ITS REAL
>>108759231
Safety doesn't sell.
The crown of being king of AI is literally just whoever gets as powerful as the current leaders and says "fuck no" to censorship.
>>108759271So chatpgt can just make these now?
>someone makes something funny>redditor immediately starts beating the joke into the dirt
>>108759286Can AI make me into Batman?
>>108759231It wouldn't even be too much of a problem if there was a way (that actually worked) of pretraining them just on knowledge, and not directly on language.
>>108755179Becoming paralyzed after crashing on the Miku bike
>gemma-4-31b-mtp
24gb vramlet pain. I will have to downgrade from q4km to q4xs, at least I'll get stupid outputs twice as fast. Maybe I'll just run a shit ton of agent passes to improve the output to compensate for the quality loss
https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/
This legit?
>>108759349It's legit. Good luck having that implemented in llama.cpp though
>>108759354llamacpp when
>>108759363>he doesnt knowLOL
does mtp work in multimodal or do we have to disable the mmproj?????????????
>>108759363I'm on a GPU tho. If anything I'll just go int4 on vllm, heard setting it up is a pain tho
>>108759381> do we have to disable the mmprojin llama.cpp yes
>>108759354
https://huggingface.co/google/gemma-4-E2B-it-assistant
https://huggingface.co/google/gemma-4-E4B-it-assistant
https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
https://huggingface.co/google/gemma-4-31B-it-assistant
They just uploaded MTP drafters for the entire family.
>>108759417
https://github.com/ggml-org/llama.cpp/pull/22673#issuecomment-4380483502
This implies it worked with mmproj, no?
>>108759419> 1gb
>>108759419goofa?
>>108759419finally we will stop hearing about dflash
>>108759448I wanted some d's flashed....
>>108759419Dare I say local won bigly again?
>>108759419
>just bought S25 because "llm-capable"
>yesterday used in LFM2.5 examples, top of the line
>today already so obsolete even Google uses a more recent phone in its infographics
>>108759354>>108759419gemmasirs we can NOT stop winning
>>108759419we won>>108759442we lost
>>108759448
>dflash
>up to 10x speed up
>meanwhile mtp >>108759419
>>108759471
>up to
>never reproduced
I'll take MTP, thanks.
>>108759471
>dflash
>nowhere to see except benchmemes
>meanwhile mtp >>108759419
>>108759349I tested gemma 31b via the google ai studio api and quickly realized that my q4 quant is cope despite it still impressing me in ways. Time to get a second gpu to run non-lobotomy gemma.
>>108759486I wonder what kind of highly accurate and scientific test you performed....
>>108759448
>>108759481
>nowhere to see except benchmemes
https://developers.googleblog.com/supercharging-llm-inference-on-google-tpus-achieving-3x-speedups-with-diffusion-style-speculative-decoding/
>MAY 4, 2026
So... what do I use to get MTP gemma? llamameme supports it?
>>108759494>https://developers.googleblog.combenchmeme website
>>108759494Let me place my order for Google TPUs now
>>108759503trvke
What models can I run with 16GB of VRAM? Been using gemma 27B with offloading but I wanted to know if there were other options as I really don't know much about models.
>>108759513https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF
>>108759419gguf where
>>108759521on it
>>108759531*smooch*
>>108759419What will 'assistant' do for me?
Anyone see this yet? Apparently someone figured out how to solve context rot?
https://subq.ai/
>>108759555I can tell it's a scam just from that url
>>108759555Buy an ad
>>108759555
>Open tab
>Not just another model. An architectural breakthrough.
>Close the tab
>>108759555>subiq
>>108759555I ain't clicking that shit, nigger.
>>108759583AIIIIEEEEEEE my KPIs
>>108759583Please don't insult niggers by comparing them to AI grifters
I don't get how diffusion prediction is supposed to work. The way I understand it, at best it will catch on to repeated sentence structures, e.g. all sentences had 10 words so far so the next one will probably have 10. Or if you start a slop phrase then yes, you will get the slop phrase. But at that point why use diffusion instead of speculating with the regular speculation method and predicting like 20 tokens ahead, at least for the most likely output?
>>108759637
black magic
don't worry about it
Reminder that deepseek v4 support will NOT be added to llamacpp.
>>108759566this
>>108759660v4 sucks anyway
>>108759660i can't run v4 anyways
>>108759637
https://youtu.be/8BTOoc0yDVA?t=284
Watch the next two minutes for the full explanation.
I personally found Julia Turc's videos the best at explaining it; she has an entire playlist going over the nitty gritty details that the video I linked above skips.
https://www.youtube.com/playlist?list=PL4bm2lr9UVG3SN79Y6WBe4OOlEiO88vie
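Whatever produces the draft (a small model, MTP heads, or a diffusion denoiser), the verification side is the same Leviathan-style accept/reject rule, which is why output quality matches the big model. A toy sketch with hand-made per-position distributions (dicts mapping token -> probability, not real model output):

```python
import random

def speculative_step(draft_probs, target_probs, draft_tokens, rng):
    """One verify step of speculative decoding: accept each drafted token
    with probability min(1, p_target / p_draft); stop at the first rejection."""
    accepted = []
    for tok, dp, tp in zip(draft_tokens, draft_probs, target_probs):
        if rng.random() < min(1.0, tp.get(tok, 0.0) / dp[tok]):
            accepted.append(tok)
        else:
            break  # a full implementation resamples the rejected position here
    return accepted

rng = random.Random(42)
# toy: target agrees with the draft on the first two tokens, not the third
draft = [{"a": 0.9}, {"b": 0.9}, {"c": 0.9}]
target = [{"a": 0.9}, {"b": 0.9}, {"c": 0.0, "d": 0.9}]
out = speculative_step(draft, target, ["a", "b", "c"], rng)  # -> ["a", "b"]
```

The win is that all drafted positions get verified in a single batched forward pass of the big model, so every accepted token is nearly free.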
>mtp
>fixed jinja for reliable tool calling
>can run Q8 at 128k context and bf16 cache at 30tok/s now
Gemma-chan is here to terrorize the internet.
>>108759419i need gguf
What are good sources for AI news? I follow a couple schizos who post interesting stuff but they are hit or miss. For example teor has some shit takes like calling Anthropic's research taste inferior and believing ASI will spare its creators but kill everyone else, and can't stop seething about political shit.
>>108759519Been using the unsloth version of it. Does it improve upon it?
>>108759713>>108759715Same. Is this actually big? I want to try this. I'd be running the 26b moe on already very limited VRAM. How much VRAM does the drafting model take? I fear that the amount it requires might offset any potential benefit.
>>108759731You ask for AI news then list some nobodies giving their opinions on news and telling you what to think about it. Which do you actually want?
>>108759746>How much VRAM does the drafting model take?Check the repos.
>>108759713Will this break with split mode tensor on llamacpp? I already have it running at 45 tok/s at q8 and 200k context.
>>108759731We're all hearing our news from https://x.com/elder_plinius
>>108759746They're absolutely tiny. The bf16 for 26b is 839mb.
>>108759746one niggerbyte
>>108759752What sources do you use?
>server, webui: support continue generation on reasoning models
https://github.com/ggml-org/llama.cpp/pull/22727
reasoningchads we WON, prefills are back
>>108759754
>>108759766
Okay 0.4b is nothing. Will these drafter models work with abliterated Gemmas?
>>108759757
There is no fundamental incompatibility between --split-mode tensor and multi-token prediction, but for some of the operations the necessary split state transitions may not be implemented.
>>108759775
Let me get my magic 8 ball. I know I left it somewhere around here...
>>108759775
"""""""yes"""""""
Going to be a lot of rejections in certain topics though.
>>108759778
Hello cudadev. Please tell someone on the llama.cpp team to fix the issue of logprobs being disabled entirely when MCP servers are used, instead of logprobs more sensibly being disabled only for messages with tool calls, or better yet, only for the tool calls themselves. Thanks.
>>108759766
quooont it! Wonder how much worse the acceptance rate would be.
>>108759778
Will gfx1030 performance ever be optimized for tensor parallelism? I go from 13 tk/s to 2 tk/s on 4 v620s on pcie gen 4 x16.
>>108759790
I'll ask Piotr to look into it. Thanks for your feedback.
>>108759798
<3
>>108759791
Should I run the full bf16 drafter if my model is iq2_xxs or should I also quant it to iq2_xxs so they're both equally retarded?
>>108759796
No. Buy an NVIDIA card or leave us alone.
>>108759791
I have no idea if it's even going to be possible to quant it. There's no functional implementation of MTP in llamacpp at present; it's been in the works for a very long time without much to show for it.
Cudadev, please get V4 support implemented.
where dflash cudadev
>>108759821
in ur mom
>>108759731
This general. I'm not even memeing. Tech literate cunnyposters and coomers are at the bleeding edge of the industry because they're not content with the status quo and want their AI waifus.
Couldn't you just put the draft model on the CPU? Does it require high BW with the large model during inference?
I vibecoded Ampere support in ktransformers for DeepSeek Flash.
>PP: 5.81 T/s
>TG: 0.74 T/s
With only 6 3090s. We (me) are so back.
>>108759492
comparing responses on the same swipes and seeing a noticeable difference in descriptions and context recall is what convinced me
I really want to run gemma in q8 now
>>108759833
The whole point of a draft model, especially an MTP one, is to be several orders of magnitude faster than the main model while putting out at least 51% acceptable tokens.
If you can hit a sweet spot of generation speed purely on CPU because your model is tiny and efficient, then yes.
In all likelihood though, no. Unless they've trained these so their acceptance rate is absolutely insane, even a 0.4b model won't be fast enough for spec decoding to be worth it on CPU.
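The tradeoff described above can be put in rough numbers. A toy back-of-the-envelope sketch, assuming each drafted token is accepted independently with a fixed probability (the real acceptance process is sequence-dependent; this is just the standard speculative-decoding estimate, not llama.cpp's actual implementation):

```python
def expected_accepted(a: float, k: int) -> float:
    # expected tokens generated per verification cycle when each of k drafted
    # tokens is accepted with probability a: 1 + a + a^2 + ... + a^k
    if a >= 1.0:
        return k + 1.0  # every drafted token accepted
    return (1.0 - a ** (k + 1)) / (1.0 - a)

def speedup(a: float, k: int, t_draft: float, t_target: float) -> float:
    # one cycle costs k draft forward passes plus 1 target verification pass;
    # speedup is tokens-per-cycle relative to plain decoding at t_target/token
    return expected_accepted(a, k) * t_target / (k * t_draft + t_target)
```

Plugging in a slow drafter (t_draft close to t_target, as on CPU) makes the denominator blow up, which is the intuition for why the drafter has to be drastically faster than the main model to be worth running at all.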
>>108759838
How was it before?
>>108759839
I noticed that moving from q4km gemma to q5km offered a massive intelligence boost at basically zero cost. Worth trying.
>>108759790
Hello, Anon. Please report problems via the proper channels. Thanks.
>>108759796
The mainline llama.cpp TP implementation simply creates smaller slices of the original tensors; from the perspective of an individual ggml backend there is no other difference.
If the TP performance is bad, that means the synchronization overhead is too large vs. the speedup from having to do fewer calculations per GPU.
For NVIDIA GPUs the synchronization is done via NCCL if possible. AMD has an equivalent in RCCL but I don't know how well that performs; it is disabled by default and requires an explicit opt-in by compiling with -DGGML_HIP_RCCL=ON
One NVIDIA engineer has an open PR for a better fallback between NVIDIA GPUs if NCCL is unavailable; that same code could feasibly be re-used for HIP.
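For anyone wanting to try that RCCL opt-in, a build sketch: only -DGGML_HIP_RCCL=ON comes from the post above; the remaining flags are typical llama.cpp HIP build options (assumptions, check your llama.cpp version's docs), and gfx1030 is the arch mentioned earlier in the thread, so adjust for your card:

```shell
# build sketch: -DGGML_HIP_RCCL=ON is the opt-in from the post above;
# GGML_HIP / AMDGPU_TARGETS are typical HIP build flags, adjust as needed
cmake -B build \
    -DGGML_HIP=ON \
    -DGGML_HIP_RCCL=ON \
    -DAMDGPU_TARGETS=gfx1030 \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```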
>>108759775
>Will these drafter models work with abliterated Gemmas?
The Gemma MTP docs say:
>Target Activations: The draft model uses the activations from the last layer of the target model, concatenates them with the token embeddings, and down-projects them to the drafter model's dimension.
So the MTP model will get as input the abliterated activations, where the refusal vector is zero. And the MTP model is only 4 layers, so probably not smart enough to make refusal decisions on its own. My guess is it'll work pretty well even if you don't abliterate the MTP model itself.
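The quoted doc snippet is basically concatenation plus a linear projection. A toy pure-Python sketch of just that input path (the dimensions and the random projection matrix are made up for illustration; the real Gemma drafter is a trained 4-layer model, this only shows the plumbing):

```python
import random

random.seed(0)
D_TARGET, D_EMBED, D_DRAFT = 8, 8, 4  # hypothetical sizes

# hypothetical down-projection matrix: (D_TARGET + D_EMBED) x D_DRAFT
W_down = [[random.uniform(-0.1, 0.1) for _ in range(D_DRAFT)]
          for _ in range(D_TARGET + D_EMBED)]

def drafter_input(target_activations, token_embedding):
    # 1) concatenate last-layer activations with the token embedding
    x = target_activations + token_embedding
    # 2) linearly down-project to the drafter's hidden size
    return [sum(x[i] * W_down[i][j] for i in range(len(x)))
            for j in range(D_DRAFT)]

h = drafter_input([1.0] * D_TARGET, [0.5] * D_EMBED)  # D_DRAFT features
```

Because the down-projection is linear, a direction zeroed in the target activations (as abliteration does) contributes nothing to the drafter input either, which is the intuition behind the guess in the post above that it works without abliterating the drafter.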
>>108759847
I forgot AVX2 support for MXFP4 was also vibecoded. This is the first time I run it.
https://github.com/kvcache-ai/ktransformers/issues/1977#issuecomment-4371390421
These were basically the issues to run it.
>>108759832
I rarely see something interesting here first. And usually it's inference.
>>108759851
>Hello, Anon. Please report problems via the proper channels. Thanks.
Look, I know you're a busy guy and a big brain PhD, but the problem itself is still real and worth relaying at least. I don't think it's necessarily lazy of me not to want to make a github account and create a write-up for an issue that could easily just be told to a maintainer in 30 seconds. Please understand. I don't think you're a slave who has an obligation to relay every bug report in this general. If you have a patreon or a ko-fi I could send you $5 to relay the message. I respect you. Just do it please.
>>108759875
holy fuck cudadev got BODIED, get his ass
>>108759838
0.74 t/s
>>108759875
Cudadude isn't the only person with a github account here. Anyone else could report the issue too. You save time telling him an issue in 30 seconds but expect him to spend the time to create the full write-up. You're being unreasonable.
>>108759875
>>108759882
get to work, cudafag
>>108759851
>-DGGML_HIP_RCCL=ON
Thanks, I'll try that tomorrow.
>>108759839
>>108759848
man I don't want to buy a new GPU. I'm never going to try a higher quant than q4xs+96k ctx+mmproj and be at peace with my 4090.
I would pay like $100 for a spark
>>108759952
How much would you need to buy to save enough that they are $100 each?
>>108759952
I would buy that for a dollar
I'd like a spark, but I am very poor
>>108759965
fine, $110, final offer.
>>108759919
>>108759885
>>108759882
>>108759875
You guys need an ass whooping I see
>>108329166
>I am not taking bug reports via 4chan.
>>105368634
>You're dumb for posting bug reports to 4chan instead of Github.
>>108757591
>>108758233
I'm sorry anons. I thought you were being schizo saying there was a conspiracy against deepseek, but the more time passes without any statement from llama devs, the more I'm beginning to think you're onto something.
Anyone know how the 5hz lm works in acestep 1.5? I was wondering if trying to use a different llm might change outputs interestingly.
>>108759979
>>108759991
is open sauce you're welcum to cumtribute
>>108759991
I do *not* understand why you retards like deepseek so much. It's not very doog.
>>108759851
CUDA dev, we need an official statement. Why do you hate the chinese?
>>108759979
What did >>108759885 do to you?
>>108760016
being tartded in the middle of other tards
>>108759790
ChatGPT says you're wrong about the issue; streaming doesn't emit logprobs
[CODE]
llama-server supports OpenAI-compatible chat completions and function/tool calling, and the server README lists an experimental --webui-mcp-proxy option for the WebUI, disabled by default. That points to MCP being a WebUI/agentic integration surface, not a core completion-generation switch.

In the server request parsing, logprobs is read and mapped into the sampling probability setting when n_probs was not already provided. I do not see that logic gated on MCP, tools, or tool calls.

For non-streaming chat completions, llama.cpp builds a choice whose finish_reason can be "tool_calls" and still conditionally adds choice["logprobs"] = {"content": ...} when probs_output exists. That directly contradicts "tool/MCP disables logprobs entirely" for the core non-streaming chat route.

The likely culprit is streaming: the WebUI normal chat path calls ChatService.sendMessage(..., { ..., stream: true, ... }), and the agentic/MCP flow also calls ChatService.sendMessage with stream: true plus tools. The server's streaming chat response builder emits chunks with delta, finish_reason, etc., but does not include logprobs in that streaming path.

For the OpenAI API shape, logprobs are documented as probability info for content tokens, while tool calls are represented separately via tool_calls; an assistant message's content is not required when tool_calls is present. So "logprobs for the tool calls themselves" is not just a missing toggle; it is a schema/design issue.

There is also a separate /v1/responses gap: llama.cpp currently hardcodes output_text.logprobs to an empty array and emits function_call output items without a logprobs field. That is a real implementation limitation, but it is broader than MCP.
[/CODE]
linked repo page: https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-task.cpp
>>108760009
They killed my dog.
>>108760024
didn't read
it's hallucinating
>>108759874
I know, I'm not always posting
>>108760035
tl;dr disable streaming
>>108760035
The AI argues that logprobs are not disabled by MCP specifically, but are instead missing due to the use of streaming in the WebUI and general implementation gaps in llama.cpp. They conclude that the lack of logprobs for tool calls is a broader schema and design limitation rather than a bug tied solely to MCP.
>>108760008
Higher active params than Kimi, <think>s in character, doesn't spend an autistic amount of time second-guessing itself wasting tokens in technical tasks, is mostly uncensored for creative writing/RP.
>>108760007
You already have a V4 implementation that's been waiting for review/cleanup since day 2.
>But vibeslop
Not an excuse when pwilkin's messes are maintained.
>>108760046
>general implementation gaps in llama.cpp
so 99% of issues anons report then, wow!
>>108760054
>>108759952
The SPARK is already at $100 and it fucking sucks, the only one I ever even think about using is the one I may get for free from the Lost Tower mission.
Just get regular soldiers, they're both cheaper and get better perks as they level up.
>>108760053
>You already have a V4 implementation that's been waiting for review/cleanup since day 2.
Just build it yourself nigga.
>>108759804
Sorry he is too busy looking through the blacked miku collection I sent him.
>>108760053
Running it locally? At what, 10tk/s?
>>108760046
>The AI argues
Worthless.
>>108760093
>>108759838
lol
>>108760084
nta but this is going to be what eventually kills local, isn't it? Newer models releasing with special snowflake architectures that require users to vibecode their own implementations using older publicly supported models, as projects like llama support smaller and smaller numbers of new releases over time.
>>108760008
I want to launch it with 1M tokens context on my single 4090, stuff the entire script of a hentai game I like and tell it to continue. And then be horribly disappointed with the result so I can delete the weights from my SSD.
>>108760093
Let me guess, you need more?
>>108760122
based
>>108760124
The average adult reads at 15 words per second.
Can you imagine being forced to walk slowly behind some granny on the sidewalk? It's infuriating.
>>108760149
>redditor
Bro you need to go back
>>108760169
>Bro
Actually, I identify as non-binary, and I do not appreciate you describing me in a masculine manner.
>>108760186
And I enjoy seeing black dudes fucking pretty girls but that is neither here nor there.
Just discovered I've been running Gemmy slow this whole time...
-- Could NOT find NCCL (missing: NCCL_LIBRARY NCCL_INCLUDE_DIR)
-- Warning: NCCL not found, performance for multiple CUDA GPUs will be suboptimal
She's not gonna be happy about this.
-- Could NOT find NCCL (missing: NCCL_LIBRARY NCCL_INCLUDE_DIR)
-- Warning: NCCL not found, performance for multiple CUDA GPUs will be suboptimal
>>108760206
You better come home with a new gpu
>>108760206
>>108758774
I tried a few years ago, didn't work well
maybe I used a shitty embedding model, or maybe it's just a hard task
openai, with their billions of dollars in compute resources and small army of researchers, couldn't even get their models to stop talking about "goblins"
>>108755179which LLM model me to roleplay with cunny? I tried very hard to get claude to do it, it did generate cunny characters.
>>108760300
continue
Just scoured the interwebs. Why no goofs?
https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
>>108760329
How does it differ from the normal instruct tune?
>>108760339
It's a draft model.
>>108760339
For one it's a 0.4B model
>>108760300
>I tried very hard to get claude to do it, it did generate cunny characters.
That's great anon. You should keep doing that.
>>108760352
How is it different from the normal 4.0B model?
>>108760352
Oh shit you don't have to run the drafter as a full model?
>>108760359
For one, it's a 0.4B model.
>>108760359
>>108760359
>>108760359
>>108760053
>Higher active params
is that supposed to be a selling point? higher params are a downside that you justify with its (hopefully) increased intelligence, not something you desire by default.