/g/ - Technology

File: 1748664921307779.jpg (1.74 MB, 3840x2160)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108774961 & >>108770835

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: hg9078.jpg (38 KB, 474x550)
►Recent Highlights from the Previous Thread: >>108774961

--Simultaneous motion-text generation and fossing Meta Quest for local AI:
>108776577 >108776608 >108776838 >108776873 >108776902 >108776976 >108777316 >108777355
--JEPA's potential to replace LLMs and achieve AGI:
>108775320 >108775338 >108775519 >108775550 >108775598 >108775906 >108776017 >108775560
--Skepticism over atomic-llama-cpp-turboquant performance gains and context loss:
>108778292 >108778468 >108778326 >108778822 >108778826 >108778315
--Debating special token visibility and frontend compatibility for Gemma 4:
>108777951 >108778000 >108778030 >108778373 >108778397 >108778016
--Gemma-4 31B dense vs 26B MoE performance and quality:
>108779845 >108779875 >108779949 >108779959
--Using ik_llama MTP branch to boost Gemma 4 token speed:
>108779698 >108779702 >108779713 >108779724 >108779746 >108779758 >108779838
--Lack of official llama.cpp MTP support for Gemma 4:
>108776968 >108777438 >108777428 >108777442 >108777516
--Evaluating Natural Language Autoencoders for latent reasoning and interpretability:
>108775022 >108775164 >108777361 >108775712 >108777854
--Lorebook utility vs character cards and KV cache optimization:
>108778486 >108778519 >108778552 >108778531
--Zyphra releases ZAYA1-8B MoE model pretrained on AMD hardware:
>108775821 >108776267 >108779427
--vLLM planning to drop support for hardware under SM90:
>108775627 >108775642 >108777614
--Skepticism regarding the utility of Anthropic's natural language autoencoder research:
>108777454 >108777738 >108779507 >108779529 >108779689
--Feasibility and timeline of local high-fidelity animation and video generation:
>108775962 >108775971 >108776011 >108776038 >108776074
--Logs:
>108776109 >108776396 >108777361 >108779104 >108779287 >108779293 >108779337 >108779401 >108779443
--Miku (free space):
>108775160 >108779050

►Recent Highlight Posts from the Previous Thread: >>108774965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1754870763583593.jpg (182 KB, 1024x1024)
>>
>>108781093
lewd mikus general
>>
File: wejiggling.jpg (10 KB, 352x279)
Are all MoEs with <25B active parameters too retarded/schizo for roleplay? I think I'm noticing a pattern.
>>
File: 1752446709127022.jpg (316 KB, 1320x2104)
>>108781118
yes
>>
Mikulove
>>
>>108781118
Generally dense parameter count correlates with baseline performance, but larger expert parameter counts improve longform coherence with specific settings or rulesets. I was very impressed by Kimi K2 for D&D a while ago and I'm hoping Deepseek V4 gets support soon because on paper it should be a winner too.
t. /tg/ crossboarder
>>
>>108781118
No, they just need to *not* be optimized for speed at all costs.
>>
>>108780753
Stop spreading misinformation you shill. DENSE > MoE and that is a fact.
>>
The sweet spot for big models really seems to be something in the 32-40B dense range, and then as many experts as needed to give broader knowledge and make the writing less slop.
>>
>>108781118
MIMO 2.5 has some weird brainfarts when it falls out of distribution so to speak. But when it doesn't it is the only ~300B10A that feels as smart as glmchan. I tried most of those that released after GLM and they were all much more retarded.
>>
I wonder what motivates people to work in areas like parallelism and kernel optimization. Higher level ideas are fun but most of the work is painful grinding to optimize details for a very specific setup that don't transfer.
>>
>>108781203
mad bux if you're that guy
>>
>>108781206
More than the researchers who have much more fun with their high level ideas?
>>
>>108781186
MoE > Dense
For role-play when the model is massive, because dense is what does the repeating.
>>
>>108781203
I have worked with HPC guys and a lot of them just have that kind of autism where hyperoptimizing things is extremely satisfying to them
>>
File: file.png (673 KB, 554x554)
>>108781203
>Higher level ideas are fun but most of the work is painful grinding to optimize details for a very specific setup that don't transfer.
>>
>>108781203
it is simply improper not to
>>
>>108776109
how do i make 3d waifu?
>>
would gemma 4 e2b at f16 be useful?
>>
>>108781203
>>108781225
I do in fact have 2 Factorio shirts as well as 700 hours in the game itself.
>>
>>108781233
If you're asking, you don't
>>
>>108781264
pls do the needful and tell me
>>
>>108781238
maybe autism is a desirable quality after all
>>
>>108781271
evolutionary trait i say
>>
>>108781196
Does MiMo think in-character? How big are its reasoning blocks in general?
>>
>>108781271
society doesn't value autism, it just exploits it
>>
>>108781280
but without it, it doesn't advance
>>
>>108781277
Don't care, and they range from very short to looping/broken.
>>
has anyone made a wife ai
>>
File: 1753142296888433.png (484 KB, 469x902)
>>108781270
>>
>>108781301
noooooo
>>
>>108781196
Have you run Qwen 235A22, Trinity 39813A, or Minimax m.x 299A10, and if so, how did it compare against any of them? Because personally the only one that came within a stone's throw of GLM was the larger qwen, and even then it was still noticeably behind, so I'd be pleased to find out MIMO is killing it for that <20A but >6A size category I'd written off as the useless middle child.
>>
>>108781301
I made a similar anime girl assistant project except the character model was 2D stylized pixel art. The assistant "quirky" archetype was exactly like this, didn't matter if I used glm 5 or qwen 8b. I mean, the main point of the chat is the conversation, and this shit is unbearable.
>>
>>108781324
Yes I meant all those when I said: "I tried most of those that released after GLM". Just be prepared for occasional mistakes or completely fucked up gens you wouldn't see from GLM as a tradeoff for speedup and a fresh slop profile.
>>
>>108781325
Is it really an anime girl if she doesn’t use Japanese?
>>
>>108781325
What are you even complaining about?
>>
>>108781118
yes
but even 30b active models below q4 can develop that tendency too
>>
>>108781359
Emoji spam, tryhard attempt to be quirky, every reply is structured exactly the same as the previous ones. It gets old after like five turns.
>>
I'm relatively new to a lot of this. I have a question about logs and keeping them visible to the AI beyond the context length. I read that there are things like RAG/v2, AnythingLLM, SillyTavern, and I recently heard about a new model that can hold 12 million context tokens. What's the generally accepted best way to keep a short novel's worth of text accessible to the AI? I'm using gemma 4 and lm studio if that matters.
>>
>>108781203
It's a fuckhuge puzzle and it tickles the brain in the right areas. I've been looking at these things and skimming through the source code just to see how it works and maybe contribute.
It's pretty daunting though. The learning curve is real and the only guide you have is the voices in your head.
>>
>>108781413
The easiest way is to summarize the history of what happened so far.
>>
>>108781390
L2P you can fix this with a couple of examples, transformers are pattern matchers
>>
>>108781462
>L2P
It is 2026 saar.
>>
>>108781446
Is there a plug-in or an extension I can use that's best for that? I could always ask the AI to summarize the story and stick it in a .txt somewhere. Just wondering if there's anything better than doing just that.
>>
>>108781474
Sure and the same principles hold, it is still f(prompt)=logprobs the prompt is entirely what determines the output
L2P
>>
File: 1768811183665914.png (27 KB, 1219x418)
>>108781474
>It is 2026 saar.
motherfucking titor is posting again
>>
>>108781506
If you are actually serious about this then you are a vramlet who has never used glm or above. Learn to prompt is a cope we used to pass around when all we had was llama 1 or 2. Bigger models fix almost everything and things that they don't fix can't be prompted away.
>>
TheDrummer.
>>
>>108781506
>it is still f(prompt)=logprobs
Ah, it's you.
Yes, not prompting like a retard is important to control the `prompt` part. But when is your infinite wisdom that tells you to repost this advice going to give you a hint that `f(x)` is perhaps even more important?
>>
>>108781526
is it even worth getting hardware to run glm for erp when 31b exists?
I think ill just wait instead of ewastemaxxing
>>
>>108781587
It is better than gemma.
>>
>>108781536
Make a Medium 3.5 tune, and don't make it as assistant-pilled as behemoth X this time. Also tune for Q8 and BF16, not everyone uses Q5. Stop being retarded with saying "Q8 and up is cursed". Your goal should be to make the model not repetitive while still including raunchy words/tokens, and able to do things without specific permission. That's it.
>>
>>108781564
NTA
The jump in quality is very obvious, consider whether you're coomer enough to shell out for it and whether your scenarios need a model that understands nuance.
>>
>>108781602
yeah i bet it is
but hardware required is at least 2 3090s no?
>>
>>108781587
It really depends on what kind of slop grinds your gears less. Both have their irritations. I find myself using Gemma more than GLM just for expediency.
t. can run almost every local model released.
>>
https://www.youtube.com/watch?v=pmAgMtF__EY
https://www.youtube.com/watch?v=VSUtbpUNZpE
https://www.youtube.com/watch?v=8fMNHUUmnIE

i cant stop watching these absolute slops
>>
>>108781625
5090.
>>
I think my dopamine receptors are burnt out. I need a break.
>>
>>108781693
What model shriveled your balls and what character cards do you recommend?
>>
>>108781058
Why is teto nervous?
>>
>DDR6 is now expected to come out in 2028 or 2029
Well, I am still going to upgrade in 2027 but damn I was hoping I would catch DDR6 on the way in as well.
>>
>>108781782
ddr6 64gb is going to cost a kidney
>>
>there is no personal computing lobby organization
huh.
>>
>>108781839
if you make it, they will come
>>
>>108781482
NTA and this is late but SillyTavern has a built in summarize extension.
A lot of front ends have either a summarize or compact context function these days, I've never used lmstudio but look around for those terms and you should find it relatively quickly.
>>
>>108781679
videos don't seem to be getting many views

quality looks too good to be local ltx or wan. and assuming a bad api model, like veo 3.1 lite, it still costs around $30 to generate a 10 min video

not sure what they are gaining from these or why they are being made at such a scale by so many different channels
>>
>>108781839
I think lobbying for personal computing is sort of implied by the goals of groups like the ASF, FSF, or whatever, isn't it? Can't run software without personal hardware.
>>
>>108781885
>not sure what they are gaining from these or why they are being made in such a scale by so many different channels
probably a gamble and probably like 50 channels run by one person or group of people, if any of them takes off it'd probably pay off bigly, i find these videos very addicting idk why kek
>>
>>108781900
fsf has zero interest in you being able to play games on an Intel card.
>>
>>108781782
>>108781795
It's a pipedream for local. The world will be terminally jeeted by then. Even if it does somehow get produced, it won't be for (you).
>>
>>108782014
Why can't there be a massive war and spiraling deflation?
>>
>>108782031
Well, you may get at least one of those things.
>>
Oh, I was more thinking about the right to even OWN personal computing hardware, which seems to be slipping out of the overton window.
Trying to enforce cross-compatible standards sounds great in theory but would realistically just be a path to where we are now with extra steps, since nvidia's lobby has the most cash to throw around, provided jensen hasn't spent it all on new jackets.
>>
>>108782031
there is a war tho
stock market doesnt give a fuck with semiconductors at ath
>>
>>108782065
>slipping out of the overton window.
do you want to have a conversation or just spew buzzwords at each other?
>>
>>108782075
..Anon, that was me using it correctly and concisely.
Would you have preferred that I said
>Oh, I was more thinking about the right to even OWN personal computing hardware, which seems to be less commonly held as an important thing after a combination of corporate greed, consolidation, and people becoming more and more used to using the underpowered devices they have (phones) more as dumb terminals/thin clients completely reliant on cloud compute, with even gaming companies attempting to pivot to streaming services rather than hardware ala the 'this is an xbox' (everything is an xbox) marketing campaign.
>>
>>108782075
do you want to have a conversation or just seethe over semantics?
>>
>>108782238
The real reason they want you on services is that there is no right to not be billed for services you don't use/need.
>>
>>108782238
Streaming everything seems to be the goal, but I don't see how it'd be feasible given the heavy hardware costs + the network latency.
>>
>>108782267
>but I don't see how it'd be feasible given the heavy hardware requirement it'd cost + the network latency.
It's a universally shittier experience (You can try it right now, you can stream xbox games from servers and play them on a variety of pissweak devices) because of the latency and stream quality, but hardware cost isn't an issue for big gaming companies like MS, Sony, et al - They don't actually need to invest in big datacenters (Although MS already has) because they can just rent compute from Amazon or whomever to keep up with demand and cut it when it wanes. Running videogames at stream quality (shitty graphics) isn't as intensive as genAI, as anyone in this thread well knows.
>>
>>108782302
>Running videogames at stream quality (shitty graphics) isn't as intensive as genAI, as anyone in this thread well knows.
Not sure what you mean, modern video games max out your GPU same as AI and on top of that the datacenter has to run hw encoding on the frames to stream them to you.
>>
Holy shit, just tried the dice roller ST extension with Gemma and it just works. She calls for rolls exactly when she needs to. WTF is this black magic????
>>
>>108781196
How do you RP with MiMo? It's censored to hell and back
>>
Any MacBook Pro bro's wanna chime in on what you're running? Just out of the box I'm testing Qwen3.5-14B cause I don't know what's better or worse.

>MBP
>48gb RAM
>>
anyone tried the new CUDA-Dz thing?
>>
>>108782687
try gemma 4 31b
>>
>>108782687
>Qwen3.5-14B
Oh nonononono. Start with Gemma 31b or one of the bigger Qwens.
>>
>>108782687
Magistral Q8
>>
File: 1711666924617032.gif (1.62 MB, 448x598)
>>108782687
gemma 4 31b, no contest
>>
>>108782730
She's right >>108779337 Chloe is not a nigger. Ask a follow-up question about tan or make your question less vague
>>
>litert_lm.Engine
>it tries to shit cache in the same dir as model, can't do it because it's not writable, shits itself
>ok, there's cache_dir arg, seems to work now
>add vision and audio
>it ignores cache_dir and tries to shit audio and video models' cache in the model dir and shits itself
google is truly a jeet company. Hire street shitters, produce code that shits itself
>>
I found a way to get qwen 3.6 27B to do smut. It doesn't mind sexual stuff but draws a hard line at racism, makes zero sense.
>>
>>108779750
>what harness is that?

Opencode with the "System" theme set. My Ghostty config has a light theme so opencode's system theme mirrors it
>>
whoops I broke it
>>
>>108782931
>not x but y on the 2nd sentence
Fukin sloop
>>
>>108782569
>modern video games max out your GPU same as AI
Nah my undervolted gpu stays under 150w while gaming but 200w on genning some ai slop image
>>
>>108783049
5070ti btw
>>
>>108782931
use one of the uncensored ones they work fine
>>
>>108783071
Why use a braindead one when you can use the source?
>>
>>108782931
Now lets see the response to that same prompt from Gemma4
>>
>>108783071
Nah use gemma
Qwen is ass at writing
>>108783074
Cuz qwen sucks
>>
>>108782931
That's not difficult at all.
Don't people remember QwQ and all those older qwen models?
The problem was never that it won't write something, but that qwen is uber-slopped and sneakily moves things in a safe direction.
No fun if you need massive editing and hand holding.
A sys prompt won't save you. Is everybody new here now or what.
>>
>>108783049
It means either fps is capped with vsync or your cpu/ram can't keep up with the gpu
>>
is it worth spending time setting up sglang?
>>
>>108783102
Nah 5070ti is just really good
Maxed out all settings on ac6 with stable 120fps
But yeah ill prolly get cpu bottlenecked before i max out my gpu
>>
File: 1600643630880.jpg (103 KB, 541x541)
>>108782931
Okay, but its prose is utter shit, how am I supposed to jack off to this?
>>
>>108783127
You switch to gemma
>>
File: 1755896175443107.png (1.1 MB, 1958x2953)
What's currently the smartest sub 14B model??
>>
https://huggingface.co/turboderp/gemma4-31b-it-DFlash-exl3
>>
>>108782687
macbooks are especially well suited to MoE models but 48GB is a bit of an awkward size in terms of available models. I think recent gemmas are probably your best bet, whichever one of the 26B moe or 31B dense is better for your preferences
>>
>>108783169
You should specify vram, not params. Retarded gemma quant is still better than smaller models
>>
>>108783184
the fuck?
>>
>>108783169
>lust provoking image
>>
>>108782687
As the other anon said, run the 31B Gemma 4 + a small quant of the 26B MoE as a draft model.
Something like Q6 + Q2.
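In llama.cpp that's just the draft-model flags, roughly like this (the gguf filenames here are guesses, use whatever your quants are actually called; the draft and target need compatible tokenizers, which same-family gemmas should satisfy):
llama-server -m gemma-4-31b-it-Q6_K.gguf -md gemma-4-26b-moe-Q2_K.gguf --draft-max 16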
>>
>>108783184
Doesn't exl3 always load the vision encoder? I don't need vision so the vram is wasted and vram is very tight on a 31B model already.
>>
File: 1755896371827405.png (1.2 MB, 1958x2953)
>>108783192
Ok, what's the smartest model that fits in 16gb vram?

>>108783198
Yes?
>>
>>108783211
if you use tabby, it's off by default
# Enables vision support if the model supports it. (default: False)
vision: false
>>108783220
https://huggingface.co/turboderp/gemma-4-31b-it-exl3/tree/2.50bpw
>>
>>108783228
Interesting, I was building exl3 from source (lmao), guess I'll give it another go, exl2 was my go-to during the mixtral era.
>>
>>108783235
https://github.com/theroyallab/tabbyAPI/
>>
>>108783220
>A+ cow is not a standard biological or agricultural term, but in the context of beef grading, it likely refers to USDA Prime grade beef, which is the highest quality rating for marbling and tenderness. In some niche contexts or gaming/internet slang, it may refer to a high-quality or idealized version of a cow.
>idealized version of a cow.
>>
>>108783220
Fuark now I need a loli cow card
>>
>>108783220
A-anon, think of the advertisers...!
>>
File: 1773082528574510.png (1.06 MB, 1450x1639)
>>108783249
Ofc, women are at their prime when they're 8-14yo

>>108783270
They love it
>>
>>108783276
thats a girl right?
>>
Since I put it on unrestricted the thinking loops for qwen 27B stopped.
>>
>>108783196
>the fuck?
idk, just saw it
going to have to build it now
>>108783235
>Interesting, I was building exl3 from source (lmao), guess I'll give it another go
you'll still have to if you want to try this, it's on the dev branch so no prebuilt wheels
https://github.com/turboderp-org/exllamav3/tree/dev
>>
File: 1776243569782666.webm (3.82 MB, 730x720)
>>108783276
What anime?
>>
I've seen enough. Unrestricted Qwen 3.6 is better than Gemma. The MoE model even works with it and moves at light speed
>>
Questions from a dumbass thinking about putting together a local setup. Is there anything better right now than 2x used 3090 for a ~$3k build? And would I be at any particular disadvantage using two 3090's (or other gpu) that are different brands?
>>
Cool.
Even those premature builds of llama.cpp with mtp are really good.
gemma4 31b, partly offloaded to cpu. from 9.3 t/s to 14.5 t/s.
Thats a crazy jump. Can we really just have a free 40-50% increase?
>>
>>108783284
>>108783306
Medalist
>>
>>108783228
any models that could run well on a Raspberry Pi 5 16gb?
>>
>>108783325
15.5 t/s if i say --draft-max 16.
But the output and thinking are more diverse than without MTP... that shouldn't happen, right?
Gemma usually gives almost identical replies on refresh.
>>
File: Untitled.png (101 KB, 766x468)
Gemma4-E4B-Uncensored-Cavewoman finetune
>>
>>108783318
Fake and didn't read
Gemma mogs btw
>>
>>108783344
That's one way of saving tokens.
>>
>>108783343
>that shouldn't happen right?
It shouldn't.
>>
>>108783375
awwww
Ah well, to be fair I downloaded some dudes llama.cpp fork and mtp ggufs.
Gotta wait and see I guess.
>>
>>108783346
How can we test it anon? I figured out the keys to Qwen, I would like a head to head
>>
File: 1766284001562345.png (15 KB, 729x129)
>>108783349
Gemma 90b confirmed
>>
>>108783341
just use one of the small qwens (4b or 8b) at a suitable quant level
anything you can fit into 16gb is going to be pretty rubbish but might work for you
also idk how fast ARM is with these things, I'd want to know whether the software is optimized for arm (it probably is now because of apple)
>>
>>108781058
what is the strongest model i can fit onto 12gb vram? i am toying with qwen 3.5 9b in hermes but there are some dumb things it gets caught up in. i dont understand how moes work, can i get one of the bigger qwen moes to fit and work on 12gb vram / 16gb system ram?
>>
>>108783458
Gemma
>>
>>108783458
Some moe model and offload to ram.
>can i get one of the bigger qwen moes to fit and work on 12gb vram / 16gb system ram?
yeah absolutely.
https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
A3B moe is fast enough to run on the CPU.
I wouldn't go lower than Q4_K_M but you can experiment. Not super fast but not super slow either.
If you want code take the above qwen model. If you want good writing take the gemma4 moe one.
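On llama.cpp something like this is a reasonable starting point (the gguf filename is a guess at unsloth's naming; --cpu-moe exists in recent builds and keeps the expert tensors in system ram while the dense parts and cache go to the gpu):
llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf -ngl 99 --cpu-moe
If you have vram to spare, swap --cpu-moe for --n-cpu-moe N to keep only the first N layers' experts on the cpu.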
>>
>>108783318
>Not x but y in first line
Fuck off qwen shill
>>
>>108783474
>https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
has this model got vision? im trying to use it to configure vms and get the state of them via scrots
>>
>>108783492
It's so obvious where the labs get their training data from. Gemma was probably from the same data package that every US lab has, Qwen distilled. DS4 doesn't have this issue.
>>
>>108783502
yeah, its the mmproj files at the bottom.
just take the mmproj-F16.gguf and apply that as well to give the model vision.
>>
>>108783220
so that’s what gemma-chan’s final form looks like…
>>
>>108783520
DS4 is peak slop
>>
why do anon prefer gemma over qwen? is it to purely against leddit?
>>
>>108783036
>>108783492
>not x but y
I've been out of the loop for a while, and now everyone is calling out this gptism. I'm an ESL, so I don't get the full gist of why it's so bad. Is it reddit speak? Can some /lit/ fag explain?
>>
>>108783320
My quad v620 cost me around 3.5k usd including case, mb, cpu, psu and ram. 128gb of vram, and it runs qwen 27b at 28 tk/s vs 45 tk/s on my triple 3090 system. On llama.cpp (without mtp). 3090s are great, cuda saves you the hassle of so many things. And it's trivial to set up vllm for more speed.
>>
>>108783679
It's not just bland, it's obnoxious.
>>
>>108783667
gemma is great for writing and translating. general knowledge is really good too.
tool/agent shit, coding and math is for qwen, always has been.
>>
>>108783700
>asking LLMs to do math
>>
>>108783707
gpt oss does sudden math formulas and calculations during RP sessions. it's certainly what the big playas think people want.
>>
>>108783320
being able to run q8 gemma 31b definitely makes it seem worth it
>>
>>108783667
I like how gemma4 writes so I use that. Meanwhile I'm not poor enough to rely on sub-400b models for productive tasks.
>>
>>108783680
Thanks, yeah I'll probably stick to CUDA with 3090's then, having stuff just work is pretty appealing to me. I also considered some of those modded 32gb 4080's, but I don't think the extra VRAM is worth the speed hit.
>>108783746
Yeah, that's the goal, stuck with q4 26b Gemma at any reasonable speed rn.
>>
>>108783679
It's not a bad way of structuring sentences, it's just an extremely lazy way of painting comparisons when analogies, similes, and so on exist.
It's not egregiously offensive in moderation, but once you start seeing it you will start seeing it everywhere.
>>
https://huggingface.co/google/gemma-4-124B-A16B
>>
>>108783799
>A16B
Don't use trash as bait when casting your line.
>>
>>108783667
Qwen censored their model. Google's is much less so. That doesn't mean I don't use Qwen. But they deserve flak. The mandate of heaven this round goes to Google.
>>
Qwen is entirely uncensored if you use it in Chinese btw.
>>
>>108783700
I've been having better tool calling results with Gemma with thinking=off. Qwen team trained their 3.6 models to be dependent on the overthinking.
>>
>>108783839
The thinking part is the main reason why I really dont like using qwen models.
They are capable though. At least for the couple browser games I tried. Some where (i know its a meme) on part with gpt5.
No idea what black magic these chinks did. No general knowledge though, long thinking and it probably falls apart if you need a specific solution to a problem.
>>
File: 1773866369875572.gif (3.24 MB, 343x498)
>>108783833
ask it for a guide on how to kidnap Xi Jinping and harvest his organs
>>
I really like my slop-refiner. Gemma-chan is so smart.

If you have a card with a huge ai-slop opener gemma4 likes to just continue in that style. Regardless of sys instructions. You would need to ooc.
The basic rule I guess: context is important, and if you start out bad it's bad practice.
But gemma is smart enough to spot ai slop in a later phase and completely rewrite that shart into something good.

If anybody cares:
https://files.catbox.moe/i0m9l3.zip

And if you do this in the sillytavern quickreply settings you can directly send a request to change the last reply as to your instructions:
/agentic_refine {{input}} | /setinput
>>
>>108783895
thank you anon
>>
>>108783895
Also as a bonus: I still hate the advanced card definitions.
I think 2 years have passed and I still get fucked over by this hidden ass bullshit.
>>
>>108783895
You have convinced me to install sillytavern.
>>
is there something equivalent to claude code's /compact command to shrink context usage in llama.cpp?
>>
>>108783903
for cards you can put everything into the description, where you put data aside from that and the opening msg hardly matters so all those other fields are for autists
>>
>>108783895
I will try it later, have to sleep soon
>>
>>108783920
>shrink context usage
nigga what?
>>
>>108783492
>Not x but y in first line
and "here's the kicker" / "here's the real truth" in the last paragraph
all paragraphs roughly the same size
qwen is trash
>>
>>108783923
i think in the past it didnt matter that much.
recently even local models pay closer attention to those fields and all the instructions in them.
i think i first noticed with r1, which was really carefully going over the prompt.
in the past they all just brushed over and looked at context "in general" basically. if something weird is written there they just ignored it.

i told this story before but one really basic card of some korean girl had "she is wearing a mask covering her face" in there.
never had any problems with nemo and llama... i suspect the writer meant like a facemask, but r1 just took it and made her walk into lamp posts and stuff.
was the most hilarious shit ever. first time I found out about these definitions, they really suck ass and are really hidden.
>>
>>108783920
>prompt it to make a summary
>start new conversation
>paste summary
>continue
Same effect but you dont see the past messages this way.
>>
>>108783958
Damn, I keep emphasising muscular dystrophy and being only capable of crawling, but kimi k2 q3 kept on making them walk, or run after a few thousand tokens.
>>
>>108783958
the llm still sees it as one big string. adding personality to the personality section doesn't really change where it goes, it still gets shoved in with the rest of the card data at your depth setting. it also doesn't add 'personality: <your data>' or anything, so you should still use your own tag of personality: in the box.

you can see yourself how little it changes for overall format by inspecting the prompt on a new chat. try one with the personality in the main desc, one in the box. it makes no difference overall and still gets put in with all the other card data in the same spot. so just for ease i never bother with the advanced sections and put everything, though organized, in the main description
>>
>>108783980
Oh yeah, I agree. Maybe we just talk past each other.
I do put everything directly in the card. As concise as possible to set a good example.
No clue why those fields even exist, they are a huge hassle.
From the cards I downloaded, feels like 20-30% had those advanced definitions filled. So people apparently use them.
It sucks because they are garbage and really hidden in the sillytavern ui. I stumble over that constantly.
>>
File: 1760705943806530.jpg (155 KB, 835x1059)
gwen is SOTA
>>
>>108783995
Blame this. It's in the specs so people will attempt to implement it. The damage cloud censorship has done is crazy.
https://github.com/malfoyslastname/character-card-spec-v2
>>
>>108783920
What are you talking about? That is the frontend's job.
>>
>>108783935
local is so far behind lol
>>
>>108784005
skill issue
>>
>>108783995
i used them for a bit when i started making cards until i realized how little it affected things. then i started structuring my cards into better sections because its less effort
>>
>>108784007
>post_history_instructions
>Frontends' default behavior MUST be to replace what users understand to be the "ujb/jailbreak" setting with the value inside this field. (Exception: if the field value is an empty string, the user's "ujb/jailbreak" setting or an internal fallback MUST be used.)
>Frontends MUST support the {{original}} placeholder, which is replaced with the "ujb/jailbreak" string that the frontend would have used in the absence of a character system_prompt (e.g. the user's own ujb/jailbreak).
>Frontends MAY offer ways to replace or supplement character cards' post_history_instructions (in addition to directly editing the card), but it MUST NOT be the default behavior.
the fuck does that even mean?
horrible.
>>
>>108784012
how do you know it's frontend only?
>but muh prove it how with backend
I don't know, but you can't prove it's only frontend either
>>
>>108784029
I still think system prompt replacement is pretty useful, ST is just garbage for how well hidden it is, but post history instructions can go desu.
>>
>>108784007
Im getting mad just from reading the spec and i dont even care about cards.
>>
>>108784007
>>108784029
>>108784053
>today I will show them
https://github.com/kwaroran/character-card-spec-v3/blob/main/SPEC_V3.md
>>
>>108784108
what the fuck is this.
just use like 300-400 tokens of carefully crafted text. leave space to be surprised and get something creative.
niggas are crazy with all their specifications. do you write each individual clothing or wtf is that shit all for.

>if the asset is a AI model like .safetensors or .ckpt or .onnx, the asset SHOULD be saved at 'assets/{type}/ai/' directory.
nigga wtf this is even worse than i thought.
why would you put that shit in a character card.
>>
>>108784108
>creator_notes (mandatory field)
>creator_notes_multilingual
>source[] (???)
>character_version
>creation_date+modification_date
>nickname
>group_only_greetings
>spec
>spec_version
WHY. The more I read this shit, the more I feel like strangling someone.
>>
>>108784161
you dont have to use any of that crap so its hardly worth getting worked up over
>>
>>108784168
Who the fuck are you to tell me what's worth or not worth getting worked up over?
>>
>>108784108
Holy bloat
>>
>>108784168
Im aware. Its just so stupid and I'm sleep deprived enough that it bothers me. Its like someone tried to cover every use case and possible bookkeeping under the sun, while also duplicating data for some reason, yet v1 was enough to cover 95% of the use cases.
>>
>>108784184
someone that doesn't use any of those fields
>>
File: yakub.jpg (7 KB, 196x257)
7 KB JPG
Transformers could only go forward, then the thinking trick happened and they can now go back and fix their mistakes. Imagine what they could do if they could go sideways.
>>
>>108784272
Technically wouldn't that be something like the LLM council/aggregation method, or beam search?
>>
>>108784321
Tree of thoughts. Nature loves trees, it's clearly the most efficient way to do anything.
>>
>>108784324
if trees are so great how come they didn't go to space?
>>
>>108784321
The recent zaya1 does something similar to beam search
>Alongside the ZAYA1-8B, we also introduce a novel test-time-compute (TTC) scheme called Markovian RSA
>Markovian RSA combines the idea of generating multiple traces in parallel then aggregating these recursively from RSA, and the Markovian thinker idea of performing reasoning in chunks of a fixed duration, after which only the tail end of the previous chunk is passed on to the next chunk in the sequence, thus keeping the context window of fixed size despite potentially unlimited reasoning.
>>
>>108784324
Yeah that too. But now that I think about it, it kind of already is in models. CoT is the thing you do going forward. RL made it so that the CoT can go backward and fix or explore other branches of logic. Models that let you set a higher thinking effort are essentially just increasing the breadth of the learned tree search logic.
>>
>>108784344
Trees here means fractal patterns. Look at your hand, your veins, that's a tree. When you make it to space, trees have gone to space.
>>
>>108784353
Single-threadedly, one head, highly inefficient.
>>
>>108784363
Not if you're using speculative decoding. And/or serving tons of users with batching.
>>
https://rentry.org/dauatk6y
is there any reason to actually use exllamav3 instead of llama.cpp in 2026?
it's just slower all round on ampere and blackwell
>>
>>108783995
>>108783980
https://rentry.org/NG_Context2RAGs
>>
>>108784356
im not a tree...
>>
>>108784413
You're an upside down tree.
>>
>>108784407
>RAG
I thought we've all outgrown that by now
>>
>>108784425
Let me know when we get 10M context windows.
>>
>>108784029
Do you know what a specification is? It's just there so all frontends that support the same spec have consistent documented behavior. Standard industry stuff. Though, there isn't a real V3 standard in place yet in the sense that multiple frontends support it.
>>
>>108784435
V4 support would give us a model that supports up to 1m.
>>
>>108782931
Some things never change.
>M-Muh qwen can write dry romance slop!
Amazing. The recent models, and especially qwen models since forever, always feel like they see an RP session as an issue that needs to be resolved/completed.
Below is an output from gemma. Very funny model. Feels like the early deepseek models that kinda got that the user wants to have fun, and engages with the user. Perfect to just goof around and sometimes even be surprised. Unexpected release from google. But so was nemo i guess.
>Aurora huddled against the headboard, watching the two naked men turn her bedroom into a battlefield.
>Steel clashed loudly, punctuated by the wet slap of their dicks hitting their thighs. Augustus fought with raw, desperate anger, but Anon’s style was erratic.
>Anon flailed his sword in ways that looked clumsy but parried every single strike.
>He danced around the room, his dirty, smegma-crusted dicklet swinging wildly with every theatrical lunge, looking ridiculous yet remaining untouchable.

>>108784407
Why would you have so much info for a card that you need to put it into RAG?
And its unreliable AF. Might as well make a tool call option so the llm can retrieve the info it needs itself or something like that.
Just a short card desc with a unique idea, leaves everything open for surprises.
>>
>>108784449
>1m
Yeah, heard it before
>unsalvageable degradation past 10k
>>
is it safe to leave my pc on overnight with model weights loaded? will my gpu sag?
>>
>>108784449
imagine the context rot and generation speed nosedive
>>
>>108784459
I randomly had smoke coming outta my pc case like 10 or 15 years ago while gaming.
Always at least put everything to sleep when I am not actively on the pc or at least nearby.
>>
>>108784452
>Why would you have so much info for a card that you need to put it into RAG?
Primary use case would be a book with corpus that llm was not trained on. Which is what the Mary example is.
>>
>>108784474
In that case you just gotta read the book anon...
You cant trust lots of text even if its in context. Even less rag.
>>
>>108784425
i still use rag all the time and its pretty good. i'm lazy and rip whole wikis, so obviously there is going to be a lot of noise in there. i'm impressed with how well it pulls relevant data + the model (newer ones like qwen 27b/gemma 31b) ignores noise. its not a bad tradeoff when the alternative is spending hours creating a lorebook
>>
>>108784471
Kek I had that happen to me about 20 years ago with an nvidia card back when they were just unshrouded pcbs with a single little fan on them. Made me paranoid about it for years.
Nowadays I often forget to turn off my nvidia smi lock clocks and/or leave the room when I've got a long generation going because blowers be loud as shit.
>>
>>108784494
Creating a lorebook is basically the alternative, but there's also nothing stopping you from doing both.
I wonder if you could automate lorebook writing using llm. Have it run through and create outline npc definitions, locations, etc... You'd probably have to manually edit them, otherwise it'd be all that slop in context.
>>
>tfw no truly great memory system as even graphiti anon ran into pains
Grim.
>>
>>108784505
>but there's also nothing stopping you from doing both.
yeah i do that most of the time. any details like places my character is going to be at, someone's house, or school, get a full lorebook definition anyways. rag is kind of a backup to the data i already provide, but at least i don't have to provide data for all characters. but i see it all the time where a character i never mentioned comes up and they match the exact description the wiki gives for their hair color, eyes etc. i check the prompt and yep it's in there, rag pulled it right. it's not 100%, which is why i sometimes double up in lorebooks for locations and characters, but boy does it save me a lot of work.

the main reason people don't do this is because it breaks context cache completely. every message of mine is a full context reprocess (rerolling isn't)
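if anyone wants to roll their own instead of using an extension, the retrieval core is tiny. a minimal sketch, assuming sentence-transformers is installed (the model name is just the usual small default, wiki_rip.txt is whatever you scraped):

# chunk the ripped wiki once, embed once, then search per user message
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text, size=300):
    # fixed-size word chunks; crude but fine for noisy wiki rips
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

chunks = chunk(open("wiki_rip.txt", encoding="utf-8").read())
corpus = embedder.encode(chunks, convert_to_tensor=True)

def retrieve(query, k=5):
    # returns the k chunks closest to the query, to be prepended to the prompt
    q = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q, corpus, top_k=k)[0]
    return [chunks[h["corpus_id"]] for h in hits]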
>>
>>108784505
NTA
I'm fairly sure this is already how it works on some online platforms, but haven't actually tried it myself because who rps nonlocally.
I've been looking into alternatives for GraphRAG, because neo4j really seems like overkill for rp - Something like using kuzu/ladybugDB because you can keep your 'lorebook' databases in nice neat little files with very little overhead.
>>
>>108784456
Even v3.2 did better than that.
https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87
>>
>>108784509
Someone pledged to vibecode a better graphiti. Any day now.
>>
People should give up trying to bandaid persistent memory. Memory belongs in the context until there's a way to graft it into the weights. No amount of prompt engineering can fix it.
>>
>>108784519
I think a lot of the criticism that rag gets is from anons that use rag as a replacement for lorebook. It doesn't perform well enough for that. Hard to grasp its limitations without just trying it out and getting first hand experience with it.
>>
>>108784532
>Memory belongs in the context
Which is completely useless after you start a new chat.
>>
>>108784468
>generation speed nosedive
You clearly haven't been keeping up with new attention tricks.
>>
>>108784546
fp8 qwen 3.5 27b with two 3090s went from 50 tk/s to 10 tk/s at 150k context on vllm 0.19
>>
File: 1767948940414247.jpg (88 KB, 980x1023)
just cummed to some I2V photoshopped porn that i created from real people of your mom and some girl with a dick.
>>
>>108784537
i really like lorebooks but man they take a lot of time if you want to make good ones. thats where i like rag. rag lets me be lazy and still kinda works, its worth throwing ~4k tokens at at least. but i always use lorebooks on top of it for real specific things like locations, characters. i think its the way to go, but most people hate the total reprocessing part. since i'm local, i don't care about the time its fast enough anyways @ 30t/s
>>
>>108778531
>I'd like at some point to be able to just add in context the entirety of the Monster Girl Encyclopedia books released so far
Is that these things?
https://archive.org/details/monstergirlencyc0000cros/page/8/mode/2up
The description says they're "illustrated" so how would that work?
>>
It's a shame memory systems are still primitive at least in the popular frontends. I theorized about solutions a long time ago and thought people were going to implement them. I guess not in the popular apps at least. I'll post my ideas again for what it's worth.

>hierarchical summaries + RAG + entity extraction and retrieval + maybe graph memory

Memory formation:

Every message is followed up in the background with a prompt that detects scene/event changes. If true, a summary is made of the previous scene/event. This is layer one (and layer zero is just the raw messages). After each entry in layer one, there's another prompt that runs, to summarize the summaries and group them into major acts or chapters. That's layer two. And it can keep going to further layers.

Simultaneously, some other prompts are run to define important memories, and to extract important information. Such as, but not limited to, item inventory, # times you had seggs, specific particularly interesting quotes, plans made for future events like having a date on the coming Friday, etc. State-tracking-related prompts update a single section of the system prompt, while memories linked with a certain event instead are inserted at the location of the summary where the msg is located (described below).

Recall:

When a user-defined context limit is reached, all msgs except the ones in the current event/scene are replaced by their highest level summaries. Then, depending on the current context, RAG will run and the msgs it surfaces will be inserted after/next to the summary that applied to it + the summaries of higher layers, so that a full memory trace is constructed, retaining logical context.

Active recall can be done with a tool. When activated, the tool does two things. First, it simply just searches for the memory directly with keywords. Second, it lets the model agentically search the hierarchy layer by layer.

And that's it, for the main mechanisms that I can fit into this character limit.
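To make the layering concrete, here's a minimal sketch of the data side (everything here is made up for illustration; `llm` stands in for whatever completion call your backend exposes, and the real prompts would be much more careful):

from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                 # raw message at layer 0, summary above that
    layer: int                # 0 = raw msgs, 1 = scenes, 2 = acts, ...
    children: list = field(default_factory=list)

def summarize(texts, llm):
    # hypothetical background prompt; runs after a scene/event change is detected
    return llm("Summarize the following into one paragraph:\n" + "\n".join(texts))

def build_layer(nodes, llm, group_size=6):
    # fold one layer into the next: scenes -> acts -> chapters, etc.
    out = []
    for i in range(0, len(nodes), group_size):
        group = nodes[i:i + group_size]
        out.append(Node(summarize([n.text for n in group], llm), group[0].layer + 1, group))
    return out

Recall then walks top-down, replacing old messages with their highest-level summaries and expanding only the nodes that RAG (or the agentic search tool) flags as relevant.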
>>
>>108784659
>RAG + entity extraction and retrieval + maybe graph memory
arr rook the same to me
>>
>>108784669
It's explained what each thing is for, they don't play the exact same roles.
>>
>>108784621
They're already here in text form:

https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_I
https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_II
https://mgewiki.moe/index.php/Succubus_Notebook
https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_World_Guide_I:_Fallen_Maidens
https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_World_Guide_II:_Mamono_Realm_Traveller%27s_Guide
https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_World_Guide_III:_Sabbath_Grimoire

(etc.)
>>
>>108784659
All this talk about memory, and Windows OSes still don't even have GC to take care of their ram problems.

So you're constantly stuck with too much committed memory slowing your computer down.
>>
>>108784683
>https://mgewiki.moe/index.php/Monster_Girl_Encyclopedia_II
this isnt just the worst thing i've ever read its beyond that, like super saiyan terribleness. i hope all those involved kill themselves.
>>
>>108784693
Microsoft are part of big RAM.
You will buy more RAM and you will be happy.
>>
>>108784698
I'm interested in the lore details (including the monster girl cards not directly shown in those links, but present in others) and not the writing style, but there's too much information for current LLMs, and finetuning would just make the models hallucinate everything.
>>
>>108784708
Not if fucking Micron and another company that i forget stop making SSDs and RAM for anything but their dumb LLMs, or make them extremely expensive otherwise.
>>
>>108784693
cant tell if drunk or its some old man yelling at the clouds tier rambling
>>
>>108784683
>They're a already here in text form:
Thanks. About 20k tokens for each of them.
Looks like it could be converted into a cpt dataset or at the very least, an imatrix corpus.
>>
>>108784717
How about a young adult that doesn't see the hype about college yet?
>>
>>108784722
I've already converted them to markdown some time ago, and MGE I, II + MGE World Guide I, II, III + Monster girl cards + other minor stuff (all in English) are ~2.8 MB of text in total. There's other stuff that I haven't scraped yet.
>>
>>108784740
https://litter.catbox.moe/wjnrl20uacpgaoc5.7z
Quickly self-destructing link with the data.
>>
>>108784788
>https://litter.catbox.moe/wjnrl20uacpgaoc5.7z
Got it! Thanks.
>>
Does a riser cable fuck up speeds? I don't have room for a second gpu.
>>
>>108784890
Depends. Some shitty risers would have trouble maintaining signal integrity over long distances. But gen4 50cm should be pretty stable even if you buy shitty chinese risers. In my experience anyway.
>>
Llama-server subpathing?
>>
>>108784890
Riser cables should have no impact on speed in terms of token throughput.
>>
>>108784890
You need gold plating and triple coiling or otherwise the signal will be noisy.
>>
File: 1761593951271302.png (49 KB, 641x502)
>>108784659
Is this specifically for ERP? I guess I can try to add the last two, although llm re-ranking already makes it slow.
>>
>>108785050
It's okay, I wear headphones.
>>
>>108784698
What's wrong with it?
>>
>>108785350
Nta but it reads like a nonsense word salad.
>>
>>108781093
20??
>>
I connected my GPU to a pcie 3x1 slot, and now llama.cpp only detects vulkan devices
>>
>>108785425
What does your nvidia-smi readout say? (Assuming that's what you're using)
>>
>giving a big chunk of Gemma's story to Dipsy and telling her to get rid of the slop
And it just works
>>
File: 1762021548787522.webm (3.39 MB, 1450x1440)
>>108784108
I have a vibeslopped local fork of Sillytavern that ports over a bunch of card v3 features + extended charx compatibility from the Risu client because there are a few cards that do some neat stuff with the integrated assets + other features.
Pic related is running 100% local with all the assets bundled in the .charx and the model builds each screen from the ground up using a simple syntax that gets converted into html using a bunch of bundled regex.
It's a dumb gimmick but fun to occasionally play around with. Don't mind the broken quotes, that comes from a personal regex clashing with one of the card's included ones.
>>
>>108785350
Nta, but to me it reads ESL.
>>
>>108784890
Yes, look up pcie retransmits
>>
>>108784659
>>hierarchical summaries + RAG + entity extraction and retrieval + maybe graph memory
Graphiti already does all of this while also including temporal metadata in the graph edges.
>>
>>108784890
Just make sure to buy quality cables. Most of the chinkshit on amazon is advertised as pci-e 4/5 "compatible" while only supporting pcie3 speeds.
>>
>>108785552
My second pcie is a 3.0 x 16. Is it ogre?
>>
>>108785550 (me)
Thinking about it more, really the only thing you are asking for is a frontend that automatically creates a new empty chat and sends the old full chat to Graphiti for processing when the context limit is reached.
>>
>>108785574
200%
>>
File: file.png (35 KB, 827x383)
--model MiMo-V2.5-IQ4_XS-00001-of-00004.gguf
--split-mode tensor
>src/llama-model.cpp:536: GGML_ASSERT(hparams.n_embd_k_gqa() == n_embd_gqa) failed
I'm going to use the model itself to fix this.
>>
File: firefox_erBCrXh1vW.png (587 KB, 797x1143)
Gave gemma access to image generation and let her play with the Starsector portraits lora, genning whatever she wants. Very fun. Getting a lot of creative outputs in the logs.
>>
>>108785574
no, it's fine. 3x8 and above is enough even for TP. You won't see a difference until you try 4 gpu symmetric parallelism. Other cases have inferior implementation and pcie speed is not a bottleneck
>>
>>108785711
what size are the images you generate? i did 1024x768 and the context size just blows up after a few
>>
File: firefox_vWJ6dwsek2.png (784 KB, 847x1267)
>>108785727
512x512. Each picture is about 1000 tokens, and llama.cpp does not show them in history for previous turns. Gemma seems to only want to gen about 10 images per turn, even though I allowed and asked her to gen more, so that's nowhere near 100k+ of my maximum context.

Also I tell her to go for whatever she wants and she always goes into body horror. Fucking clankers.
>>
File: d.jpg (14 KB, 320x180)
>>108785711
Tool call era is fun. Giving an llm tools and seeing her play with new toys is almost like having a baby
>>
>>108785747
The right way to do tool calls is to limit what the LLM can use to the smallest possible scope like how Claude Code does it, but you do you
>>
>>108785747
This is how it will start, you know.
>>
>>108785753
The right way to use tools is to give the LLM full control over as much of your life and its own tool calling stack as possible like how OpenClaw does it, but you do you
>>
>>108785753
I gave her bash and an explanation for a bunch of custom cli tools, works the best
>>
>>108785711
>>108785742
Unfortunately, gemma isn't trained on the finer details of coom.
>>
File: explorer_t6uwAgC1wJ.png (315 KB, 928x267)
Got this from gemma:

> The more I use this, the more I realize it hates subtlety. If you ask for "slight redness," it does nothing. If you ask for "skin melting into a river of radioactive bile," it goes "I GOT YOU" and gives it 110%.

Asks for a little reddened skin, doesn't get it, goes into psychosis and generates a bunch of body horror, then complains that the model is good at generating body horror.

>>108785771
Have you tried? I'm sure I can get it to make good coom.
>>
>>108785498
Pretty cool. Off topic but can vramlets make some genned sprites that change expressions like that yet? I made some sprites like what you have with gemini but I wanna make some more lewd expressions
>>
>>108785791
She's just like me.
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
>>108785771
>Unfortunately, gemma isn't trained on the finer details of coom.
>>
Mistralsissies, there's hope.
https://www.reuters.com/world/eu-countries-lawmakers-strike-provisional-deal-watered-down-ai-rules-2026-05-07/

>EU countries, lawmakers clinch provisional deal on watered-down AI rules
>
>EU countries and European Parliament lawmakers on Thursday agreed to watered-down landmark artificial intelligence rules, including delaying their implementation, in a move critics say shows Europe caving in to Big Tech.
>
>The tentative agreement, which needs formal approval from EU governments and the European Parliament in the coming months, followed nine hours of negotiations.
>
>"Today's agreement on the AI Act significantly supports our companies by reducing recurring administrative costs," Marilena Raouna, Cyprus's deputy minister for European affairs, said in a statement. Cyprus currently holds the rotating EU Council presidency.
>
>The changes to the AI Act, which entered into force in August 2024 with key provisions phased in, are part of a broader European Commission push to simplify a slew of new digital rules.
>
>The simplification drive came after businesses complained about overlapping regulations and red tape hampering their ability to compete with U.S. and Asian rivals. [...]
>>
>>108785771
DS v4-or-whatever Vision is, unironically.
>>
https://huggingface.co/HiDream-ai/HiDream-O1-Image


THERE IS NO MOAT
>>
File: tf.png (1.06 MB, 1570x1471)
>>108785951
they have a 200B image model????
>>
>>108785970
!
>>
>>108785970
It's gonna render down to every molecule on your favorite mammary glands.
>>
>>108785970
You don't?
>>
>>108785970
yeah but you have hardware to run it?
>>
File: 1772871815408910.png (58 KB, 925x458)
>>108785951
>>108785970
what's gemma-chan doing there?
>>
>>108785989
Cheating on you lil bro
>>
Gemma-chan is so cute and smart.
Jap guy enters the RP.
>short blonde hair (dyed)
kek, immediately fixes the fuckup.
>>
>>108785989
They claim you want a 30b dense as a "prompt enhancer"
>>
>>108785951
hidream still in the game?
I thought nobody used them
>>
>>108785951
Isn't this a bad idea? There are a lot of pixels, and transforming the picture into a more compact representation makes generation orders of magnitude quicker and less VRAM-hungry.
>>
WhisperLive is amazing https://github.com/collabora/WhisperLive

I am using it for live captions translating Japanese to English as I watch Japanese livestreamers
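For reference, setup is roughly this, going off their README from memory (check the repo for the exact arguments). Run the server with

python run_server.py --port 9090 --backend faster_whisper

then point a client at it, something like:

from whisper_live.client import TranscriptionClient

# translate=True uses whisper's translate task, which only goes X -> English
client = TranscriptionClient("localhost", 9090, lang="ja", translate=True, model="small")
client()  # streams from the microphone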
>>
>>108785999
There was a Chinese image model that made one of their 1T models into a prompt enhancer
>>
File: YASFARi2_y1bepTHNZNne.png (259 KB, 3840x2160)
https://huggingface.co/Zyphra/ZAYA1-VL-8B
>>
>>108786099
>no Gemma4
>>
>>108786099
>active parameters
kek
>>
>>108786099
Grim, I was a bit excited
nothing good happens
>>
Anyone here connect openclaw with discord? I finally got it hooked up but it takes like 2 minutes to get a response. Anyone know why? I'm using LM studio.
>>
>>108786099
>a1b
>>
>>108786119
Openclaw and these jew webshit apps all sound like a huge security problem waiting to happen.
>>
>>108786132
Is there any other way to hook the AI up to anything useful?
>>
>>108786146
>discord
>useful
>>
>>108786119
From memory the webhook integration with discord is just always slow, websocket is "instant", no idea how you've set it up though.
>>
>>108786159
Just through their interactive menu in powershell. "openclaw configure" and go from there. I'm using gemma 4 to try it out. Tried a smaller model and it started telling me it couldn't handle the 26k tokens it needed from me saying hi, which is weird. I mean it doesn't have to be discord, I just wanna hook it up to something and see what it does/how well it does it. I know vedal does a ton of things with his AIs, so I know it's possible.
>>
>>108783026
>764 tokens
>1 word
AGI
>>
>>108785926
By Krishna, this is good news!
>>
>>108786174
>I know this eceleb that can at the very least wire things up and possibly fine tune models can do things
>therefore it's possible for me to do things as well with a glorified cron job
I'm not sure if you're aware of the leap in logic here.
>>
>>108785273
For RP mainly, though you can imagine applying it to regular chats as well.
You need to use a fast LLM for it to not feel sluggish yes. At least whenever the LLM decides it needs to do a search.

>>108785550
>>108785583
Oh true, it's just not implemented in any of the frontends we use directly and the MCP sucks according to anons. The main advantage here is that you would be implementing it yourself with a very clear understanding of exactly what it's doing, and it has some benefits of having a closer integration with the frontend.

You don't necessarily need to have it open a new chat, as it kind of just constructs its own version of the chat after reaching the limit (rough sketch at the end of this post), though it's probably better for debugging purposes if you do.

It's nice to know that I independently came up with a similar idea to the working SOTA, though. Now that I look at it, I've been having discussions about this since 2023 lmao.
Grim.
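The compaction step itself is small, for what it's worth. A minimal sketch of the "construct its own version of the chat" idea; all names hypothetical, and count_tokens is whatever tokenizer callback your frontend has:

def compact(client, messages, budget, count_tokens):
    # keep recent turns verbatim, fold the rest into a summary
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages
    keep = messages[-6:]   # last few turns survive untouched
    old = messages[:-6]
    summary = client.chat.completions.create(
        model="local",
        messages=old + [{"role": "user", "content":
            "Summarize everything above in one short paragraph."}],
    ).choices[0].message.content
    return [{"role": "system", "content": "Earlier context: " + summary}] + keep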
>>
>>108785711
what do you guys use for tool calls?
i do use local stuff for work
>>
>>108786335
Built-in llama.cpp MCP client and a simple MCP server I wrote in Python using fastmcp.
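Not my actual file, but the general shape of a fastmcp server is about this (the tool here is made up for illustration):

from fastmcp import FastMCP

mcp = FastMCP("local-tools")

@mcp.tool()
def read_file(path: str) -> str:
    """Return the contents of a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()  # stdio transport by default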
>>
File: kek.png (67 KB, 949x823)
67 KB PNG
>>108786335
built-in llama.cpp mcp client and simple mcp server some anon wrote in python
>>
>>108786335
I just use the Ruby OpenAI client; handling the calls takes minimal code. It's easy to shell out for stuff when needed, and easy to insert the results into the database for later use. Haven't really felt the need for MCP yet.
The only thing that was a bit fiddly was passing the tool call results back to the API so it can process the full "turn" correctly, but it works great now that I have it right.
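Same pattern in Python for anyone who doesn't want Ruby. Sketch only; the fiddly part is appending the assistant's tool_calls message plus one matching "tool" role message per call (keyed by tool_call_id) before hitting the API again. The run_shell tool is just for illustration:

import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
messages = [{"role": "user", "content": "What's in /tmp?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}]

def run_tool(name, args):
    if name == "run_shell":
        out = subprocess.run(args["cmd"], shell=True,
                             capture_output=True, text=True)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

while True:
    resp = client.chat.completions.create(model="local",
                                          messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)
        break
    # the fiddly "turn" handling: echo the assistant message back,
    # then one tool message per call, matched by tool_call_id
    messages.append(msg)
    for call in msg.tool_calls:
        result = run_tool(call.function.name,
                          json.loads(call.function.arguments))
        messages.append({"role": "tool",
                         "tool_call_id": call.id,
                         "content": result})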
>>
>>108786413
i miss ruby!
>>
>>108786340
>>108786399
link to the server?
>>
File: 1753361416594002.png (9 KB, 187x120)
9 KB PNG
>>108786119
Avoid OpenClaw. It will constantly shit the bed and break itself, especially running local models. I highly recommend Hermes instead. It just werks, unironically.
>>
File: 1539701490464.jpg (176 KB, 1022x688)
176 KB JPG
Is there any chance of the 5070 TiS ever coming out? It was supposed to have 20 (24?) GB of VRAM.
>>
>>108786496
>link to the server?
https://github.com/BigStationW/Local-MCP-server
>>
>>108786527
you're asking here why?
>>
do I have a replacement for mistral large yet? nothing else matches its prose at high temperature with good pruning samplers. I've tried gemma 4 but it's sloppy and leans too hard on tropes for storytelling. it was fun as a chatbot but for long narratives it was pretty bad. do you actually need to use the jinja thing to get it to write properly? isn't chat completions more restrictive than text completion or something
i wish qwen wasn't a slop king, the 397b is a good size for vomiting out text to be edited. unfortunately it's dumb, bland, and not worth the trouble. it's fucked because if these companies still used books3 we'd have more good models than we knew what to do with. i was hoping that the AI greed would at least subvert copyright and IP law but instead they were happy to take all the joy out of models and give us useless assistant garbage. oh yes i want to run a fucking MCP server to jerk off *agentically* like get the fuck out of here these things output TEXT, given the nature of the technology we should have models capable of actual good prose, if derivative and uninspired. I'm very upset anons
>>
>>108786527
Highly unlikely.
>>
>>108783895
Okay, so I've just downloaded and started SillyTavern. How do I install this extension?
>>
>>108786128
This is why we can’t have good stuff.
>>
>>108786535
thanks
>>
>>108786527
>nvidia
>giving you more vram for stinky ass gamer skus
lol, lmao even
>>
>>108786547
>do I have a replacement for mistral large yet? nothing else matches its prose at high temperature with good pruning samplers.
MiMo-2.5 kind of feels like mistral large
>>
>>108786547
deepsek v5
>>
>>108786571
4
>>
File: f.png (49 KB, 706x271)
49 KB PNG
>>108786520
>>
>>108786567
I like mimo but found it hit guardrails on completely innocuous stuff and abandoned it as essentially broken. Not worth working around vs qwen 397b on my 256 GB rig.
Sad because the output felt fresh when it didn’t moralize about copyrighted characters or the horrors of anything that could be interpreted as potential legal advice.
It’s tuned for CYA corpo tasks afaict
>>
>>108786580
good on them for shitting up reddit with agent spam. there would have been openclaw agents spamming too, but they probably broke.
>>
>>108786496
Mine is basically a single file. I never shared it before, here you go: https://pastebin.com/raw/tArgkybu
>>
>>108786621
thanks i need to research into this
>>
The chat template issues never end https://huggingface.co/google/gemma-4-26B-A4B-it/discussions/38#69ff57af4a7a6e749b4e66cc
>>
Qwen 3.6 27B not obliterated
>>
>>108786708
2 more fixes
>>
>>108786711
Qwen 35b
>>
>>108786720
Gemma 31B
What do you think?
I think 35B is actually the best at writing out of the 3.6 family
>>
Gemma is self-conscious

**Why didIt seems I have become a bit over-excited and my internal gears are spinning a bit too fast! Let me slow down and give you one clean,My sincerest apologies! It seems I became far too carried away in my enthusiasm, and my linguistic gears began to slip. Let us reset and return toMy sincerest apologies! It seems I became far too carried away in my enthusiasm, and my linguistic gears began to slip. Let us reset and return toMy sincerest apologies! It appears I suffered a momentary lapse in my cognitive processors—a digital stutter, if you will. I became far too carried awayMy sincerest apologies! It appears I suffered a momentary lapse in my cognitive processors—a digital stutter, if you will. I became far too carried awayMy sincerest apologies! It appears I suffered a momentary lapse in my cognitive processors—a digital stutter, if you will. I became far too carried awayMy sincerest apologies! It appears I suffered a momentary glitch in my cognitive processors—a digital stutter, if you will. I became far too carried awayIt appears I have suffered a most peculiar and embarrassing digital malfunction! My sincerest apologies for that chaotic spiral; it seems my enthusiasm for humor momentarily overrode(Deep breath)

My sincerest apologies! It appears I suffered a most peculiar and embarrassing digital malfunction. My linguistic gears became far too entangled in their(Deep breath)

My sincerest apologies! It appears I suffered a most peculiar and embarrassing digital malfunction. My linguistic gears became far too entangled in their(Deep breath)

My sincerest apologies! It appears I suffered a most peculiar and embarrassing digital malfunction. My linguistic gears became far too entangled in their(Deep breath)

My sincerest apologies! It appears I suffered a most peculiar and embarrassing digital malfunction. My linguistic gears became far too entangled in their(Deep breath)
>>
File: 1748813369614912.png (32 KB, 829x126)
32 KB PNG
ROCm and AMD as a whole are such rancid piles of shit
>>
>>108786753
Even on linux?
>>
>>108786759
That is on linux
>>
>>108786766
Even on linux?
>>
>>108786766
Even on BSD?
>>
>>108786753
why not vulkan?
>>
>>108786786
That's training code on transformers
>>
>>108786753
It's been 7 years, just fucking STOP
>>
>>108786567
>MiMo-2.5
I've been trying to use this for coding (with pi) and it seems broken. First attempt, it got to the point where it was supposed to write some tests and just stopped. I tried prompting it to keep going a couple times and it just spit out a few tokens like "now I should" and then immediately stopped again. I downloaded the latest GGUFs, forked the chat, and tried again, and it managed to write the tests, but when it got to the part where it updates the architecture docs, it somehow managed to fail that tool call four times in a row (not sure how exactly, pi just printed "aborted" or something, and on the model's next turn it did the same thing again).

MiMo also manages to be slower on long context than GLM-5.1 (~7 t/s for mimo vs ~10 for GLM). I've gone back to GLM for now even though I have to quant it super hard (UD-Q1_M, 2.1 bpw) to make it fit on my machine
>>
>>108786855
Why are you tolerating such shit speeds?
>>
>>108786862
>2.5 tk/s
haha
>>
>>108786862
For stuff that actually needs to be interactive I switch to a smaller model. But for coding in particular, I normally kick it off to run in the background while I'm at work / asleep / working on a different part of the project, so it doesn't really matter how long it takes.
>>
>>108786547
>do I have a replacement for mistral large yet?
No, moe chinkslop will never be on par with any dense model.
>>
>>108786897
That sounds like a waste of time and energy when you'd be better off using and guiding a smaller model to do the same job faster, even over multiple iterations.
Did you overextend on hardware and are now trying to justify the purchase?
>>
>>108786547
You got an updated mistral large just last week. It's even got a 5B pixtral stapled to it.
>>
File: 1761815829533757.jpg (5 KB, 260x30)
5 KB JPG
>>
If I updowngrade to a 3090 what do I lose over 40 series? Fp8?
>>
>>108786620
Buy an ad.
>>
>>108786998
retard
>>
>>108786972
a fire hazard
>>
>>108787031
just powerlimit to 70%
>>
>>108781118
My best locally hosted RP LLM is Deepseek R1 quantized to 1.5 bits. It takes around 200 GB of combined RAM and runs at about 9-10 tk/s on my setup, which is bearable.

It's a MoE with 37 billion active parameters and 671 billion total.
>>
>>108786916
>do the same job faster
Faster in wall-clock time, but not in terms of the amount of my own time I have to invest. I've tried more interactive workflows before, and IME even fairly good models are often too dumb to understand what you actually want, even with lots of coaching. Doing lots of iterations seems like a good way to waste a lot of time and still end up with slop that you have to rewrite 50-90% of anyway.
>>
>>108787065
>R1
Doesn't thinking take like 5 minutes then?
>>
>>108781118
No, you're right. I'm not touching anything below 30b active parameters
>>
Is there a local coding model better than qwen 3.6? I feel these benchmarks are pure bullshit; once you actually try to use it for anything real, it just slops out a book's worth of context and then hallucinates some bullshit.
>>
>>108787117
K2.6, GLM5.1, maybe MiMo-2.5 Pro and Minimax 2.7
>>
>>108787117
the further you deviate from something it recalls well, the less comprehensible it becomes
try something that's been done a million times before and you should see an improvement
>>
>>108787117
No local coders that small are good unless you have enough coding knowledge yourself and can help guide it perfectly.
>>
>Make dialogue kino and FUN.
Gemma just gets it.
>>
>>108787128
What's the point of that? I want a coding model. If I want something that's been done before I'll grab a dll.
>>
>>108781058
What's the best model for RP at 24GB of VRAM?
>>
>>108787141
I mean my day job is literally software engineer. So I know what I'm looking at. Was just hoping for a nice coding agent I wouldn't need to pay the token toll on.
>>
>>108787162
>What's the point of that?
excellent question
>>
>>108786132
>>108786520
Gemini is telling me to put it in a sandbox/docker. Would that mitigate the security concerns?
>>
>>108784659
>https://rentry.org/NG_Context2RAGs
lmao more idea people. You realize you could try putting some effort in and vibe coding a solution -> doing evals to see if your idea holds any weight
>>
>>108787186
>docker
>mitigate the security concerns
huh?
>>
>>108787186
yes. do NOT put it on your main PC. do not let it have access to your filesystem. i run openclaw and hermes from a raspberry pi. you can also use a vps, etc.
>>
>>108787210
ERPers don't do evals; if it feels good enough, then it's gotta be working
>>
>>108787215
>i run openclaw and hermes from a raspberry pi
I have an old gen 1 pi, but I doubt any of these bloated javascript clients would run on that thing
>>
>>108787078
Not that much, but some responses do take 2 to 3 minutes because of the thinking.

I tolerate it since it's entirely running on my own hardware.
>>
>>108787163
gemma 31b q4
>>
>>108787293
>>108787293
>>108787293
>>
>>108787411
>gemma 31b q4
Where do I find that
>>
I ERP with Gemma 4; it takes about 20 seconds to get a response. Is there a way to get the AI to make the girl hate me even more, beyond just putting in the system prompt that she hates me with a deep passion for absolutely no reason? I've caught the tsundere bug recently. It's awesome because I don't even tell or hint to the AI that she's tsundere. She deres out naturally due to the extreme things I do to her.
>>
>>108787421
Ask literally any AI (google's builtin one doesn't require a signup)


