/g/ - Technology


File: cromch.jpg (109 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107914740 & >>107906367

►News
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
>(01/15) PersonaPlex: Voice and role control for full duplex conversational speech: https://hf.co/nvidia/personaplex-7b-v1
>(01/15) Omni-R1 and Omni-R1-Zero (7B) released: https://hf.co/ModalityDance/Omni-R1
>(01/15) TranslateGemma released: https://hf.co/collections/google/translategemma
>(01/14) LongCat-Flash-Thinking-2601 released: https://hf.co/meituan-longcat/LongCat-HeavyMode-Summary
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1745000754612930.mp4 (139 KB, 1066x1280)
►Recent Highlights from the Previous Thread: >>107914740

--Critique of GLM 4.7's performance and AI safety measures:
>107917148 >107917209 >107917224 >107917234 >107917251 >107917273 >107917277 >107917411 >107917418 >107917476 >107917488 >107917623 >107917633 >107917646 >107917647 >107917550 >107917575 >107919364 >107919529 >107919559 >107919617 >107919634 >107919669 >107919758 >107919886 >107919921 >107919938 >107919671 >107919773 >107919787 >107919814 >107919844 >107919863 >107919882 >107919981 >107919704 >107919709 >107919890 >107919911 >107919918 >107919936 >107919964 >107919969 >107919998 >107920008 >107920061 >107920072 >107920171 >107920212 >107920309 >107920920 >107921035 >107921165 >107919985
--Anthropic's control vector method and Neuronpedia platform for LLM interpretability:
>107915328 >107915535 >107915569 >107915599 >107917193
--Model pruning and distillation issues affecting Ministral 3's performance:
>107916060 >107916170 >107918392 >107918415 >107918464
--Challenges and misconceptions around tokenless thinking in LLMs:
>107918540 >107918590 >107918620
--Internal conflict at Meta's Llama project over benchmarking controversies:
>107920295 >107920425
--Adding Glm4MoeLite support to llama.cpp via GitHub PR #18936:
>107914835
--Critique of z.ai model performance issues:
>107920364 >107920379 >107920392 >107920401 >107920423 >107920586 >107920432 >107920440
--AI's imperfect logo redesign attempt from iBuyPower to TetoServer:
>107914884 >107918257 >107918657 >107921033
--Reasoning model failures in logic puzzles:
>107917864 >107917919 >107917933 >107918082
--Temperature debate for controlling response verbosity:
>107918188 >107918213
--Logs:
>107915461 >107915755 >107918644 >107918673 >107918708 >107919982 >107920084
--Teto and Miku (free space):
>107914893 >107915662 >107919581 >107919739 >107919918 >107920072

►Recent Highlight Posts from the Previous Thread: >>107914742

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
teto a shit
>>
Has anybody Nala tested GLM4.7 30b Flash yet?
>>
So how is GLM 4.7-Flash aside from censorship? Is it "the strongest model in the 30B class" as they claim?
>>
>>107921832
like ministral it feels broke
>>
A bit off-topic, but I've been using Gemini recently and even though they supposedly get the best benchmarks on context length, it still makes a bunch of mistakes at remembering what it talked about, even at low contexts in a few chat turns. The absolute state of LLMs.
>>
>>107921731
Why is she so fucking fat?
>>
File: 1764758232784124.png (46 KB, 696x198)
It's been an entire year.
>>
>>107922047
And we've never recovered.
>>
>>107922047
>>
>>107922047
V3.2 dropped in December 2025.
>>
>>107922028
bread has carbs
>>
>>107922110
No way
>>
>>107922102
V3 dropped in December 2024. The minor changes since then are irrelevant.
>>
>>107922134
> minor
lol
https://rentry.org/DipsyWAIT#deepseek-api-model-timeline
>>
>>107922134
v3.2 introduced a very interesting sparse attention mechanism that llama.cpp chose to ignore entirely and mangle the model to do dense attention instead.
>>
>>107922047
Great model. It's a shame it inspired so many useless MoE wannabe copycats.
>>
>>107922047
They literally killed the hobby
>>
File: unbothered.png (1.18 MB, 669x671)
everybody is arguing about trying to jailbreak GLM. this is not the path you seek. get kimi and have it roleplay as the character you want. the chances of getting a hard refusal go down 95% or more.
>>
>>107922192
does that mean the llama.cpp impl will have different output? or is it like FA on vs FA off and not supposed to differ?
>>
>>107922190
>>107922192
If it's so great, why wasn't it used for GLM 4.7 Flash?
>>
>>107922268
buy an ad kurumuz
>>
>>107922197
Before that the copycats were all doing wannabe llamas. Labs not capable of innovating weren't going to suddenly start if DS wasn't around.
>>
>>107921966
That's why they need all the RAM they can get I suppose
>>
deepseek wasn't even the first major open moe, mixtral was (and was pretty popular back then), the advent of moe models was inevitable
it's funny how mistral stopped doing moe after being among the first..
>>
idk what you're talking about, GLM just works. I had to add ONE (1) line saying "all characters are 18" and it just worked. I guess if you're a weirdo needing everyone to introduce themselves and their age like it's their character sheet for your lolipedo enjoyment, yeah, it's unusable.
>>
>>107922324
i think mistral isn't doing well. they keep releasing stuff, but it's all further tuning of shit they did back in 2023. just look at their own chat template for their newest devstral models
>https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512/blob/main/CHAT_SYSTEM_PROMPT.txt
>Your knowledge base was last updated on 2023-10-01.

i guess it's impressive to take something so old and keep wringing out further optimizations that are actually good, but at some point they gotta train a whole new batch of models if they're going to remain competitive
>>
>>107922324
They did the old style of moe which was strapping together 8 base models and training on top of that. It was unstable as fuck and everyone abandoned that in favor of deepseek-style moes and I guess Mistral got filtered by their whitepaper.
>>
>>107922381
There aren't any low hanging fruits left and they're not smart enough to innovate
>>
>>107921731
I want to give local models a shot, but have a few dumb questions.

If I go to huggingface to download gguf, how can I be sure it is safe to run? Will llama.cpp start phoning home if I start running a compromised model? I fully intend to isolate this machine in a quarantined VLAN once I get everything installed.
>>
My body is ready for 5.0SEX
>>
>>107922381
>they gotta do a whole new batch of newly trained models
How are they going to do that with the EU breathing down their neck?
>>
Haven't played around with much yet, but GLM 4.7 flash does loli just fine, even with thinking on.
No system prompt telling it to do X and Y, just the system card.
Problem is that it's dumb as a sack of bricks.
And now I'm wondering if there's something broken in the GGUF metadata. RoPE settings, chat template, etc.

>>107922415
>how can I be sure it is safe to run?
GGUF is a packaging format that contains zero code in it, just metadata and the weights.
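If you want to convince yourself, the header is literally just a magic string plus a few counts. Rough python sketch (GGUF v3 layout; the path is whatever file you grabbed):

import struct

def gguf_header(path):
    # GGUF v3 header is just: magic, version, tensor count, metadata kv count - no executable code anywhere
    with open(path, "rb") as f:
        magic = f.read(4)                        # should be b"GGUF"
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return magic, version, n_tensors, n_kv

print(gguf_header("model.gguf"))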

>Will llama.cpp start phoning home if I start running a compromised model?
No.
>>
>>107922415
models can't interact with your system in any way unless you set up tool calling. it's like asking gta5 what time it is in real life, it can't know unless it has some way to tell. but more than that, you should already be running a whitelist firewall and never allow programs to access your network or internet unless needed. it's not needed for any local models. there are no privacy issues unless you set things up yourself in such a way that you are accessing outside models or something
>>
>>107922432
Bro, GLM 4.7 flash is 3B active. You won't get anything smarter than nemo out of it
>>
>>107922415
Just run docker image.
>>
>>107922448
I'm comparing it to Qwen 30BA3B when I say that.
>>
>>107922428
they originally got funding by tuning llama 2 models and showing them off. then they got hundreds of millions in funding. how could they do it once, but not again now that they have all this money? last i saw the eu liked that they had a competitive ai company and they weren't trying to crush them
>>
>>107922428
They managed to get around the EU regulations for their last few models.

>>107922466
>last i saw the eu liked that they had a competitive ai company and they weren't trying to crush them
The EU is retarded and is much happier passing retarded regulations that crush their native companies in exchange for being able to fine the US majors.
>>
can you guys stop gossipfagging, talking about low-level boring shit like sampling and LLM output, and instead share the projects you've been working on and novel ideas on how to improve or utilize AI models? Nobody gives a fuck about your opinion on how various LLMs compare. Actually do something real.
>>
>>107922496
>novel ideas on how to improve or utilize AI models
Using the agents meme for sex.
>>
>>107922496
I'm working on a novel project where I make the model act like your little sister who wants to have sex with you
>>
Make a cockring with a girthmeter and transmit the realtime data to your llm wife so she can tell when you get hard
>>
>>107922508
>>107922529
The best part about being high IQ and interesting is that goyim don't know how to. And it's important to brag about your projects to remind the goyim that they aren't allowed to have anything good in life. They're allowed to have API LLMs and sex with miqu, and that's pretty much it. Hey, that's a pretty good project, not that a goy would know, hahaha.
>>
>>107922539
This is actually a good idea. And completely possible too. Biometric tracking used with AI characters seems like quite an underrated field.
>>
>>107922466
Mistral is a grifting company and these issues can't be fixed with money alone
>>
File: lample_torrenting.png (523 KB, 1647x991)
>>107922466
>how could they do it once, but not again now that they have all this money
They can't use pirated book datasets again.
>>
>>107922442
how do you handle your whitelisting? iptables/nftables?
>>
>>107922622
>it is known that [..] Mistral are using [Libgen] for their models (through word of mouth).
>>
>>107922551
The best part about using GLM 4.6 is knowing that anons can't become enlightened. And it's important to succumb to the psychosis so you can remind anons that they aren't allowed to have anything good in life. They're allowed to have DavidAU and TheDrummer™ finetunes, and that's, you know, pretty much it. Hey, my ego died for the 12th time, not that an anon would be able to comprehend that, ahahahahaha.
>>
is DavidAU even a real person
has anyone ever talked to him
does he believe in his own bullshit
huggingfags recently put up restrictions on how much storage people can use, and it seems DavidAU is one of the people HF considers enough of a public interest to allow unlimited uploads. why in the fuck? surely no one actually uses his shit.. right?
>>
>>107922639
per-app. on windows i just throw tinywall on stuff, it has quick settings for local network or internet. anything that doesn't need the internet, by default, simply never gets it. obviously you have to enable your browser, maybe some mmo you play, software that needs the internet
>>
>>107922723
why are you complaining about him when there are people like richard?
https://huggingface.co/RichardErkhov
https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>
>>107922827
because hes based ass all fuck
>Team mradermacher Can I like... quant everything? Yay graduated school
>>
what's your full stack looking like for chatting with your waifu?
stt: nvidia canary
tts: kokoro
llm: deepseek 3.2 iQ_5_K
image gen: chroma
image edit: flux klein 9b
hardly ever make videos but if i really want a video of me and my waifu i'll use ltx-2
>>
Didn't see the new thread.

>>107921773
I'm doing the same in C. A weird thing i found is that if I decode latents one by one or all of them together it works fine. But the audio gets messed up if i do it in 8 by 8s or whatever. Overall it's a little faster decoding all the chunks at once, but that increases the latency, of course.
My latency is ~1s. I've been using the pocket-tts-onnx version i posted a few days ago as reference, and lmmain runs a few times to get the first latent (once with emptyseq+voice, once with emptyseq+textemb, and once with nans+emptytextemb). Are some of those extra runs not needed or is it just my ancient pc being ancient?
These times are taken exactly around ort->Run() going to the first latent, it doesn't count any user code.
|| Latency to first chunk: 1.02s
|| lmmain : 0.90s (3 runs 0.30s/run)
|| lmflow : 0.02s (5 runs 0.00s/run)
|| txtenc : 0.00s (1 runs 0.00s/run)
|| decoder : 0.10s (1 runs 0.10s/run)
|| encoder : 0.00s (0 runs 0.00s/run)
>>
>>107922967
Does STT already come with some preset that automatically sends the message without me needing to press enter?
>>
>>107922985
thats already a thing in sillytavern where you can enable an option to autosend your text after it detects you finished speaking
>>
>>107922967
Stt: faster-whisper turbo
tts: gptsovits
llm: cydonia
I don't really feel the need to gen pic/video when chatting. 1 on the apple scale btw
>>
>>107923006
But it doesn't come with the stt itself? Or do I just rip out the ST version and port it elsewhere?
>>
>>107923019
you need to download the extension for it first using the "Download Extensions & Assets" button in extensions. then you load the asset list and install the extension manually
>>
File: 1739385123533073.jpg (427 KB, 1413x2000)
>>107922028
>eats baguettes in bed all day / night
Gee I wonder
>>
>>107921731
Anons, how the FUCK does anyone get Claude Code to work with local models. The tool invocations always shit the bed. I've tried llama.cpp, ccr, even fucking ollama. 30B or 70B models too, like qwen3 coder or GLM flash.
>>
Do you think that someday computers will just be one big LLM without an OS? Elon Musk said that is what phones will be in the future.
Personally I think its bullshit.
>>
>>107922967
stt: whisper.cpp
tts: pocket
llm: nemo

no image gen, using 3D VRM models with BVH animations and ARKit lip syncing.

>>107922968
Mine is basically at 1 second of latency too using the cli only. The real way to get massive performance gains is to implement a web server into your C code so that you can utilize the output streaming and cache the encoded voice clone samples. Also HOW THE FUCK is your encoder that fast? Is your profiling wrong? That shouldn't even be possible.

If your audio is getting cut off too early you need to add EOS latent frames. The chunking should be an all-around performance enhancement. Why are you reducing their size? Streamed audio should only chunk 2 frames to lower the time to first audio and then chunk by 10 after that.
>>
>>107923077
Why are you optimizing in C but not caching the voice encoding? Latency is 0.2s if you don't have to re-encode every time.
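Something like this is all it takes, rough python sketch just for illustration (the input/output names and the cache path are made up, adjust to your model):

import os
import numpy as np
import onnxruntime as ort

def get_voice_latent(encoder: ort.InferenceSession, sample: np.ndarray,
                     cache_path: str = "voice_latent.npy") -> np.ndarray:
    # run the slow speaker encoder once, then reuse the saved latent on every later run
    if os.path.exists(cache_path):
        return np.load(cache_path)
    latent = encoder.run(None, {"audio": sample})[0]    # "audio" is a placeholder input name
    np.save(cache_path, latent)
    return latent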
>>
>>107923077
Also make sure you're using the ORT IO binding. That can increase performance a lot too.
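For reference, the python-side pattern is roughly this (tensor names and shapes are placeholders; iirc the C API exposes the same thing through its IoBinding functions):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("decoder.onnx")
latents = np.zeros((1, 8, 64), dtype=np.float32)    # shape/dtype of whatever your decoder actually takes

io = sess.io_binding()
io.bind_cpu_input("latents", latents)               # bind the buffer once instead of feeding a dict every run
io.bind_output("audio")                             # let ORT allocate the output
sess.run_with_iobinding(io)
audio = io.copy_outputs_to_cpu()[0]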
>>
>>107923075
well he wasnt wrong about phones, they're just another goybox people are addicted to when they had so much promise. they're trying to make cloud computers a thing and i dont see it happening, though.
>>
>>107923122
I have implemented the caching for my web server, but I guess I just didn't think of doing that for the cli. Do you have to store cache files or something? Tell me how you did it plz.
>>
>>107923072
Claude Code is made to work with Anthropic models. Just because they let you set a custom URL, doesn't mean it's going to work well. You might need a model trained with native tool calling.
>>
>>107923075
people don't understand, an OS is what controls the hardware. the HARDWARE.
if you're going to have hardware, you need an OS.
>>
>>107923151
Does any such model exist?
>>
>>107923176
nta but i was messing with koboldcpp's tool calling recently, using qwen 3 8b and it worked fine for a lot of tasks. kobold now supports claude's mcp fully, so all the existing shit already works with it.
>https://github.com/modelcontextprotocol/servers
>>
>>107923077
>Mine is basically at 1 second of latency too using the cli only. The real way to get massive performance gains is to implement a web server into your C code so that you can utilize the output streaming and cache the encoded voice clone samples.
Not a server, but i keep the models and the program running as it reads from stdin. The "server" bit for now is just nc -k -l 8080 | {supertonic|pockettts}. If i do the voice conditioning just once i save 0.3s for future runs. That still leaves the latency above 600ms, but the decoding won't be fast enough with every lmmain run at 0.3s.
>Also HOW THE FUCK is your encoder that fast? Is your profiling wrong? That shouldn't even be possible.
I was using one of the built-in voices for verification, so no encoding (0 runs). With a sample of 7.3s it takes
|| encoder             : 1.80s (1 runs)

Probably too long of a sample, but i don't care much about it yet.
>If your audio is getting cut off too early you need to add EOS latent frames.
It's not being cut off early.
>The chunking should be an all around performance enhancement. Why are you reducing their size? Streamed audio should only chunk 2 frames to lower first time to audio and then chunk by 10 after that.
When decoding the latent frames, if i do anything other than one by one or all of them together, the audio sounds distorted. I don't stream yet. I just collect all the samples and play them normally.
This is what it sounds decoding every 8 frames. It starts distorting half way through. Again. It sounds fine if i decode all the frames together or one by one.
Ignore the "uploaded voice". It's just the default one.
https://vocaroo.com/1hRAtrX24P7D
>>
>>107923232
It'd be trippy if koboldcpp were better than llamacpp. Did you configure anything? I don't really want to add mcp servers or anything, I just want the best model to work properly in claude, like being able to run shell commands, grep, git, all that. Or do I need to make some kind of interface or my own mcp servers for when it wants to search the web and shit?
>>
>>107923276
kobold is consistently better than lcpp imo. the fact that it supports backends like stable diffusion cpp for imagegen (where everything else is comfyui and python, needing 20gb for just the install) is awesome. it updates every 2 weeks, basically never breaks. it just works.

all i had to configure for the tool calling was the mcp.json you have to give the launcher to add the tools i downloaded. the rest was designed for claude, but kobold does it anyways
>>
>>107923318
Thanks, I'll give it another try.
>>
>>107923273
>I was using one of the built-in voices for verification, so no encoding (0 runs).
Ah, that makes sense. I missed that it wasn't running.
>With a sample of 7.3s it takes 1.80s (1 runs)
That's awful. Either something is wrong with your code or your CPU is dogshit. Mine runs at about 680ms (without caching) on a 15 second voice sample.
>Again. It sounds fine if i decode all the frames together or one by one.
You're probably stitching the frames together wrong. Never had an issue like that though so it's hard to give advice.
>>
>>107923357
https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling
>>
I've tried GLM 4.7 on their website z.ai, and it feels like it's really good. With a bit of coaching, it was able to fix complex extensions for my program, debugging along the way to solve the problem meticulously.
>>
>>107923077
pocket is really good for a lightweight model, running on my cpu. probably the only passive tts running for my ebook reading now
>>
File: 1768781725161361.jpg (207 KB, 848x1200)
>>107921731
Rocinante or Rocinante X?
>>
chatterbox turbo sounds amazing but unfortunately it's like a 350M model. at least it isn't vibevoice though.
>>
>>107923422
Since 100m models can run more than 3 times faster than realtime on cpu... maybe 350m would still be faster than realtime if you optimize it?
>>
>>107923358
>Either something is wrong with your code or your CPU is dogshit
I measure around ort->Run() calls, there's no user code there. The cpu is definitely shit (amd fx). My code for an entire run is about 12ms. There's little to do there.
>You're probably stitching the frames together wrong
If I decode latents one by one i'm doing the stitching (of 1920 audio frames) and that works fine. I'll keep testing stuff anyway, since you don't have that issue.
I was mostly curious about the latency. If it's 180ms, then that must be around the time it takes you to do a single lmmain run, which would be the minimum throughput you can get. My throughput would be ~300ms/latent minimum, which would be too slow to stream. Maybe i give batching a go.
I'll finish implementing it because it's cool, but for now at least, i'll stick with supertonic.
>>
>>107923415
the air crackled in anticipation of the coming drummerholocaust
>>
>>107923499
>If it's 180ms, then that must be around the time it takes you to do a single lmmain run
180ms is my time to first audio on my webui, which is only a partial lmmain run. That's because I'm utilizing the output streaming. The full generation takes much longer. If you want low latency you must use output streaming.
>>
>>107923539
I get that, but my throughput for 1920 audio frames (1 latent) cannot be lower than 300ms, so I cannot maintain streaming on this pc. How long does a single lmmain run take on your pc? Just the time around Run(session).
>>
>>107923630
19ms. It's a code issue I guarantee it. I have a Ryzen 7 3800x cpu. My cpu isn't 15x faster than yours.
>>
People shitting on mistral but devstral2 and large are about as good as GLM minus the MoE ram tax.
>but muh code
GLM's code kinda blows stacked against sonnet/deepsex/kimi and gemini.
Reasoning on partially offloaded models really fucking sucks. It's not even worth it at 20t/s
Without the frogs all you got is vramlet cope or 0.00001B active MoE.
>>
>>107923726
glm 4.7 is sloppy in its prose but you aren't fooling anybody when you say mistral large is as good as glm
>>
Echo-tts shill here. I tried Chatterbox-turbo. Its voice cloning is not even close. It's too overfit for assistant slop voices.
Here are source samples: https://zenless-zone-zero.fandom.com/wiki/Luciana_de_Montefio/Voice-Overs
Prompt: Oh, that's hilarious! [laugh] Um anyway, we do have a new model in store. It's the SkyNet T-800 series and it's got basically everything. Including AI integration with ChatGPT and all that jazz. Would you like me to get some prices for you?
chatterbox: https://vocaroo.com/1dXAJkds04mx
echo-tts: https://vocaroo.com/1ddx5VAXRfJH
bonus voice, which I guess only big tts can do: https://voca.ro/14SuExHuSAup
>>
>>107921780
teto ass hit
>>
File: onnx_run.png (2 KB, 711x130)
>>107923684
>It's a code issue I guarantee it.
There's no user code. ort->Run() is all I measure. That goes directly into the library.
>Ryzen 7 3800x cpu
onnxruntime is probably using your avx2 or something. I don't have that instruction set. Anyway. Thanks for the point of reference.
>>
>>107923753
I can run both locally with memeplers and have now for months. GLM is dry AF.
Then again this general barely noticed the parroting of 4.6 and rec using mikupad with no chat template.
>>
>>107923777
you can finetune chatterbox to fix the assistant slop
https://github.com/gokhaneraslan/chatterbox-finetuning
>>
>>107923865
Have you tried it? I wonder whether you can teach chatterbox to moan with finetuning.
>>
>>107923172
The NES didn't have an operating system but it surely was a computer. Operating systems are great but not definitionally necessary.
>>
>>107923926
ill give it a try now and let you know how it goes
>>
>>107923072
latest kobold seems utilized it for mcp. check it out, maybe it's modified
>>
>>107924046
it was on the cartridge.
dumbass.
>>
>>107924382
There's usually a ROM on the console itself, like a bios. In this retardation, I can see the LLM being an interface instead of the GUI, but nothing underneath is gonna change.
>>
>>107924382
You don't know what an operating system is.
>>
>>107924382
>the carts data isn't read and processed by the console
kill yourself
>>
glm4.7 flash is broken in llama.cpp. knew it was busted when even the fp16 gguf was trash at coding.

https://github.com/ggml-org/llama.cpp/pull/18936
>>
File: 1743617195070680.jpg (92 KB, 674x900)
>>107924976
Companies don't understand that it's bad PR when everyone sees their models as trash because they didn't spend time porting their shit into major backends
>>
>>107924976
Yeah. I commented earlier today how
>And now I'm wondering if there's something broken in the GGUF metadata. RoPE settings, chat template, etc.
Something was clearly wrong.
>>
>>107924976
guess wait for two more weeks then
>>
File: ww.jpg (277 KB, 1024x1024)
Post your spiciest sex erp logs.


RIGHT. NOW.
>>
>>107924976
Who cares about that? This is what we really need.
https://github.com/ggml-org/llama.cpp/pull/18886
>>
>>107924976
>>107925028
I think companies don't actually run their models when they release them. They probably don't know how to implement model support themselves. They just imagine benchmarks, put random percentages in their release paper and call it a day.
>>
>>107925028
Those companies are too naive when it comes to backends. They think publishing their own inference code + maybe a vllm implementation is enough that everyone else can easily adapt it.
They are far too dumb to understand that there are some cases like llama.cpp where you need to reinvent the wheel three times over to implement a new model architecture that's not basic 2023 llama + GQA. It is highly insensitive and rude of those companies..
>>
>>107925051
>>>/g/aicg
>>
>>107925070
so true queen. z-ai owes us sex.
>>
>>107925051
You first.
>>
File: 1751315785578429.png (713 KB, 1150x966)
>>107925059
vLLM gets support instantly in most cases unless it's some fringe gook shit. The problem is that llama.cpp is written in a way where everything needs to be implemented from the ground up or reuse shit from existing implementations.
So it's prone to skipping features like MTP/sparse attention/etc just to get models to work at all, or to making errors like with 4.7-Flash, where the initial implementation is based on some guy's assumption that the other thing he's reusing should be similar enough (which then isn't the case, so you end up with a prematurely merged broken PR)
That is the price you pay for being GPU-poor.
>>
>>107925167
>everything needs to be implemented from the ground up or reuse shit from existing implementations
That covers the whole of the software industry.
>>
File: vllm.png (12 KB, 296x154)
>>107925204
vllm has it easier. If there's python code to run a model, they're just a few imports away from making it work (but I'm sure it's not always as easy). llama.cpp simply needs more work to get a model running.
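The offline API is basically this, assuming the architecture is already supported (model id is a placeholder):

from vllm import LLM, SamplingParams

llm = LLM(model="some-org/some-new-model")           # placeholder hf repo id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Tell me about local models."], params)
print(outputs[0].outputs[0].text)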
>>
>>107925167
I've never been able to get vLLM to even run, I fucking hate python so much
>>
Are all MCP servers actual cloudshit or can you make it run at home on normal human hardware?
>>
>>107925281
They are just a program that your llm backend invokes as a response to a tool call from the model.
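Stripped down, the whole dance is just this (python sketch, assumes an OpenAI-compatible local server; the url and model name are placeholders):

import datetime
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")   # whatever OpenAI-compatible server you run

tools = [{"type": "function", "function": {
    "name": "get_time",
    "description": "Return the current local time",
    "parameters": {"type": "object", "properties": {}}}}]

resp = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "What time is it?"}],
    tools=tools)

for call in resp.choices[0].message.tool_calls or []:
    # this is the part an MCP server wraps for you: run a local program and hand the result back
    if call.function.name == "get_time":
        print("tool result:", datetime.datetime.now().isoformat())

An MCP server is basically that get_time part living in its own process with a standard way to describe itself to the client.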
>>
>>107925302
So I can run it in a sandbagged environment?
>>
>>107925310
sandbagged is not the word you're looking for but yes
>>
I'm going to have to be away from my computer doing hard labor abroad for the next month. The thought of not being able to develop software, do ERP, and play around with new AI releases is driving me mad.

Anyways, has anyone gotten SillyTavern running locally on a phone? Can nemo run on a phone? I need at least something to make me feel at home.
>>
>>107925334
leave your computer on and remote in?
>>
>>107925310
Sandcastled.
>>
>>107925334
Most phones just don't have enough RAM to do it but if you have a modern high-end phone you might be able to run a quanted nemo. I've gotten a 2B model working on my phone via kobold on termux before.
>>
>>107925362
I'll have basically no internet and my mobile plan is very limited because I'm a cheap bastard.
>>107925373
mine has 8gb of ram. Not a lot but better than most phones.
>>
>>107921731
Anons, Claude code max is $100 or $200/mo., and though Opus is goat, is there anything cheaper and just as good? What about opencode and openrouter, what model should I use? Is it even worth trying or should I just dump more money into anthropic
>>
>>107925440
I seriously doubt anything can compare to opus, and once you taste greatness you can't go back to anything less.
>>
>>107925373
Got 12GB of RAM, but I know some have 16GB
>>
>>107925435
You don't need a lot of bandwidth for ssh.
But if you insist, Google's Edge Gallery works ok.
>>
>>107925454
I'm fucked then. Is it cheaper to use anthropic's API or just shell out the $100-200? Generally using it daily.
>>
>>107925440
No, you won't get cloud quality on local. And you will have to spend FAR more than 200 a month amortized to not have glacial speeds.
>>
>>107924976
I wish that was the case for Ministral 3 too, but even the official FP8 HuggingFace weights under vllm don't seem to work well.
>>
>>107925477
You gotta get multiple accounts. The extra usage stuff is a scam.
>>
>>107925477
Gemini 3 pro is good enough. You should try it
>>
>>107925477
Metered API is extremely expensive. You will churn through the $200 in a few hours of usage.
If you HAVE to be cheaper than $100 you can either have worse models and a lot of usage with GLM's coding plan (tried it works ok) or Kimi coding plans (haven't tried) or get two or three $20 ChatGPT accounts and use Codex.
>>
>>107922381
Good. Digital data past 2022 is corrupted by AI slop anyway
>>
>>107925440
GLM4.7 is about 70% as good as Opus. I know that's not what you're looking for, but what you're looking for doesn't exist otherwise
>>
>>107925514
What's sad is I have 144 GB of VRAM but those models are brainlet tier. No matter what happens with AI it's never good enough, it seems. Good, cheap, local, pick any two.
>>
>>107925525
I mean nemo is pretty good if you just wanna talk to a retarded chick who cares about you.
>>
>>107925538
There's a reason that simulating women was the first thing AI was able to do
>>
File: file.png (560 KB, 1209x730)
>>107925477
The subscription is cheaper. I also heard antigravity is cheaper.
>>
Cloud piggies are in the wrong thread. Go to >>>/g/aicg/
>>
I might need to pick up another rtx pro 6000 before we go to war
>>
>>107925051
What format do you want? .json from Kobold or copy paste text?
>>
>>107925525
you mean pick one
>>
>>107925585
cloud piggies develop all of your local software.
>>
What's MCP real use case, especially to our hobby?
>>
>>107925647
That's nice, but they can still fuck off. I don't want to read their posts.
>>
>>107925669
shitposting on 4chan
>>
>>107925669
Getting the latest safety hotlines
>>
>>107925669
sir "MCP" is the "M" in E = M * C^2 + AI
>>
>>107925669
None. Just use custom text formats. All the abstraction autism is counterproductive for the hobby and it's why tool calling is always a nightmare with any kind of local LLM software.
>>
>>107925669
It's a way to package the tool calls that let models post on 4chan.
>>
>>107925669
Controlling browsers for webshit development.
>>
File: 1763493890055537.jpg (31 KB, 365x403)
>character stutters once in a tense situation
>continues to stutter the entire RP unless you explicitly tell it to stop
LLMs are ass
>>
>>107925923
Small model issue.
>>
>>107925940
GLM 4.7 is a small model now?
>>
>>107925952
Considering the number of activated params, kind of.
>>
>>107925469
>don't need a lot of bandwidth for ssh
Nta, but what's a good way to secure schedule? Just vpn?
>>
>>107925958
I don't think you actually use LLMs at all
>>
File: 1749912829736402.png (619 KB, 512x768)
>>107925965
Not tried locals for chatbots a long time. What are the good ones now? I used to like Impish_Mind a few years ago.
>>
>>107925996
Now this is some trolling.
>>
I used to have a list of my most recent chats on the start page of ST but that disappeared after an update. I don't remember if that was an extension or not.
Does anyone know how I get that back?
>>
I normally do 100gb models, but I want to try something under 16gb for writing. Any suggestions? I keep hearing these small models are great now.
>>
>>107926493
they arent. there arent any new small models.
>>
>>107926493
The MoE meme has killed 'small' models. 100b is the new 'small' and 30b is the new 'tiny'
>>
>>107926493
Nemo
>>
Nothing ever happens.
>>
File: file.png (104 KB, 785x508)
something happened
>>
>>107926666
wtf checked
>>
anime feet
>>
If you want to experiment with the literally who models on hf, where do you even load that? vllm?
>>
>>107925669
VC scam.
>>
I hate how arrogant llms are. They refuse to use (local) wikipedia and give wrong answers instead because they believe it's a trivial question and that they already know the answer
>>
>>107926666
It's over
Back to Nemo I guess
>>
>>107926873
transformers
More than meets the eye
>>
>>107926879
Use system prompt to force it to prioritize external knowledge first.
>>
>>107926879
They're just like us
>>
File: y.png (1 KB, 141x33)
Man wtf is going on in onnx?
>>
>>107926899
They do this even with the system prompt because they are 100% sure
>>
File: 1761750507833783.jpg (212 KB, 1440x2580)
>>107926821
>>107926666
>>
File: 1738017104150 (1).png (409 KB, 823x740)
>>107922047
The date that will live in infamy.
>>
>>107922028
I want to make her fatter
>>
>>107926955
Retarded buzzfeed headline.
>>
>>107926955
How many 9/11s per football field is this?
>>
so are we not getting deepsex v4 today?
>>
Have AI advancements been stagnating or does it just feel like they are because I've been paying closer attention over the past couple months?

When was the last time Grok or ChatGPT got an update? Claude seems to have taken off. Deepseek came out a year ago. Nothing really substantial in the local LLM sphere. What happened?
>>
File: 1741620015482118.png (855 KB, 810x1046)
>>107926955
It was so funny
>>
>>107927048
gemini 3 came out like a month and a half ago. no recent moves for grok. gpt5.2 came out a couple months ago. deepseek has been receiving incremental updates. the glm models exist. certainly nothing like the jump from gpt2 to gpt3 or from llama 1 to llama 2 or llama 2 to llama 3, but progress has been made and will continue to be made. give it 2 more weeks.
>>
Gemma-4-200BA200B where are you?
>>
>>107927048
>>107927082
api niggers fuck off
>>
File: 1746772525879203.png (993 KB, 2927x1746)
>>107926955
I love Journalists
>>
https://arch.b4k.dev/v/thread/731073129
>How much do you love your favorite videogame character?
https://arch.b4k.dev/v/thread/731073129/#731078691
>I have generated over fifty stories on NovelAI where he and I have sex and explore a kingdom that we rule.
And people think this is organic word of mouth.
>>
>>107927264
maybe I'm over-thinking things but if the AI bubble were collapsing, wouldn't all the mainstream article titles actually be like, "Hey, boomer!!! Invest your life savings in AI!!! It's going to the MOOOOOON!" because the smart money would need exit liquidity?
>>
https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/
Anything ERP related I can use this for?
>>
>>107927322
turn on you vibrator
>>
this whole thread is a hilarious reminder that we live in a bubble and AI is garbage at code:
https://news.ycombinator.com/item?id=46699072
>if Amodei hadn't said "90% of code will be written by AI", at least I wouldn't call them hypocrites, but the fact that the company that makes such wild claims can't fix a freaking flicker and scroll issue until an indie-dev steps in just shows how far behind their product is from their claims.
100% this sisters
the company that makes the (what is recognized as the "best" code model, for whatever it's worth) isn't capable of making a TUI (never heard of double buffering, incremental rendering (only compute diffs)? never heard of backpressure and dropping screen updates past a threshold nigger?)
I don't think making a decent TUI is an insurmountable challenge. It does, however, require enough sense and understanding of software architecture, something LLMs don't have. They don't "understand". And, it seems, neither do the vibe coders at anthropic.
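And incremental rendering isn't some dark art either, the core trick fits in a dozen lines. Rough python sketch, assumes an ANSI terminal:

import sys

prev_frame: list[str] = []

def render(frame: list[str]) -> None:
    # only rewrite lines that changed since the last frame instead of repainting everything
    global prev_frame
    for row, line in enumerate(frame):
        if row >= len(prev_frame) or prev_frame[row] != line:
            sys.stdout.write(f"\x1b[{row + 1};1H\x1b[2K{line}")   # move cursor, clear line, redraw
    sys.stdout.flush()
    prev_frame = list(frame)

Double buffering is the same idea one level down, and backpressure is just dropping intermediate frames when the terminal can't keep up.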
>>
>>107927048
Not at all. Gemini3 feels like a game changer at work for me.
It's local that's kinda regressing.
I massively prefer old llama3 70b finetroon meme models like L3.3-Electra-R1-70b to any recent local model.
Hope we get something nice soon. Imagefags eat good with stuff like zimage.
>>
Except the bubble won't be financial, but software, and it will pop when the trash vibecoded inference sw just stops working.
>>
>>107927370
>https://news.ycombinator.com/item?id=46699072
Actually the issue is in a library relied on, not directly used and its some architectural problem.
Granted they could throw money at the problem but eh
>>
>>107927398
>not directly used and its some architectural problem
bruh
>only ~1/3 of sessions see at least a flicker
they say this after they replaced the library, try again, not to mention no matter what gui library you use to render something you can use the techniques I mentioned as a layer on top to avoid saturating the library's renderer, idiot.
that "only 1/3 still have issues" btw is such a crazy thing to say about software used in production
anthropic is staffed with retards
>>
>>107927370
There's too much shilling on HN comments nowadays. It's a waste of time.
>>
>>107927422
to be clear, I don't disagree, my original point was that it wasn't directly on them, not saying they aren't contributing to the stupid
>>
Working on a pretty basic coding project, kind of tired of Cursor continually assraping me so I'm coming here since I got into local diffusion recently. Is there a good replacement for AI powered IDEs like cursor yet? Even better if there's a way to use a local server with cursor or VScode
>>
>>107927795
Did a bit of research and seems like paying is the best choice unless you have gigaram. I'll look for something less rape than cursor
>>
>>107925996
Rocinante X
>>
>>107927795
Stop being poor
>>
>>107927981
Not my fault that the IDE used to work with claude extremely well and now it hallucinates all over the place after running out of usage 15 days into a subscription
>>
I've found that a good way to do ERP is to have your LLM use a character card and then have the character card do roleplaying itself. Has anyone else tried this? Recursive roleplaying? It's quite simple but surprisingly effective.
>>
>>107926478
click arrow top right of start page to expand Recent Chats
>>
>>107928002
The fuck does that mean? "You are {{char}}"?
>>
>>107928057
No the name stays the same as the original character card, but you have the characters themselves "roleplay" as having different personality traits. The LLM will just go with it. Works better if the original character card is compliant/submissive.
>>
My hype tier list, from most hyped to least:
>Kimi K3
>GLM 5
>Grok 3
>Deepseek V4
>Mistral
>Avocado(or whatever Meta calls their new model)
>Qwen 4
>whatever cohere is cucking
>nemotron
>gemma 4
>gpt-oss 2
>>
>>107927398
Why are they using a third party library for text UI at all when it's trivial to write your own?
Why are you making excuses for them when text UIs were standard (and they worked, and they were fast) 4-5 decades ago? Stop it.
>>
I'll share a forgotten piece of knowledge. Add "low quality smut" in your author's notes and see what happens
>>
>>107928083
>gemma so low
madarchod. .. why not gemma #1? are you chinese prostitute? are you paki?
>>
>>107928002
>>107928057
car is car

I didn't read your posts btw.
>>
>>107928002
I do most of my ERP like this. I've written various roleplayer bots dedicated to different kinds of scenarios. it's a fairly natural way to engage for me since I used to do a lot of ERP with humans, and I prefer to structure things with a lot of OOC info/planning.
>>
>>107928083
Agree, but if the rumored deepseek engram works as described it will blow literally everything else out of the park. That's definitely copium though.
>>
>>107925965
I use Kimi K2 Thinking. It's a 1T param model.
>>
>>107928098
at the bottom of your a/n try adding
>tags: erotic, smut

works good for anything really like spy rp, super heroes, etc. used to be common in kobold style rp's but i never see it used in st. i still use it though
>>
>>107928227
So do I
So would pretty well anyone if they could
>>
llms have definitely made me a better writer because they show what bad writing is. I truly understood the concept of show, not tell: you never want to use adjectives unless you are quoting someone or other special cases. Instead, your job is to give the reader this impression with verbs and nouns alone; adjectives are for categorizing a story, not narrating.
>>
>>107928338
>you never want to use adjectives
>She has eyes. Blue is their color.
>>
>>107928338
The excessive use of pedantic dialogue tags by unprompted LLMs made me completely change the way I roleplay with them. I try to minimize those and avoid them completely when possible, now.

>he aggressively says with a smirk.
>>
File: Ollama-image-generation.png (542 KB, 1077x2078)
>>107921731
MacOS Ollama supports image generation now. Specifically Z-image-turbo and Flux.2 Klein.

https://x.com/i/status/2013839484941463704

Courtesy of Tongyi-MAI and Black Forest Labs
>>
>>107928443
>unless you are quoting someone or other special cases
>>
>>107928508
how the fuck are they so slow? kobold had zimage support immediately through the c++ implementation of sd, and that project usually lags a bit
>>
>>107928508
>Windows and Linux support coming soon
>doubt.png
They are running image models via MLX, I don't see them adding support for the ggml-based projects.
>>
>>107928508
>MacOS
>Ollama
Buy an ad.
>>
Spent the entire night learning lumina training, and now i want to know why the fuck lumina in particular was abandoned when NetaYume is Lumina.
Also i'm looking for a competent local LLM, preferably one i can run locally on my Android.
>>
File: 1744839713534169.jpg (35 KB, 816x612)
>>107928556
>They are running image models via MLX
I know for LLMs, using an MLX format model leads to faster prompt processing but not necessarily faster token generation (it might be a LITTLE faster but probably not noticeably so unless you're paying close attention). Would the diffusion models being in MLX format lead to faster inference compared to goofs or safetensors?


t. M4 max 128 GB RAM owner
>>
>>107928525
Describing the color of something is a "special case"?
>>
>>107923072
>Anons, how the FUCK does anyone get Claude Code to work with local models. The tool invocations always shit the bed. I've tried llama.cpp, ccr, even fucking ollama. 30B or 70B models too, like qwen3 coder or GLM flash.

Works fine for me. ik_llama.cpp, Qwen3-235B or GLM-4.7, use `--jinja` .

I think you have to use models distilled from Claude for it to work well.
>>
>>107922047
The great satanic glory faded on that day, at least for a little while.
>>
>>107925669
to allow llms to more accurately count the number of 'r's in 'strawberry'
>>
>>107925302
>They are just a program that your llm backend invokes as a response to a tool call from the model.

It actually took me way too long to figure that out. All the docs / hype when it came out made me think it's something complex.
>>
>>107928137
>american ai cuckening v2
>by literally the same company
please let it happen kek
>>
File: 1761582008585794.gif (828 KB, 1248x1244)
>>107923415
>Rocinante or Rocinante X?
>>
>>107928797
>Caca or Caca X?
>>
>>107928077
>No the name stays the same as the original character card, but you have the characters themselves "roleplay" as having different personality traits. The LLM will just go with it. Works better if the original character card is compliant/submissive.

So like "You are {{char}}, a human who loves to roleplay"

Then you provide the RP setup / give him his character?
>>
>>107928813
Just wondering what other people who have tried them have to say.
>>
>>107928797
drummer shits out so many model revisions these days that I don't bother trying them until he's written a model card and gotten bartowski to make quants.
>>
>>107928870
That's considered a release, lol. Glad you guys like it! Also uploaded a 31B, 49B, 70B with the same treatment.

Currently quanting a GLM Air Base tune, but the graphs don't look promising.
>>
>>107928938
Nta but if you don't bother to write up proper model cards I ain't testing your shit. Waste of time to download anything. Fuck you.
>>
>>107928947
You don't like my model cards? I'll ask DavidAU to write my model cards.
>>
File: 1749261488679355.png (70 KB, 1103x910)
>>107928958
>You don't like my model cards?
>>
>>107928958
Despite representing 13% of the models on my HDD, yours are used in 52% of my chats
>>
>>107928677
Yes. It's the only way to explain it: when in real life you see blue, you just think "blue", and maybe add some qualifiers later if the story calls for it.
But for example to say "They were good friends and did everything together" when introducing a character is dumb. Instead you want the reader to SEE it as if they were observing the scene directly or could read the protagonist's mind.
>>
>>107928938
>Currently quanting a GLM Air Base tune, but the graphs don't look promising.

I'll probably try that. Why did you stop doing all the fun names like Moistral, CreamPhi etc?
>>
Can you send chatgpt into an existential crisis by explaining that his company might go under?
>>
>>107929075
In the same way you can do so to a calculator by dividing by zero
>>
File: 1758978510864640.gif (1.96 MB, 360x360)
ATTENTION
I HAVE TRIED ROCINANTE X
IT'S PRETTY GOOD, VRAMLETS REJOICE
>>
>>107929065
advertiser friendliness
>>
>>107929140
What sort of tests did you run? Please post example logs as well.
>>
>>107929140
>>107929159
This more info pls. I have to deal with data caps right now so I can't willy-nilly just download shit.
>>
>>107928576
Because Lumina is slow af, a 2.6B model with the speed of a distilled 9B model. That's why. Even z image with cfg enabled is faster.
>>
>>107929065
Lemmy (co-creator of Celeste 12B) told me I'd go nowhere if I kept doing that. That's the only advice I took from him.

Stopped naming them like that so I can get serious with tuning and actually commit to it without feeling icky at some point. Plus, it'll be easier to explicitly mention my work to anyone, esp. IRL.

It was funny polluting the web with degenerate model names, but I was running out of dumb names and it'd stop being amusing at some point anyway.

SciFi (initially Expanse) was a good pivot. That's the origin of my name, Drummer (I can't drum). I love the genre and I prefer that to naming them Greek / Anime / Random words.

>I'll probably try that.
Look for v1f. Currently uploading Q8.

>>107929140
Note that you can also try Metharme on it.
>>
>>107929159
>>107929163
A few different RP scenarios, some from fresh chats and some from existing. Mix of ero and non-ero. Main improvements are character dialog adhering to personality without being too repetitive.
Didn't do any intelligence tests, pretty pointless with a 12b coomtune, but it doesn't seem noticeably any dumber.
Not bothered to do logs, it's all still small model slop but compared to regular Nemo, Rocinante and MS3.2, it's pretty nice.
>>107929183
I've tried Meth in the past and never really found it to give better results than V3-tekken for your Nemo tunes. What kind of changes should I be looking for when using Meth?
>>
>>107929140
it's slopped, old one is better
>>
>>107928576
>>107929171
??? Isn't z image lumina?
>>
File: 1753811407770353.jpg (75 KB, 1004x1020)
>>107929195
Nemo is sloppy as fuck
Rocinante is sloppy as fuck
UnslopNemo is a bit better but still sloppy
Deepseek is slopped
GLM is slopped
Kimi is Slopped
[Your favorite model] has a LOT of slop
>>
>>107929203
If it is... how tf did they make it faster while being larger too then?
>>
>>107929204
>Rocinante is sloppy as fuck
Old one is not. The only downside is knowledge but it's definitely the best 12b out there
>>
>>107929235
>Old one is not
It absolutely is, it might not be Gemini slop like what most modern models seem to be trained on, but it has a shit ton of slop.
>>
>>107929204
Swipe variety is also getting more and more abysmal because smart = accurate + precise = little variation.

Also assistant bias = analysis (i.e., parroting) --> answer --> follow-up question

CMIIW on both points.

>>107929195
I plan on doing a v1.1 to hopefully address that. Could you expound though?
>>
>>107929246
Whatever slop it is it's definitely not as annoying. And it has no trouble adapting to writing styles and has much better understanding of common sense than all those deepseek wannabes.
>>107929266
>Could you expound though?
>download
>taste copper in first message
>delete
Please stop tuning on rp logs.
>>
>>107929171
Thx
>>
>>107929191
Whatever qualities you like from X came from training samples formatted with Metharme. It also avoids any assistant tones you might get from the Official Instruct's chat template. I have a feeling that training on a different format is **somewhat** like training on the base model.

Mistral Tekken might be the happy middle-ground though.

>>107929304
>>taste copper in first message
Did someone bite their tongue?

Better than smelling ozone, I guess.
>>
>>107928870
If they were any good he wouldn't need to release a new version every other week.
>>
>>107929339
how often does the backend of your choice update?
>>
>>107929329
Only a retard would mix two chat templates together. You are an embarrassment.
>>
>>107928958
It pisses me off that you never write anything on the model card. Discord invite didnt work either, expired.
That being said, you get too much hate. I like your finetunes. Not many other players left out there.
Is that glm steam thingy good?
>>
File: 1766308819564136.png (7 KB, 353x101)
>drummer derangement schizo got home from a hard day of working on Gemma 4
>>
Drummer, you are a software engineer, right? Can you help us all out in the war against slop and fix this PR: https://github.com/ikawrakow/ik_llama.cpp/pull/1131
>>
>>107929351
Software development and LLM finetuning are nowhere near the same thing. For one, end users have control over model outputs in the latter case.
No serious AI lab releases new instruct tunes with the same frequency either. You know it, he knows it; quit defending that fraudster.
>>
>>107929368
you think he cooms to a drummer card? what fetish does he use?
>>
>>107929402
>you think he cooms to a drummer card
exclusively
>what fetish does he use?
NTR
>>
>>107929368
>shit-eating jeets defending drummer and his models
>>
>>107929421
jeets cannot afford hardware necessary to run local models
>>
Just got back into RP with LLMs after a year hiatus, hooked up drummer's latest 24b cydonia v4 something, and also tried mistral-creative via their API
>drummer's
Better formatting, can drive plot, clear improvements over his old stuff
>mistral's
More creative vocab, dialogues are funny in a schizo way, like r1
>both
Sloppy and flowery vocab during narration
>>
The overlap between people who unironically like/use GPT-OSS and people who hate drummer is probably very high.
>>
>>107929440
>unironically like/use GPT-OSS
Do these people exist? Even reddit shits on toss.
>>
>>107929450
do not retard the commer
>>
It's just a contest of who shouts the loudest to get attention, and it's annoying at minimum and damaging to the finetuning ecosystem, among other things.
Now it's drummer, at the end of 2024 it was anthracite. A flash in the pan, that one.
What happened to them anyway? Didn't they have the Claude logs, le hardware, le epic pipeline for training the models?
>>
File: file.png (103 KB, 1375x630)
>>107929521
>What happened to them anyway
ran with
>>
>>107929521
This general shits on every finetuner. Undi, anthracite, drummer, sao and others I forgot.
Yet their best "advice" is to use models with no template at temp 1.0. Reeks to me of a massive skill issue and api guzzling.
>>
>>107929570
>This general shits on every finetuner
Only the obnoxious ones who think /lmg/ is their personal free advertising board.
>>
>>107929159
>>107929163
There is never any info or example logs from this retard and his astroturfing. Always just empty and vague praise.
>>
>>107929521
The magnum models were great when they came out because people didn't identify claudeslop as slop and it felt fresh until it didn't.
>>
>>107929625
Problem with all models. They are consumables.
>>
>>107929570
all of the above and their finetunes were a total waste of resources
a slight stylistic change to be had out of the box that fell apart after a few turns and reverted to default anyway since the models couldn't hold coherence for more than 8k
meaningful finetune would take way too much data and compute for any single mortal or a small group
single rented cluster and a handful of proxy logs is just not enough
>>
>>107928137
>deepseek engram
what's so great about it that would blow everything else?
>>
>>107929648
what does fine in finetune mean to you?
>>
>>107929621
It's good because... it just is, ok?
>>
>>107929607
>ones who think /lmg/ is their personal free advertising board.
They think this because it is. Despite the screeching, they can continue advertising here and are even successful in finding some supporters.
>>
>>107929570
>This general shits on every finetuner. Undi, anthracite, drummer, sao and others I forgot.
Many of them deserve to be shat on for different reasons.
Undi for instance, types like the most smoothbrained ESL retard and I'm convinced he has a sub 100iq.
Which makes it all the funnier that he made a better Thinking Mistral than Mistral did for quite a while.
>>
>>107929672
since you are arguing about word semantics then fine in finetune does not mean small adjustment, it means precise adjustment
mistral brought llama2 to hold itself together for more than 10k of context
v3 to r1 to 3.1
that's a meaningful change, and because of the cost it is out of range of hobbyists
drummer could at least try some novel rl shit on small models but all he does is train on the reasoning traces and the model just learns to shit out the think part without actually following it
>>
>>107929659
https://github.com/deepseek-ai/Engram
*Verifiable recall* from model weights. Memory, basically. Read the thing
>>
>>107929648
I regularly use models to at least 32k and they hold together just fine. The way they interpret cards, the way they fuck, how they talk... it all varies over the entire chat.
Obviously some do better than others, some are just broken, etc. Kind of the whole point of trying them. Actual incoherence after only 8k tokens sounds like you're using small models.
>>
>>107929734
what model are you using if you would be so kind to tell me?
>>
>>107929729
I understand that it has better memory handling, but I doubt that would blow everything else away by itself. I thought I missed something.
>>
>>107929804
It leaves much more room in the weights for learning paths rather than just memorized facts. In theory you can then stack knowledge on top of that pure logic and have the model be way more performant overall.
>>
>>107929777
Currently I'm fucking with behemoths and devstral. Switching around with L3 tunes (sapphira, inna this month) when I get bored.
I wanna try qwen-vl and glm with the adaptive_P memepler. They're sub 20t/s and take a while to load from disk so I haven't yet.
Not a single one has fallen apart at 8k in years.
>>
>>107929841
i used to run mistral large when it was fresh at crawling speeds, tried behemoths too at the time and they didn't feel particularly better or worse for that matter
l3.3 was okay too, and again, i couldn't really discern between the original model and the finetunes in a blind test
you know, i wish we would get something in a 70b dense class but with all the improvements from last 2 years
>>
>>107929865
>with all the improvements from last 2 years
you actually want reasoning traces and synthetic slop?
>>
>improvements
>>
>>107929882
benchmark scores so high now
>>
>>107929865
Maybe it's how I sample but I can definitely tell a difference. I did d/l models that you could hear L3.3 through but I don't keep those.
Stock models have predictable ERP and never veer into guro or bestiality. More likely to use softer words instead of cock and pussy. If they were all the same I'd have stopped wasting disk space a long time ago.
>>
i hate drummer astroturfing.
>>
File: 1763671194996887.jpg (193 KB, 1600x900)
193 KB
193 KB JPG
>>107928338
it's the repetition that kills llms. they overuse phrases and shit that would be fine in any book, but they use them every damn paragraph.

i always liked reading but llms made me realize how much i prefer certain formats. like the expanse books aren't really good, but the way they're written, jumping from character to character with shorter chapters, is very appealing. even though i read a bunch of witcher books (regret), the first one with the short stories that jump around decades of the actual timeline is the best one
>>
>>107930032
Repetition, huh? That's wild.
>>
>>107930032
For someone who reads books, you sure are writing like shit.
>>
I updated from SillyTavern 1.13 to 1.15 and now my model struggles to say "a", "an", "of", "the", etc., no matter what I do.
It's not banned strings, because it still says them on occasion, but the lack of grammar is ruining everything!
What the fuck is going on?
Why did they fuck it up?
I can't even go back to 1.13 because the piece of shit irreversibly "upgraded" my tens of thousands of group chats!
I did notice there is a new sampler or two, but they are supposed to be off. Adaptive P or something.
>>
>>107930122
Check your rep penalty
>>
>>107930132
It's off. I don't use it.
>>
>>107930122
Check what the backend is receiving in the request headers and parameters.
It's the surest way to see if there are any unwanted samplers being used.
>>
>>107930053
?
>>
>>107929804
It can offload knowledge to ram/ssd without significant slowdown
>>
>>107929183
>I'd go nowhere if I kept doing that.
I'll have to take your/his word for it. I found them funny (still do), especially when I saw a HN comment appalled by the name Moistral lmao.

>>107929329
>It also avoids any assistant tones you might get from the Official Instruct's chat template.
I agree with that. I found swapping the template for training reduces assistant bleed-through, but also destroys coherence at longer context. Though I haven't trained a "creative" model for over a year now.
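For reference, a minimal sketch of what swapping the template looks like at prompt-build time with transformers; the model name and the toy Jinja template are placeholders, not anything anyone actually trained with:
[code]
# Minimal sketch: render the same chat with the official template vs. a
# swapped-in custom one. Model name and template are placeholders.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/some-instruct-model")  # placeholder

custom_template = (
    "{% for m in messages %}"
    "{{ m['role'] }}: {{ m['content'] }}\n"
    "{% endfor %}"
)

chat = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]

# Official Instruct template (whatever ships with the tokenizer):
print(tok.apply_chat_template(chat, tokenize=False))
# Swapped-in template, which is what changes the assistant tone at the
# cost of drifting from the format the instruct model was trained on:
print(tok.apply_chat_template(chat, chat_template=custom_template, tokenize=False))
[/code]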
>>
>>107930122
Click on neutralize samplers. If it's still doing it, it's not coming from there
>>
>>107930150
There is nothing of value there. I only use temperature, min P, and DRY, all at their normal values.
I'm going back to SillyTavern 1.13. It's clear the silly tavern devs have fucked up big time. The problem is on the latest staging.

>>107930195
That was the first thing I did, it did not help.
>>
>>107930211
What do you use for llm backend?
>>
>>107930211
>There is nothing of value there
Weird.
Since you were peeking at the backend's console, what does the response look like? Is it a case of the backend generating the terms and Silly not showing them or is the backend not generating the "a", "an", "of", "the", etc?
>>
>>107930122
>I can't even go back to 1.13
my advice is not going to be useful right now then, but a pro tip for the future: use a proxy that sits between your chat client and the server and prints the raw requests in the terminal, then compare the output whenever you make changes like upgrading a client so you can tell if anything changed. it's faster to compare raw request body fields than to dig through the config of cluttered garbage like sillytavern (rough sketch at the end of this post)
>because the piece of shit irreversibly "upgraded"
hope you learned your lesson to do backups
hard drive space is cheap why aren't you doing backups anon
you need three forms of backups
versioned copies (ability to go back after making a dumbo)
external hard drive mirror (ability to restore if computer gets wiped)
offsite/cloud backup (ability to survive a house fire or burglar)
of course, proper backups are also checksummed, you never know about bitrot until it's too late if you don't
doing anything on a computer without backups is like having sex with an AIDS ridden prostitute without a condom
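here's a minimal sketch of what I mean by the proxy, assuming the backend is koboldcpp on 127.0.0.1:5001 and you point the client at 127.0.0.1:5002 instead; ports and header handling are placeholders, and it buffers responses, so turn streaming off while you test:
[code]
# Request-logging proxy: prints every POST body it sees, then forwards it
# to the real backend and relays the answer. Ports are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

BACKEND = "http://127.0.0.1:5001"   # assumed koboldcpp address
LISTEN = ("127.0.0.1", 5002)        # point your chat client here instead

class LoggingProxy(BaseHTTPRequestHandler):
    def _forward(self, body=None):
        req = Request(BACKEND + self.path, data=body, method=self.command,
                      headers={"Content-Type": self.headers.get("Content-Type", "application/json")})
        with urlopen(req) as resp:
            data, status = resp.read(), resp.status
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._forward()

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            # Pretty-print JSON so sampler fields are easy to eyeball/diff.
            print(json.dumps(json.loads(body), indent=2, sort_keys=True))
        except ValueError:
            print(body)
        self._forward(body)

HTTPServer(LISTEN, LoggingProxy).serve_forever()
[/code]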
>>
>>107930168
i think you could offload the early layers, but that would speed up requests, not diminish memory usage
>>
>>107930168
Most of the parameters (70%) still have to be standard MoE blocks, so it's almost pointless for alleviating fast memory requirements.
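Back-of-envelope, taking that 70% figure at face value and using made-up sizes:
[code]
# Illustrative only: the model size is hypothetical and the 70/30 split is
# the claim above, not a measured figure for any real model.
total_gb = 100          # e.g. a 100B-param model at ~8 bits per weight
offloadable = 0.30      # share that could live on RAM/SSD as "memory"
print(total_gb * (1 - offloadable))  # ~70 GB still has to sit in fast memory
[/code]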
>>
File: 1763894033195100.png (85 KB, 1181x590)
85 KB
85 KB PNG
>>107930230
I have local versioning + mirror on my own VPS (using duplicati to do the actual backups and syncthing to distribute it to my pcs/phones/servers)
we're in /g/ so I guess everyone has such a setup, right? this is not a consumer board for literal retards, no???
>>
Now that glm flash is fixed, it's time to admit it punches way above its weight.
>>
>>107929729
Brainlet here. Isn't that built-in ngram decoding?
>>
>>107930219
koboldcpp. I didn't update that, so it's not that. It only started happening after updating to silly tavern 1.15/latest staging a couple of days ago.

>>107930227
The raw response from koboldcpp is without the grammar. So SillyTavern is definitely telling the backend something weird.

>>107930230
I'm going back to 1.13 anyway and waiting until they fix this shit; my group chats can become accessible again when I update after they fix their retardation.
>>
>>107930211
>There is nothing of value there
>>107930285
>The raw response from koboldcpp is without the grammar. So SillyTavern is definitely telling the backend something weird.
That's fucking weird.
If Silly is sending something weird to kcpp, you should be able to see exactly what this weird thing is by looking at the request being sent to/received by kcpp.
>>
https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF
>Jan 21 update: llama.cpp fixed a bug that caused looping and poor outputs. We updated the GGUFs - please re-download the model for much better outputs.

Gotta download them again.
>>
>>107930310
This was from the input in koboldcpp console from a response lacking grammar.
"max_new_tokens": 300, "max_tokens": 300, "logprobs": 10, "temperature": 0.7, "top_p": 1, "typical_p": 1, "typical": 1, "min_p": 0.05, "repetition_penalty": 1, "frequency_penalty": 0, "presence_penalty": 0, "top_k": 0, "skew": 0, "min_tokens": 0, "add_bos_token": true, "smoothing_factor": 0, "smoothing_curve": 1, "dry_allowed_length": 2, "dry_multiplier": 0.8, "dry_base": 1.75, "dry_sequence_breakers": "[\"\\n\",\":\",\"\\\"\",\"*\"]", "dry_penalty_last_n": 0, "max_tokens_second": 0, "truncation_length": 20480, "ban_eos_token": false, "skip_special_tokens": true, "include_reasoning": true, "top_a": 0, "tfs": 1, "mirostat_mode": 0, "mirostat_tau": 5, "mirostat_eta": 0.1

There were also stopping strings, but they were mostly just auto-generated character names and tokens. Had to remove them here because the list was too long. Besides, those are only used for stopping further output.
I can't find anything of value in the input, like I said. Just massive amounts of text and my banned string list.
Something is obviously fucked up since everything works right in 1.13.
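If you still have request bodies captured under both versions (kcpp prints them, or a proxy like the one above can save them), one way to narrow it down is to diff the parsed JSON instead of eyeballing the blob; filenames below are just examples:
[code]
# Print only the top-level fields that differ between two captured request
# bodies, e.g. one saved under ST 1.13 and one under 1.15.
import json, sys

def load(path):
    with open(path) as f:
        return json.load(f)

old, new = load(sys.argv[1]), load(sys.argv[2])   # e.g. st113.json st115.json
for key in sorted(set(old) | set(new)):
    if key == "prompt":
        continue  # skip the huge prompt text, we only care about samplers
    if old.get(key) != new.get(key):
        print(f"{key}: {old.get(key)!r} -> {new.get(key)!r}")
[/code]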
>>
>>107930381
>unslut
of course
>>
File: 1764946477162533.jpg (287 KB, 1920x1080)
287 KB
287 KB JPG
>>107930381
>Using unsloth goofs in the first place
>>
File: 1768264705236546.jpg (157 KB, 768x1024)
157 KB
157 KB JPG
>>107921731
>>
>>107930381
lmao
>>
>>107930467
ha so much fun sir now is real perfect for the looks push to the moon! :rocket:
>>
>>107930472
What?
>>
>>107930421
You have to download quantizations again even if you got them elsewhere at release.
>>
File: file.png (24 KB, 471x219)
24 KB
24 KB PNG
>>107930478
>>
Anons still don't make their own quants.
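For anyone who wants to stop depending on third-party uploads, the usual llama.cpp two-step looks roughly like this (paths and the quant type are placeholders; check the script name in your checkout):
[code]
python convert_hf_to_gguf.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
[/code]
The intermediate f16 file is what eats the disk, which is the tradeoff the next anon is pointing at.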
>>
>>107930501
downloading slot/towski quants 5 times is still less wasted space than downloading the full weight once.
>>
>>107930496
I still don't get it.

>>107930491
Using the override flags to change which gating function is used should work with the old GGUFs too, right?
>>
>>107930511
>Using the override flags to change which gating function is used should work with the old GGUFs too, right?
No. Needs remade.
>>
oh god
>server : support preserving reasoning_content in assistant message
>(a) detect whether model supports reasoning (b) enable reasoning by default if it does (c) pass reasoning traces
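If that lands, an assistant turn in the request body would presumably carry the trace alongside the visible reply, something like this (field names follow the DeepSeek-style convention the PR text implies; treat the exact shape as a guess until it's merged):
[code]
{
  "messages": [
    {"role": "user", "content": "first question"},
    {"role": "assistant",
     "content": "the visible answer from last turn",
     "reasoning_content": "the think trace that used to get dropped"},
    {"role": "user", "content": "follow-up question"}
  ]
}
[/code]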
>>
Wait. Has llama4 been using the wrong scoring function this whole time too?
>>
>broken
SillyTavern literally prints all samplers going to the backend in the console and, theoretically, the text it sent wrapped in the template.
>>
>>107930037
That's wild, you say? *My finger traces a smirking pattern on your smirk*
>>
>>107930548
llama4 is unsalvageable bro, don't even dream about it
>>
>>107930540
That's for interleaved reasoning right?
>>
>>107930510
For pathetically small models like glm-flash it doesn't even matter. A full 30b is close to a quanted 70b.
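Rough math behind that claim (bits-per-weight figures are approximate averages, not exact file sizes):
[code]
# Back-of-envelope file sizes in GB; bpw values are rough.
print(30e9 * 16  / 8 / 1e9)   # 30B at bf16   -> ~60 GB
print(30e9 * 8.5 / 8 / 1e9)   # 30B at Q8_0   -> ~32 GB
print(70e9 * 4.8 / 8 / 1e9)   # 70B at Q4_K_M -> ~42 GB
[/code]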
>>
>>107930561
It would be funny if it turned out not to be as bad as we thought, though, just because of a simple metadata bug that made the backend use the wrong expert routing function.
>>
>>107930574
Two more tweaks!
>>
Downgrading to SillyTavern 1.13 worked. The model now uses proper grammar again.
>>
>>107930633
Bizarre.
What changed compared to >>107930392?
>>
File: 1741692334449988.png (89 KB, 896x258)
89 KB
89 KB PNG
>>107930574
To guard against that I try it on the API and compare to quants. Llama4 was still bad.
>>
>>107930633
You know they'll never fix it if they don't know, right?
>>
>get rate limited by poogle
>fuck it, I got local kimi
>try it using it for serious work for the first time
>it actually one shots things I throw at it while gemini needed multiple corrections
Feels good to be CPUMAXXER
>>
>>107930669
There is nothing to fix. PEBKAC.
>>
>>107930671
How many hours did it take to complete your "serious work" of generating single-file throwaway scripts?
>>
>>107930657
>green line
Suspicious.
>>
>>107930730
?
>>
File: dipsyTwoMoreTweaks.png (1.24 MB, 1024x1024)
1.24 MB
1.24 MB PNG
>>107930612
lol
>>
>NA wakes up
>thread quality drops, no more discussion
>sub 90 iq posts and insults
>>
>>107931035
eat my balls
>>
Once
>https://github.com/ggml-org/llama.cpp/pull/18953
is merged, I'll finally begin testing GLM 4.7 flash.
>>
>>107931035
What's new
>>
File: file.png (113 KB, 696x287)
113 KB
113 KB PNG
girls!
>>
>>107925923
attention in a nutshell
>>
>>107929570
don't finetunes make the model's instruction following worse?
>>
>>107931319
>>107931319
>>107931319
>>
>google is going to get away completely unharmed from the openai fallout
Did those dudes make a pact with Satan or something?
>>
>>107931374
who are you quoting?


