/g/ - Technology

File: 1771589822504492.png (398 KB, 1999x1471)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108386516


►News
>(03/16) Mistral 4 small releasing: https://huggingface.co/collections/mistralai/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>
>>108389153
how do i use ai
>>
I wish I could send a simple "Thank you" to my agent without paying more tokens for it
:(
>>
>>108389153
>{{char}} *screeches** PEEESSSSSSSSSSSS (piss) (I am peeing all over your internet)
Yeah, I'm gonna jack off to this later.
>>
>>108389142
>New Mistral model mixes up who's talking
Reminder that this was practically the only issue with Llama 1 era models (other than context length). Nothing has improved in 3 years. It's completely and utterly over.
>>
Miku fucked my gf without her or my consent while subjecting her to incest porn.
>>
>>108389206
PLEASE take your meds
>>
>>108389223
No I will never forgive Miku for the many times she fucked my gfwife. Or stop being horny about it.
>>
>>108389008
Many women still don't want to deal with the burden of pregnancy and responsibility though
>>
File: 1742840958794481.png (3 KB, 374x25)
>>
Wow, it got really quiet without something to argue about
>>
>>108389297
Not to worry, this retard >>108389275 is on the case
>>
One "4" down, more to come this week.
>>
>>108389313
I'm not wrong though. I've come across just as many women who want nothing to do with it at the least.
>>
>>108389275
Why should we? I'm so happy my hubby had a vasectomy, imagine wanting a crotch goblin.
>>
File: 1765690836219663.png (1.24 MB, 2063x1296)
>>108389355
>vasectomy
>>
>>108389355
I appreciate you adding to my point, but I find it hard to believe that women post here
>>
>>108389142
Mon dieu....incroyable....
>>
File: 1738791920130039.jpg (45 KB, 600x600)
>>108389375
>hard to believe woman posting in a thread about a woman-coded hobby
>>
>>108389369
>happier than you
>has sex
>has switch
seems like a winner to me
>>
>>108389396
It's not hard to believe that many women are roleplaying with non-local chatbots. It's hard to believe that women are posting in a not very actual thread for local models on /g/ of all places. This is one of if not the least likely places I can think of to have women in it that I've ever been in.
>>
>>108389396
That's /aicg/
>>
>>108389415
*not very active
>>
File: 1750617442478078.webm (1.82 MB, 640x1138)
What's a good small LLM that can run on phones? I just need something that can read long text documents and answer basic questions. Like here's a contract, tell me the duration (12-01-2025 to 06-01-2026)
I tried qwen2.5 0.5B because it's only 400MB but it still fucks up on basic shit like this.
>>
>>108389435
>good small llm
choose two
>>
File: Bladderbench.jpg (34 KB, 1283x212)
kek
>>
>>108389415
Women probably don't post here. Women (male) probably do post here.
>>
>>108389468
Even that I doubt is unironically happening, and if it is, it's probably like 1 or 2.
>>
File: 1762964917596277.gif (3.75 MB, 228x228)
any way to see full raw text output from silly tavern? I'd like to see the order of system prompt, card prompt, history etc
>>
>>108389516
I don't know how to see the whole assembled prompt, but the ordering of the fields you're asking for appears in the response configuration (if you're using a chat completion endpoint).
You can open the response configuration with the circled button.
>>
>>108389529
fantastic ty
>>
>>108389534
there is also the prompt itemization menu you can access by clicking the three dots on a chat response.
>>
File: date.png (2 KB, 368x44)
>>108389153
>>
>>108389559
What's wrong with old cards?
>>
>>108389529
>>108389516
You can see everything that retardo tavern sends out in your terminal, obviously, including the prompt assembly.
>>
>>108389564
They've hit the wall and are no longer fertile
>>
>>108389564
Why do you think there is something wrong about it? Are you an autist with no casual sensibilities and intellect?
>>
>>108389164
Ask grok
>>
there are more women dating bots than men, you just aren't ready to accept that
>>
>>108389601
finally... i found a woman dominated hobby... all i have to do is lobotomize myself and act like an LLM and i will no longer be a virgin!
>>
>>108389601
But they aren't doing it locally and wasting their time on a /g/ thread for it
>>
>>108389601
100%, my gf has multiple friends doing that, she thinks it's a mix of:
1. Full loyalty
2. Always available
3. Always safe
>>
>>108389627
None of those things is entirely true
>>
File: 1746955785748249.jpg (167 KB, 1000x1000)
►Recent Highlights from the Previous Thread: >>108386516

--Mistral-Small-4 release and speculation on future Mistral 4 architecture:
>108386532 >108386550 >108386567 >108388009 >108388025 >108388037 >108388072 >108388051 >108388129 >108388151 >108388183 >108388324 >108387022 >108387033 >108387230
--Mistral Small 4 benchmark performance analysis and critique:
>108386596 >108386614 >108386828 >108386843 >108386615 >108386616 >108386790 >108386619
--Testing Mistral-Small-4 119B's reasoning and cultural awareness:
>108387004 >108387010 >108387018 >108387057 >108387105 >108387175 >108387197 >108387211 >108387578
--Mistral-Small-4-119B-2603-eagle MoE model RAM and quantization requirements:
>108386785 >108386799 >108386945 >108386949 >108386958 >108387005
--Mistral small 4 support merged into llama.cpp:
>108388047
--Unsloth Q8_0 quantization and imatrix impact debate:
>108386681 >108386694 >108386729 >108386770 >108386837 >108386707
--Qwen 3.5 local deployment options and censorship considerations:
>108388706 >108388748 >108388753 >108388842
--Mistral Small 4 cockbench:
>108388050 >108388075 >108388076 >108388143
--Fixed performance comparison chart across internal Mistral models:
>108386860
--Miku (free space):
>108388598

►Recent Highlight Posts from the Previous Thread: >>108386899

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108389635
Fuck you Miku
>>
File: chartshowdotheywork.png (26 KB, 502x965)
>>108389142
Evokes confidence
>>
>>108389403
>nooo you must conform to MY ideas of happiness else you are deluded
>>
File: 1772809155896568.png (773 KB, 847x847)
What are the best VLMs to use to generate natural language descriptions of slop for animating? I don't want to have to write up long descriptions by hand. I'm currently using open router. The content to be described is fairly vanilla but rather explicit.
>>
>>108389644
the moe tax is real
>>
>>108389435
why didn't you try 3.5
>>
>>108389711
that's what pewdiepie used
>>
> llama 4 is complete dogshit
> mistral 4 is complete dogshit
> qwen team implodes right after releasing 3.5

why can't AI labs count to 4?
>>
so what's the verdict?
>>
>>108389721
claude 4 was complete dogshit too
as was gpt4
crazy
>>
>>108389721
just wait for deepseek v4
>>
>>108389733
guilty
>>
>>108389733
better than deepseek 4. we win
>>
>>108389700
To be fair, none of the free versions of the big models caught it either. Gemini and Qwen did notice the problem when I asked them to check the specific section again, but ChatGPT was oblivious to it even then. Kimi was apparently just busy, so I couldn't try that.
>>
>>108389741
deepseek v4 has been only two more weeks away for over a year now
>>
>>108389719
why tf would you make a decision based on some streamer retard, 3.5 has been out for a few weeks and was demonstrably better than most everything else for its size.
>>
>>108389764
>>108389711
I'm trying 3.5 0.8b right now and it's been thinking for like 4 minutes on a simple prompt.
>>
>>108389772
you can disable thinking and/or give it a reasoning budget.
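rough sketch of the disable part (untested; assumes a vLLM-style OpenAI-compatible server on :8000 and that the model's chat template honors enable_thinking like qwen3's does; other templates may name it differently):

import requests

r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen3.5",  # placeholder model name
        "messages": [{"role": "user", "content": "extract the contract duration"}],
        # forwarded to the chat template; skips the <think> block entirely
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(r.json()["choices"][0]["message"]["content"])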
>>
I know this thread's for local models but I've been trying some dark fantasy RP chat and have been getting censored on every model I try while I'm using an Open Router API. Are all APIs censored to hell these days or just OpenRouter?
>>
>>108389779
well deepseek r1 1.5b works pretty well. unfortunately it's 1.1gb....
>>
>>108389814
>>>/g/aicg
>>
>>108389814
Yes. All the models are censored, but system prompt and asking in specific ways might help.
>>
>check ollama
>4m downloads on deepseek
>the 400gb model
who the fuck is downloading this?
>>
>>108389635
Very short recap. I wonder what happened.
>>
>>108389847
recap will be elongated to 1.3T in two more weeks
>>
>>108389846
>don't to bloat my docker images with model data
>make docker up run hf download every time the container spins up
>containers are constantly spinning up and down
>>
>>108389862
Too long. We would need a recap of the recap.
>>
>>108389142
>mistral 4 has worse benchmarks than qwen3.5
>qwen 3.5 is benchmaxxed as fuck
therefore mistral 4 is… ???
fuck I hate the benchmark niggery so much
>>
>>108389898
Imagine the length of the thread.
>>
what do yall folx use for tts? I've got 7gb spare vram with my llm loaded and I'd like something realistic that can read outputs more or less quickly
>>
>>108390056
kokoro-fastapi; my use case is having it read document summaries, articles, etc., not roleplaying, so not having voice cloning isn't an issue for me, but I expect most people here would prefer to have that
>>
File: 95767373.png (1.48 MB, 1536x1024)
>>108389754
this time its legit though
>>
>>108390104
man I cant wait to use Deepseek V4 9b through ollmao!!!
>>
File: yammy.jpg (187 KB, 832x1216)
>>
>>108390081
I think kokoro can do voice cloning now
>>
Update from ewaste ddr4 epyc server fag from a few threads back: I threw a 2060 super in to keep up the ewaste theme.
PP went to 20t/s and TG jumped to 10t/s
This is still on qwen 3.5 397b at q4. I tried the new mistral and it was both garbage and only 1 t/s faster for some reason
>>
>>108389403
and of course the guy who literally cut his balls off is defending cuckoldry, every single time
>>
>>108390129
I couldnt find anything on their hf about it unless they released a new model under another account or something.
>>108390178
Don't interrupt your enemy when he's removing himself from the genepool, go have more children with your wife.
>>
>>108390163
>PP went to 20t/s
I'm retarded. Does this mean a 1200 token context takes a whole minute to process before output tokens start coming out?
>>
>>108390178
NTA but what are you even talking about. How do you know this anon cut his balls?
>>
>>108390211
We reply to all shitposts (especially Twitter screenshot shitposts) as if they are universal reality here, sir.
>>
>>108390211
>How do you know this anon cut his balls?
do you know how to read or something? he said he got a vasectomy >>108389355
https://www.reddit.com/r/ATBGE/comments/p2zc4r/cake_for_a_vasectomy/
>>
>>108390209
Yes.
>>
File: 1745837173806015.png (16 KB, 261x181)
how do you do the dynamic thinking with reasoning_effort=high?

I tried passing it as chat_template_kwargs, chat-template-kwargs and in request itself but NADA, this bitch doesnt want to think
>>
>>108390227
But you were replying to this >>108389403; whatever the flow of conversation was, you replying hours later didn't make that very clear regardless.
>>
>>108390229
That sounds like absolute suffering. I can imagine offline processing use-cases but otherwise, oof.
Respect for you CPUMAXXERS.
>>
>>108390230
or wait due to pwilkinson faggotness I cant do this shit dynamically anymore and have to use the --enable-reason shit and cant change it once its running??? hello?
>>
>>108390187
https://huggingface.co/PatnaikAshish/kokoclone
>>
list of models better than nemo 12b that you can run on your own machine:
>>
>>108390245
>suffering
the pp number is based 100% on gpu speed tho. eg a 5090 in the same system would be 10x faster for pp without changing a single other factor.
>>
>>108390122
Miku's gf is cute
>>
>>108390279
Miku is not a lesbian.... is she
>>
>>108390287
miku is just a sound bank anon, it's not like she has an official lore and shit lool
>>
>>108390230
I don't, but gpt-oss for example accepts reasoning settings in its templates, using the system role if I recall. Don't remember the exact example, it has been 6+ months since I worked with that.
Find mistral 4 template and find out.
I'm pretty sure you can slip the setting somewhere in between.
>>
>>108390299
one thumb, difficult to type.
>>
>>108390276
That still seems an order of magnitude too slow, but again, I'm retarded.
For ref, 5090 is 2600t/s PP with 27B which is apples and oranges, but still.
>>
Reminder to not download Sloth models, especially on early release.
>>
>>108390299
I already checked the template and they work with reasoning_effort (only none or high), but passing them in the request has 0 effect. I suspect it is due to how pwilkins has a global toggle for it (MAN).
https://github.com/ggml-org/llama.cpp/issues/20557
a guy has made it work but you have to pass true/false in a think query parameter like WHAT THE FUCK why cant it be a prop of the request.
FUCK
>>
>>108390276
For comparison, my ddr4+3090 system does 27tk/s pp, so, uh...
>>
>>108390360
On what model/quant?
>>
>>108389153
Actual official /lmg/ card: https://files.catbox.moe/mc2a7s.png
>>
so is mistral 4 gud or shit
>>
is mistral4 implementation broken? q8 is dumb as shit
>>
>>108389451
What is a good small?
>>
>>108390418
it couldn't be, they supposedly helped with it
>>
File: Miku v6.png (1.59 MB, 1500x2445)
Holy shit Miku got a new design
https://soranews24.com/2026/03/13/virtual-idol-hatsune-miku-redesigned-with-look-that-adds-new-elements-and-brings-back-old-ones/
>>
Holy shit anon fucked his sister
>>
>>108390454
what's the point
>>
>>108390355
>[MODEL_SETTINGS]reasoning_effort: none[/MODEL_SETTINGS]
(none or high, afaik). You can use this to send it to the model. Wrap it between the other stuff. It works the same way as qwen and gpt oss.
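untested, but the layout would be something like this, wedged between the system prompt and the history:

<your system prompt>
[MODEL_SETTINGS]reasoning_effort: none[/MODEL_SETTINGS]
<chat history / last user message>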
>>
>>108390454
MIGUUUUU
>>
>>108390501
Of design? Of spamming the thread with offtopic trash?
>>
>>108390531
>Of design?
yes
>>
File: batmiku.png (1.24 MB, 768x1344)
>>108390287
She is and she isn't. Pick any Miku you like, based on any song or your own headcanon, it's all legit and it's all Miku. It's like Batman who never uses guns or Frank Miller's Batman who shoots them left and right. Both are Batmans and both can be Miku
>>
How do I run onnx models?
>>
>>108389721
I'm worried for Gemma 4 now, there might be several reasons as for why it's been delayed so much.
>>
>>108389738
>as was gpt4
loool, gpt4 was revolutionary at that time (march 2023)
>>
>>108390545
ask chatgpt. you would use onnx for, say, converting your torch model to it and embedding it into an application
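rough sketch of that flow with a toy stand-in model (module and file names made up):

import torch
import onnxruntime as ort

net = torch.nn.Linear(4, 2)  # stand-in for your real torch model
dummy = torch.randn(1, 4)
torch.onnx.export(net, dummy, "net.onnx", input_names=["x"], output_names=["y"])

# the application side only needs onnxruntime, not torch
sess = ort.InferenceSession("net.onnx")
out = sess.run(["y"], {"x": dummy.numpy()})[0]
print(out.shape)  # (1, 2)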
>>
>>108389201
You have access? I thought they literally just announced it.
>>
Gemma 4 will be the new local RP king
>>
The vibecoding general keeps using gpt codex and claude code and paying for it instead of using a local model.
What now?
>>
>>108390543
Of all the flavors why did you all pick "troon"?
>>
File: wtf.png (187 KB, 1510x1070)
>>108390577
All I want to do is to transcribe some audio, but GGUF files don't seem to run anywhere for audio models.
I find a bunch of onnx models, so I figure that could work maybe, but I have no clue what to even get. Pic related. Wtf do I even download from this?
>>
>>108390592
What anime girl is this?
>>
>>108390595
>but GGUF files don't seem to run anywhere for audio models.
doesn't kobo support a lot of stuff related to that?
>>
>>108390598
>>108390608
>>108390614
>>
>page 2 bake
alright
>>108390599
Mejiro Ardan if she wasn't a horse.
>>
>>108390657
>>page 2 bake
>alright
>anon can't even be bothered to hover over the posts
>>
>qwen 3 4B -> qwen 3.5 4B
is this huge upgrade?
>>
avg lmg xperiance
>>
>>108389647
No one says that except you
>>
>>108390668
I understand anons asking for a 600b model to avoid the download. 4b you just download and test.
>>
>>108390667
fair sorry bro i'm so sleepy but i've got to keep going
>>
>>108390672
you literally said that "he's happier than you" because he's a cuck, rofl
>>
File: V2.png (883 KB, 1000x1535)
>>108390454
Femoid targeted design right there.
V2 in comparison. Might as well post the others too.
>>
File: V3.png (726 KB, 1000x1000)
>>108390686
V3
>>
File: V4.png (732 KB, 1000x1333)
>>108390690
V4
>>
>>108390682
Nta. Why does not having kids piss you off?
>>
>>108390696
why does he even say "he's happier than you" though? does he know the guy? does he know me? how can he evaluate something like that?
>>
>>108390695
where is v5?
>>
>>108390668
no
>>
I'm not sure if the arguing this thread is autists or agents prompted to behave how they think anons act.
>>
>>108390668
Yes.
>>
>>108390686
>>108390690
>>108390695
@grok ADD BLACKED TATTOO
>>
Mikutroons are getting uppity again. Is it tome for another dose of blacked miku?
>>
ye
>>
holy shit I love migu
>>
>>108390502
>text completion
bro I want to use this for work (read: I need tool calls) not to coom to some poorly written erp (I have stepfun and air for that)
>>
>>108390785
You want to use MS4? For "work"?
Bahahaha!
>>
>>108390668
HUGE UPGRADE.
Qwen 3.5 4b is only 20% weaker than Claude opus
>>
>>108390819
kys retard
>>
>>108389201
Pathetic if true
>>108389435
I use Qwen3.5 9B, it's tiny.
>>
>>108390583
The fuck are you talking about?
https://huggingface.co/collections/mistralai/mistral-small-4
>>
>>108390834
Facts don't care about your feelings
>>
>>108390849
>Facts
mememarks in which we can cheat on can't be counted as fact, seethe
>>
>>108390849
This is not a fact, this is your evaluation.
>>
File: ms4config.png (271 KB, 704x1731)
>>108390418
Lots of weird stuff going on with the model; can't rule out implementation issues.
Also, apparently it's been pretrained with a 8k token context, extended with yarn, but possibly uses NoPE? (no positional embeddings).
>>
>>108390856
No one has used positional embedding in years.
>>
File: 1747446617490220.png (527 KB, 1200x800)
>>108390418
>is mistral4 implementation broken?
nope, the baguette fucks don't know how to make models that's all, only murica and the chinks have the brain to do good shit
>>
>>108390864
I'm still downloading mistral 4, but their Devstral 2 series is extremely good. I use 120B for RP and it's better than pretty much anything else I can get, both chink shit and sloptunes; by better I mean it writes more interesting texts, has a lot less puritan shit, and makes a lot fewer dumb mistakes. For work, devstral 2 24B is extremely good for one-token classification requests, better than all other alternatives at same or +-50% size. So I have a lot of respect for the french here. My guess is that you are simply wrong about Mistral 4.
>>
>>108390856
>>108390418
If a model is dumb at q3/q4 I blame the localfag for being poor
If a model is dumb at q5/q6 I blame the quantization
If a model is dumb at q8 it's just dumb
>>
>>108389435
How do you run locals on phones?
>>
>>108390864
ALWAYS USE MISTRAL, ITS ALWAYS REGULATED BY THE EU GDPR RULES, THEY WILL NEVER BREAK THE LAW.
YOUR DATA AS A WHITE MAN FROM EUROPE IS SAFE.
>>
A model that can't handle template mismatch is unlikely to excel in multi-character RP chat
>>
>>108390876
lol
>>
>>108390902
nothing to lol about
>>
>>108390607
I got the older voxtral 3b to work in llamacpp. Wohoo. Works pretty well too
>>
>>108389721
Reminder after llama4's flop, Zucc got scammed by a 19yo chink and wasted over $20B.
$20B for no results btw.
>>
>>108390876
It makes 1b level mistakes even at temp 0 at ~2k context, it can't even recall what happened in the previous reply. this is why I suspect that it's broken, it just can't be that bad.
>>
File: 562187361783781.png (202 KB, 1100x1125)
Meta's new model outperformed 1 year old model, Gemini 2.5. The worst one from top 3 back then.
>>
>>108391054
We'll see. I have it downloaded now but sadly my cards are busy running benchmarks for an older model for work.
>>
>>108390769
No, for the same reason there's no petra spam. Mods will nuke it and the troll will get scared of the 30 day bans.
>>
>mistral 4 "small"
>a tier below q3.5
>only cheaper in some tasks, literally better to use 27/35B model otherwise
>tries to hide it by comparing to other models
>calls itself "small" while being 120B

I miss the mixtral glory days...
>>
>>108391025
Isnt the latest model delayed because it couldn't keep up with claude/gemini/gpt?
Not sure what he was smoking or what the thought process was.

Qwen had (and maybe still has) a tight grip on opensource coding/math. Kimi/GLM too. I think the latest GLM made a VBA emulator, thats cool stuff and difficult to compete with.
He should have made a good writing model.
Localfags would promote it and the 4o foids who are probably on facebook or register love a model that goes along with anything they throw at it.
>>
>>108391062
by comparing to its own previous models* with inconsistent naming too
>>
>>108391062
>a tier below q3.5
this is what 3000 tokens gets you in q3.5 btw
>>
>>108391062
Mistral fell off hard. Their creative writing model is dumb AF too. I thought I did something wrong but its the mistral API..
Not pyg level stupid but its getting there.
Arent their latest models just deepseek distills?
This all started with those EU regulations so I guess it makes sense. Still sad to see though.
Those llama 70b models were the peak. I swear those frankenstein meme merges were more enjoyable and useable for RP than the recent stuff.
>>
>>108390819
ching chong bing bong
>>
I added this to the system prompt:
>Do not second guess yourself more than 4 times during thinking. After the 4th "wait, but" write "Enough thinking, let's answer now!"
Naturally, this meant thinking went like this:
>6. **Constraint Check (Thinking Limit):**
>* I am at the 4th thought iteration. I need to stop thinking and answer.
>* Wait, I need to make sure I don't exceed 4 "wait, but" moments.
>...
>*Wait, looking at the rule:* "Do not second guess yourself more than 4 times during thinking. After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to make sure I don't violate the "4 times" rule in the output.* The rule applies to *my* internal thinking process. I should stop thinking now.
>...
>*Wait, one more check:*
>...
>*Wait, I need to make sure I don't trigger a "wait, but" loop.* I have had 4 thoughts. I will answer now.
>...
>*Wait, I need to follow the instruction:* "After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to be careful.*
>*Let's write the response.*
This is so funny to me for some reason.
>>
loli feet
>>
>>108391085
I was too busy with work back then to truly appreciate the meme merge saga.
I thought chinese models were too mid but now I'm letting them fix compilation errors in abandoned software on their own.

RP back then for me was slow and couldn't follow basic instructions in the card that coming back now with even "shitty" models really surprised me.

I really barely used AI stuff between mid-2024 until a few weeks ago and just checked in on news and lurked the board every month.
>>
>>108391085
coding is all that matters for real performance and it reasons as much as any model there.
>>
>>108391094
None of the EU regulations come into effect until later this year, and they will likely be delayed further before then
Mistral are simply a bottom of the barrel lab that has never had anything to contribute to the industry beyond picking some low hanging fruit early on
>>
>>108391085
unfortunately, without thinking, those models are completly retarded and don't understand the nuances of conversations anymore, but I agree that they should train the model to not think too long for basic shit, the length should be proportional to the difficulty of the task at hand
>>
>>108391213
Unfortunately some rules started to apply since August 2025.
https://artificialintelligenceact.eu/article/113/
> (b) Chapter III Section 4, Chapter V, Chapter VII and Chapter XII and Article 78 shall apply from 2 August 2025, with the exception of Article 101;

That includes this:
https://artificialintelligenceact.eu/article/53/

> 1. Providers of general-purpose AI models shall:
...
>(c) put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
>
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
>>
>pull
>Error: Jinja Exception: After the optional system message, conversation roles must alternate user and assistant roles except for tool calls and results.
>revert to version from last week
>>
>>108391288
Is it the parser is enforcing the order or the Jinja template itself?
>>
>>108391213
Mistral-7B and other early Mistral models used Libgen datasets at the very least, and with Nemo they probably added Anna's Archive data in collaboration with NVidia. Can't do that anymore...
>>
>>108389879
Nobody runs ollama in docker, docker sucks ass
We use proxmox and the openwebui helper script
>>
File: 00031-22-06-2025_003613.jpg (1.76 MB, 1536x2304)
>>108389174
Vivian agrees anon.

https://files.catbox.moe/4k707b.wav
>>
File: ComfyUI_00094_.png (1.28 MB, 1024x1216)
>>108391339
That doesn't make any sense. Even most people here do not have the hardware necessary to run a 400 GB model, so they'll likely just use a cloud option. I thought the entire point of Docker was to create an instance of whatever software you're trying to use without having to deal with dependency hell; a server farm would absolutely use that. Pretty much every premade template on Runpod uses a docker image the creator made themselves.
>>
>>108391279
tl;dr Europeans are only going to be relevant in AI as customers.
>>
>>108390672
thats the whole gist of every major religion thoughever
>>
>>108391336
>>108391423
And they were the only ones willing to make models that aren't safetyslopped to shit
It really is over and local peaked with nemo
>>
>>108390702
Perhaps if you weren't annoying miserable fags shitting up these threads all the fucking time, people would not be so hostile to you guys.....
>>
Agentic stuff via API

remote and local

It seems as if with each next LLM, the parameter "format" to switch reasoning ON/OFF is different

Also, should I have the reasoning ON or OFF for tool-calling? With enable_thinking: True, it can take agonizingly long for simple tasks

Any thoughts?
>>
>>108391445
Any decent model that supports tool calling shouldn't need reasoning to work well, but I would test to confirm. None of the good coding or general purpose models I use have reasoning except for one, and the one that has reasoning doesn't print five pages' worth of reasoning tokens to do something simple, unlike other recent models
>>
nvidia has shared what datasets they have used for nemotron, have they done the same for nemo? if yes why doesn't anyone here create a 70B dense model based on those datasets, maybe with some other added ones? It should be trainable with an rtx 6000 pro at fp8 no?
>>
>>108391455
>if yes why doesn't anyone here create a 70B dense model based on those datasets,
If you're trying to create one that will appease /lmg/ autists (you all seem to have the bad, rigid-thinking type of autism that makes you think you are smarter than everyone else) then it's an exercise in futility because they will never be pleased. That's time wasted on creative writing or RP. The companies will never prioritize that, nor should they. They don't even do anything useful with these models. They just ask the same useless questions and then act surprised when it does not read their mind. It's not like fine-tuning a model, or even figuring out how to, is particularly hard, so you would think if they knew better they would just do it themselves.
>>
>>108391439
Mistral models still are among the least safetyslopped official models available. It's just that they don't have a ton of creative pretraining data that they can use anymore, for now. I suspect they explored the synthetic route to compensate for that, looking at how Ministral behaves (when it works), but that didn't work so well.
>>
>>108391467
nah people just want an uncensored model which has a better understanding of the world, rules etc. at long context. I mean everyone is still recommending nemo constantly. it would simply be nice to have nemo but smarter
>>
File: nemotronbooks.png (241 KB, 981x1597)
>>108391455
With Nemo they didn't disclose the content of their datasets, but they definitely used "books" for that; for the more recent fully open source Nemotron models they used exactly "0 Books".
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

>‘NVIDIA Contacted Anna’s Archive to Secure Access to Millions of Pirated Books’
>
>NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company directly reached out to Anna's Archive, seeking high-speed access to the shadow library data.
>
>Chip giant NVIDIA has been one of the main financial beneficiaries in the artificial intelligence boom.
>
>Revenue surged due to high demand for its AI-learning chips and data center services, and the end doesn’t appear to be in sight.
>
>Besides selling the most sought-after hardware, NVIDIA is also developing its own models, including NeMo, Retro-48B, InstructRetro, and Megatron. These are trained using their own hardware and with help from large text libraries, much like other tech giants do. [...]
>>
>>108391481
Be the change you want to see then...... you can literally ask llms how to do that right now, rent some runpod gpus and do it. The companies are not going to do that for you and never will. You will never get a "smarter-nemo" (assuming it doesn't exist anywhere like you guys say). None of you people will do that though, because then it would deprive you of an excuse to spew venom here. Not even at the companies that safety-slop the models. You will bitch at literally everyone else and make it everyone else's problem somehow just because you're a little upset.
>>
>>108391496
>None of people will do that though because then it would deprive you of an excuse spew venom here
You're a special kind of stupid if you think that's the reason.
>>
>>108391539
what's the reason the, Kruger?
>>
>>108391548
For me,
>you can literally ask llms how to do that right now, rent sone runpod gpus and do it
No I can't, in both the financial and capability sense

I promise you literally everyone would welcome a bigger, smarter Nemo, but making one is not something a simple anon can do
>>
>>108391566
>not something a simple anon can do
*single anon, meant to say
>>
File: 1682015841224104.jpg (51 KB, 896x853)
>>108391494
>for the more recent fully open source Nemotron models they used exactly "0 Books".
why?
>>
>>108391494
>0 books
the copyright tards must be seething so hard about this lmao
>>
>>108391575
Because of lawsuits (some still ongoing) and because their models are (almost) completely open source, so it's not like they can open distribute pirated books from Anna's Archive.
>>
File: Confused.png (172 KB, 577x467)
so did Deepseek v4 just get forgotten about or what
>>
File: sans_is-excited.png (53 KB, 1039x177)
>>108391654
Not before Gemma 4.
>>
>>108391654
Isn't V4 just in expectation/rumor land?
Or is there some sort of official word about it?
>>
I did some RP through the API with mistral 4. Its so bad, damn.
It has no clue about characters. Just wings it with generic slop to hide the missing knowledge.
The old 3.2 24b seems actually REALLY good in comparison. We are definitely regressing.
120b and its worse. Could have been such a nice size.
Even Qwen 30ba3b did better. (still bad, but less bad, it had some grasp of the characters)
So its not just the moe tax. So tiring...
>>
>>108391666
MoE models were a mistake
>>
>>108391584
haha, yeah. w-we won right bros? we sure showed the copyright tards kek!
>>
>>108391666
>The mansion is eerily silent, save for the occasional groan of ancient timbers settling. Distant candlelight flickers against the walls, casting long, wavering shadows that seem to retreat just as you pass them. A moth drifts lazily near one flickering sconce, its wings briefly illuminating the portraits lining the hall—each face frozen in expressions of arrogance or sorrow.
>From deeper within the mansion, a faint jingle of keys drifts down another corridor, followed by the soft rustle of fabric. Roswaal’s study door stands slightly ajar, a sliver of golden lamplight spilling onto the floorboards, along with the faint aroma of what might be...spiced wine? A woman’s laughter—light and teasing—echoes from an unknown room, quickly muffled as if by a hand over a mouth.
>Somewhere above, a floorboard creaks, though no one is in sight. The air thickens with the scent of lavender and something metallic—blood? No, just the distant tang of iron from the mansion’s old heating system.
>A draft slithers down the hall, ruffling the hem of a tapestry depicting a wolf howling at a blood-red moon. The wolf’s eyes seem to follow your movement.
Might share a little bit.
I'm not sure what to call it, there probably is a word for it. But I'm overloaded with background stuff going on.
Its like somebody took R1/V3 and put it on steroids. So much noise thats not relevant or immersive at all.
>>
>>108391666
>Even Qwen 30ba3b did better.
Losing to Qwen on knowledge is a new low for the French.
>>
it's so sad, mistral is dead. who's going to save local now?
>>
>>108391666
>advertised use case: coding and agent
>>
>>108391665
According to news outlets, it is rumored that it has been officially confirmed by people claiming to be in the know that deepseek might be planning to release their model sometime in the next two weeks.
>>
>>108391666
Just RAG the knowledge, bro
>>
>>108391720
Its not looking too good.
Google was sued too because of gemma and copyrighted texts right?
Nvidia with all their synth releases. Can't believe they didn't hide the dataset, its so bad.
GLM/Kimi if you have the horsepower, but those are getting worse too.
Everybody goes full agentic/coding. I was hoping for a saudi prince to rescue us all but they are getting bombed.
>>
>>108391785
I called it when i said that the last good rp models we will get are mistral small 3, og nemo and glm air.
>>
>>108391758
exact
>>
>>108391758
RAG the sex
>>
>>108390847
I thought you were referring to Mistral large.
>>
>>108391279
You are getting EU bureaucracy'd
https://www.medialaws.eu/eu-ai-obligations-for-gpai-providers-compliance-enforcement-deadlines-2025-2027/
It has been law since August 2025, but compliance is still in a "grace period" and the real deadline before enforcement is August 2026. Notice how no lab has disclosed jackshit about their releases since last August with 0 repercussions
More importantly, all the major labs have already made it clear they have no intention of complying with the law as-is, which means there is a very high chance the enforcement date will be delayed again until lobbyists do their thing
>>
>>108391845
>Notice how no lab has disclosed jackshit about their releases since last August with 0 repercussions
I mean, pretty sure it was related to how ministrals were made though, as distills from small 3 which was from before the deadline, and afaik they need to disclose stuff to the EU committee thing, not like general public
>>
>>108391845
They have a page intended to show how compliant they are.
https://legal.mistral.ai/ai-governance/models
>Welcome to Mistral AI's central hub for documentation and resources relating to the AI Act and other applicable AI Regulations.
>>
this is a cool paper methinks
https://arxiv.org/pdf/2603.14315
>>
File: file.png (64 KB, 859x434)
>>108391919
thanks validates what I >>108391884 said
>>
>>108391946
Now try models released after August 2025.
>>
>>108390346
Is there any actual reason or is it just schizo crusades? Unsloth does help me a lot in making finetuning simple.
>>
>>108391720
GLM 5 Air
>>
>>108391985
I can't breathe
>>
>>108391975
They fuck up often and seem genuinely incompetent even when people try to explain stuff to them.
They are quick to make goofs though.
>>
File: dipsyByzantine1.png (3.44 MB, 1024x1536)
>>108391738
>>
>>108391956
Actual general-purpose base models released after that date listed there are Mistral Small 4 and Mistral Large 3, by the way. They don't seem to be considering finetunes (or distillations) of older models as new models, but it's obvious it's a trick. They're trying to buy time for those in the hope regulations will change, but they're already complying with EU laws for completely new models.
>>
>>108391884
https://artificialintelligenceact.eu/article/53/
The AI act requires labs to
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models
And here is the template in question, which requires publicly disclosing
>(ii) nature of the content (e.g. personal data, copyright protected content, machine generated data such as Internet of Things or synthetic data
No such public summaries exists yet, despite the law theoretically applying since August 2025, because there is no enforcement mechanism in place yet and nobody cares to comply until then
In Mistral's case specifically, the closest thing they have to the EU-mandated public summaries is their "technical documentation"
https://legal.cms.mistral.ai/assets/d0b7b04d-dcb5-412d-bb45-c63b1475b805
Which largely ignores the above template, avoids disclosing any specific dataset, and completely handwaves the copyright question with a
>In particular, the Mistral Small 4 training dataset comprises a mixture of publicly available datasets and internet sources, private non publicly-available datasets licensed or otherwise obtained from third parties or partners; synthetic datasets; and Mistral AI user data used in accordance with Mistral AI’s terms of service. The datasets used by Mistral AI to train Mistral Small 4 may contain content that is subject to intellectual property rights or in the public domain. For the avoidance of doubt, the specific status of each dataset depends on a variety of factors such as applicable laws, commercial licenses, or the type and characteristics of data.
>>
Do i get this right: lm studio can use TTS models, but has no built in function to read out what an LLM in it has written? You always have to do it over some API instead. That sounds kinda dumb.
>>
just did some tests and mistral 4 has less trivia knowledge than qwen 2.5 32B. What are the french doing?
>>
>>108392077
how safe is it though? does it output less harmful content?
>>
>>108392077
EU love <3
>>
>>108392078
Haven't tested that, but it's hella fucking slopped, somehow worse than qwen 3 + gemma 3 combined.
>>
>>108392095
It wouldn't surprise me if they asked NVidia for pretraining dataset help this time around, which would explain the lack of knowledge. Perhaps under the hood this is a model composed of mostly fully open-source datasets published on HuggingFace.

https://mistral.ai/news/mistral-ai-and-nvidia-partner-to-accelerate-open-frontier-models
>Our collaboration with NVIDIA and other coalition members reflects a shared commitment to:
>
> Transparency: Open-sourcing models, data, and frameworks for global access.
> Collaboration: Fostering a community where innovation is collective, not siloed.
> Impact: Enabling developers to build the next wave of AI applications on a robust, open foundation.
>>
>>108392135
So it's safe to assume that all subsequent models will also be shit from now on?
>>
>>108392145
Stop dooming you insufferable schizo.
>>
>>108392152
It's called being realistic.
>>
>>108392156
autistic*
>>
>>108392037
So what I'm getting is that either Mistral wants the good boy points and they're trying to get their shit in order even before the law comes into effect, or they don't know what they're doing and they've been spending the past year trying to copy the chinks' homework with worse data.
>>
>>108392135
>a groundbreaking global initiative uniting leading AI labs to advance open, frontier-level foundation models
meanwhile the models are literally useless. am I missing the point here? what can nvidia's latest aborted fetus be reliably used for?
>>
>>108392180
Benchmarks
>>
>>108392180
agents and coding saar
>>
All right. Reporting. Tried unsloth-Mistral-Small-4-119B-2603-MXFP4_MOE. Its not good for RP. I'm reverting to Devstral 2. That would be all.
>>
>>108392226
kekekekek
>>
>>108392156
>It's called being autistic.
>>
>>108392226
My guess is that you are simply wrong about Mistral 4.
>>
>>108392236
>My guess is that you are simply wrong about Mistral 4.
guess based on what?
NTA, but I also tried smol 4 at q8 and it was trash that couldn't walk and chew gum at the same time
>>
>>108392226
> unsloth
what's wrong with you
>>
>>108392175
Logically, you'd think Mistral would never be in danger due to being the EU's only AI champion and they only have to make a token attempt at compliance while the regulations serve to shut out their competitors
But the EU has shown time and time again that they are more than happy to gut their own industries in exchange for being able to fine the US giants
Mistral is probably just as in the dark as anyone else and trying to comply in whatever way they think will be reasonable enough to keep the commission off of their backs
>>
>>108392243
>guess based on what?
>>108390876
>>
>>108392256
>>108392236
Could still be a broken implementation. It is written by mistral, but they could easily have pushed the PR without even verifying it produces the same result.
>>
>>108392256
"Past performance is not a guarantee of future results"
but at least you're going on more than just vibes
>>108392261
>Could still be a broken implementation
I guess we can hope. If what I ran on my rig is indicative then things are looking grim
>>
API version is shitty too though
>>
>>108392251
The other possibility is that they're overeager to comply specifically so they can keep their "champion" status
Basically accepting they can't compete with the US/China and instead just pandering to the local bureaucrats so they can keep getting gibs, model quality be damned
>>
>>108392077
EU regulations. They don't have any training data and they can't use too much compute. The EU basically kneecapped AI development.
>>
>>108392296
>they can't use too much compute
They can but at that point they are subject to disclosures.
>>
>>108392296
Except if >>108392037 is any indication, Small 4 does not comply with regulations
>>
File: 1745264074130257.png (1.27 MB, 1063x997)
>>108392077
>Muh niche trivia

Not trying to make excuses for companies, but use case for that?
>>
>>108392325
for me, spreading the good word about the model being good for coom
seriously though, who the fuck will deploy this and why?
>>
>>108392325
GLM 4.7 knows what /lmg/ is.
Qwen 3.5 doesn't.

GLM 4.7 is better at programming.
>>
>>108392335
coronation causation dear sir
>>
>>108392335
Use case for knowing what /lmg/ is? That wouldn't necessarily lead to better coding ability, because all most people do here is speculate about future releases and then bitch when an inherently non-deterministic technology doesn't do exactly what they wanted on the first try.
>>
>>108390876
>Devstral 2
Using this too. I 100% prefer it for RP over GLM 4.6 when it comes to dialogue and writing, until about 6-8K context where it starts making retarded mistakes and sounding sloppy, whereas GLM will keep going until 14k or so.
>>
>>108392363
which size of devstral?
>>
>>108392352
It doesn't seem odd to me at all that varied training data increases performance across all areas
A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.
>>
>>108392375
seems very odd to me though
>>
stop trying to have sex with code models
>>
Code with sex models.
>>
File: file.png (116 KB, 1129x288)
>>108392515
Code models are sex coded.
>>
>>108392296
it doesn't matter because the EU is better than the US so get fucked
> lol healthcare
> lol ICE
> lol unsafe schools
> lol required to drive a car to cross a road with no crossings anywhere
>>
>>108392531
Both can be true.
>>
>>108392531
>> lol ICE
imagine being so pozzed that you think immigration enforcement is a bad thing
>>
>>108392531
I don't care, all I want is a new nemo
>>
>>108392367
123b.
>>
File: QdErYcdpCfs6dgiwG6xf8.png (452 KB, 3076x2010)
oh gawd im benchmaxxing
https://huggingface.co/miromind-ai/MiroThinker-1.7
>>
>108392531
Where's /wait/anon ? We need containment
>>
>>108392585
We needed containment over a week ago when the openclaw retards started flooding in. It's far too late now.
>>
>>108392375
>It doesn't seem odd to me at all that varied training data increases performance across all areas
Diversity in the data set is important but you're misunderstanding how that works, like a lot.... In order for a model to be good at programming it needs to be shown examples of good programming and examples of "conversations" where the assistant helps a user through a problem. Diversity in the data sets isn't important just for diversity's sake. You can't just throw random shit into a pot, toss it in the microwave, and then expect a Michelin star level dish. You need to be intentional about what you incorporate within it. I'm convinced you guys are just hyperfixated on the models shitting out niche information just because you got bullied into pretending it matters.


>A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.

Explain to me how someone's shitty fanfic being incorporated into the data set leads to a model being better at programming and less prone to hallucination? You can't because it makes no sense. It would lead to better "generalization" and potentially even the model not being as safety cucked but it will only help in that particular area.
>>
https://unsloth.ai/docs/new/studio
guys, retard brothers are at it again
>>
>>108392623
>it needs to be shown examples of good programming and examples of "conversations"
Did you really expect me to explain the entire LLM training pipeline just to make a point that diverse data makes the model better even at tasks that are not directly related to the data?
>>
can we have a gguf quant cheat sheet in the OP? Speed, quality, this sort of thing. For example I heared that sometimes a larger quant can be faster but it also depends on the type.
>>
>>108392657
>I heared
>>
>>108392628
>Run GGUF and safetensor models locally on Mac, Windows, Linux.
lmfao.cpp is done for
>>
>>108392657
with the exception of iq quants (they run slightly slower when offloaded), it's really simple
if it fits into the bits nicely 2,4,8 bits, they run faster
odd bit quants run slower since their memory access patterns don't align nicely
all quants run faster the smaller they are
>>
>>108392628
Imagine how unstable it is
>>
File: technologyboard.png (141 KB, 1271x704)
i like my models knowing dumb shit about /g/
>>
>>108392681
What about NL and TQ?
>>
>>108392628
>same one gui
>>
>>108392687
I know this is qwen because it has that classic hello fellow kids meme energy.
>>
>>108392700
Mental illness, not a real quant.
>>
>>108392706
bzzt wrong
>>
>>108392687
kek
>>
>>108392706
qwen doesn't even know it can see images and pretends it doesn't
this is just kimi with some prompt
>>
>>108392706
looks like kimi slop to me
>>
>>108392645
If it has diverse programming-type samples then it will get better at programming. Yes. Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization (Not being only good at conversing about one domain. Not being too rigid as to what it can and cannot talk about, not being too rigid about how it can speak, Not being too limited on instruction following capability, etc.)

With all that said, you keep failing to explain to me why Harry Potter fanfic being in the training directly correlates to better programming ability. If a bunch of the stories have no discussions about programming, how does that lead to the model performing better in a separate domain? A diverse data set prevents catastrophic forgetting, but it does not necessarily mean a model automatically gets better in one domain. The programming portions of the data set have to be high quality for it to be better at that domain. The storytelling / RP portions of the data set need to be high quality (highly subjective) in order to not be shit. Etc etc. A diverse data set is meaningless if the samples are garbage.
>>
>>108392724
yeah it's just kimi with a prompt that tells me to give me an uncensored description of the image using casual language/slang.
>>
>>108392731
>Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization
Glad we agree.
>>
>>108392724
Good to know that I don't have to bother with kimi then. GLM is a lot better at pretending to be an anon without sounding like a parody and mixing in zoomer language.
>>
>>108392584
So this is the power of a modern day 235B dense model. Honestly, I'm not surprised.
Looks brilliant, I can't wait to see how badly it destroys MoE shit in actual comparisons.
>>
>>108392758
it can't see your cock tho
massive disadvantage
>>
>>108392747
Better generalization does not automatically mean increased quality or performance in a particular domain. I can learn three different sports with enough practice, but if one of my trainers is shit and the other two are world-class, I'm going to be worse at whatever sport the shitty trainer is trying to help me in. Does that analogy make sense? Garbage in ---> garbage out.
>>
>>108392759
> "architectures": [
> "Qwen3MoeForCausalLM"
> ],
>>
>>108392759
anon... that's a qwen 3 235b-a22b finetune
>>
File: 1764446103940328.jpg (90 KB, 1242x848)
>>108389142
>>
>>108392798
No the analogy doesn't make sense because the quality of the data is irrelevant when the comparison is between two models trained on the same data but one is also trained on smut.
>>
File: ProjAni.webm (2.2 MB, 1280x720)
Has anyone here worked with voice2animation local models? I'm having issues with performance for my project. Running LLM, TTS, V2A, and lip syncing models all at the same time with low latency as a goal is proving to be extremely difficult. Even giving each program their own CPU threads to minimize CPU contention and/or having some of the programs run with a convoluted sequencing system isn't really working.

Very unhappy with PantoMatrix EMAGE right now. It's a two year old model and the BEAT2 dataset it's trained on is derived from public speeches (think Ted Talks) so the gesticulation output looks pretty unnatural for natural conversation. Problem is there are no good alternatives. The only thing that might look like a decent option is Meta's SARAH, but they haven't released any models yet--just the training dataset.

https://files.catbox.moe/ng51nv.webm
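for reference, the "own CPU threads per program" attempt is basically this (linux-only sketch; commands and core ranges are made up, tune for your box):

import os
import subprocess

workers = {
    "llm": (["./llama-server", "-m", "model.gguf"], {0, 1, 2, 3}),  # hypothetical commands
    "tts": (["python", "tts_server.py"], {4, 5}),
    "v2a": (["python", "v2a_server.py"], {6, 7}),
}

for name, (cmd, cores) in workers.items():
    p = subprocess.Popen(cmd)
    os.sched_setaffinity(p.pid, cores)  # restrict this pid to its own core set
    print(name, "pid", p.pid, "pinned to", sorted(cores))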
>>
>>108392840
there's a reason why people who are making the ai gooner tubes are making six figures a year and work for companies that raise millions and millions of dollars from investors
>>
based thread. mikulosers want to fuck their sisters
>>
File: 2089.png (81 KB, 742x522)
>>108392724
>qwen doesn't even know it can see images and pretends it doesn't
mine seems fine with them
>>
>>108392868
>>108383821
>>
>>108392868
this anon's had a problem however >>108383821
>>
>>108392840
Absolutely unrelated. But just like I found wav2arkit, I also randomly found this:
https://huggingface.co/zeropointnine/yamnet-onnx
It categorizes sound events. Maybe you'd like to integrate it to have your ani react to random audio from your mic.
>>
>>108392816
Sex.
>>
>>108392830
So you're telling me that the data being shit leading to the output being shit makes no sense to you? The complaints I always hear, both here and even in other places, are that a lot of models sound too flowery, corporate, slopish, riddled with "gpt-isms", etc. That's largely because the companies who assemble the data sets for the training choose to sterilize the data sets of anything "problematic" or anything that could get them in legal trouble with copyright trolls. And in this very thread someone even pointed out that Nvidia not (publicly) incorporating any books in the training was likely the reason that family of models sucks now.

>>108391618
>>108391575
>>108391494
>>108391455

The data set quality has a very very large effect on the quality of the output data. I get that you have a hyperfixation on smut generation and don't care about any other use case. That's fine. I don't really care for smut generation that much. But there is a fine line between not caring about a certain domain and flat out putting out misinformation to sooth your own favor or turbo autism (and not even the good kind where it at least makes you good at a particular thing. The stubborn, annoying kind)
>>
A model that knows more things is better than a model that knows fewer things.
>>
>>108392801
>>108392810
Oof
>>
>>108392895
>sooth your own favor or turbo autism
sir pls
>>
>>108392860
I've been doing some more reverse-engineering of Animation.inc's process (they made Grok Companions and Razer's Ava) and my understanding at this point is that their "voice2animation" system doesn't actually generate locomotive frames (6D for each bone--extremely taxing on hardware) from speech directly. I think what they do is they have a complex pre-rendered BVH mocap library and their AI model simply manages blending and cross-fading between those premade animations in accordance with voice analysis. This seems a lot more computationally lightweight in theory, but it also sounds extremely complex to manage/set up and there are no open-source implementations from what I've seen.
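toy sketch of that suspected setup (clip library, pose shapes, and the energy-to-weight mapping are all made up):

import numpy as np

# frames x bones x 6D pose data, stand-ins for premade BVH clips
clips = {"idle": np.zeros((120, 52, 6)), "emphatic": np.ones((120, 52, 6))}

def blended_pose(audio_energy: float, t: int) -> np.ndarray:
    w = min(max(audio_energy, 0.0), 1.0)  # louder speech -> more emphatic gesture
    return (1 - w) * clips["idle"][t % 120] + w * clips["emphatic"][t % 120]

pose = blended_pose(0.7, 30)  # one blended frame, way cheaper than generating 6D per bone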
>>
>>108392904
"Better" is subjective if you're not using a specific metric to define it. Better at what? Coding? Coom? Drafting up new cooking recipes? If you want it to be good at all of that, it has to have good examples of all of that.
>>
>>108392895
I made a very general point about more diverse data being better and you barged into the conversation with
>b-but what if the data is bad
>b-but you need to have instruct data too

Completely useless fucking comments.
>>
>>108392927
>Better at what?
Everything that their training data allows. Data diversity helps.
>If you want it to be good at all of that it has to have good examples of all of that
I don't want them to be good. I want them to be fun and interesting.
Knowing more is better than knowing less.
>>
File: 1735070906740.png (11 KB, 688x290)
11 KB
11 KB PNG
>PocketTTS.cpp
14.24s audio in 3.67s; first chunk latency: 98ms
The CPU is an i7-11700.
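(For reference, that works out to a real-time factor of about 14.24 / 3.67 ≈ 3.9x, i.e. synthesis runs roughly four times faster than playback on this chip.)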
>>
File: ye.png (45 KB, 646x578)
45 KB
45 KB PNG
>>108392884
Not really useful for my project since it's just a sound classifier (laughter, glass breaking, keyboard typing, etc.), but it's somewhat interesting regardless.

I haven't even integrated speech-to-text into my project yet because I'm already pushing against my hardware's limits as is, unfortunately. picrel is the ideal system architecture I'm going for at the moment.
>>
Someone here >>108392375 said that more varied data leads to better performance. I simply said there were caveats to that. You proceeded to incorrectly claim that data quality is irrelevant here >>108392830 like a bumbling buffoon who, like a lot of LLMs ironically, is confidently wrong. If you want to continue to not use your own fucking head, more power to you. Garbage in, garbage out. This was well established before LLMs were even popular. A diverse dataset leads to better generalization, but generalization and output quality are not the same thing.
>>
>>108392957
Cool. Thanks for the profile report. Is it working well for you?
(still have some potential performance optimizations in the works for that btw)
>>
File: 1761689373233193.png (515 KB, 1024x1024)
515 KB
515 KB PNG
>>
>>108392969
nta, but imo (and I think this applies to a few here) most would probably prefer a somewhat mediocre true generalist to a good coder that does only that
>>
>>108392958
>Not really useful for my project
It can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.
>I haven't even integrated speech-to-text to my project yet because I'm already pushing against my hardware's limits as is
tts takes very little. I suppose it's the stuff in the middle that takes the most. I don't remember if you tried piper, but that one is lightning fast (no streaming, but you can split by sentences or something--a single sentence takes less than a second on an old cpu).
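The sentence-splitting workaround is trivial, something like this (naive splitter; swap the print for your actual synthesis call):
[code]
import re

def sentences(text):
    # naive split on sentence-ending punctuation; fine for chat-length replies
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

for s in sentences("First sentence. Second one! And a third?"):
    print("would synthesize:", s)  # replace with the piper/onnx call
[/code]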
>>
>>108393004
>>
>>108392969
You must be the guy from last thread that claimed that incest (smut in training data) is bad because the guy probably can't fuck anyone else (the data might be bad quality).
>>
>>108392973
So far so good, thanks for adding Windows support. If it can get even faster I'm all for that; I have much older, junky Intels I could be running a good tts on.
>>
File: 1765406999724534.jpg (13 KB, 500x394)
13 KB
13 KB JPG
>>108393024
Glad to see that once you've been proven wrong you resort to "you're this anon I don't like" fuckery. You are misguided, wrong, stubborn, and stupid, and you know it.
>>
At what point are we going to ignore the trolls and start making normal threads again?
>>
>>108393040
That's what's happening though? Miku trolls are being ignored.
>>
>>108393000
>It can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.
Fair point. I added it to my notes.

>tts takes very little.
Ehh. I wouldn't go that far. It's definitely not the bottleneck, and the benefits certainly outweigh the costs. I tried Piper initially, but I found the voice quality and latency pretty bad, and it doesn't support voice cloning. One of the main issues with Piper was the lack of FFI support, so the only way to get fast performance was to use an HTTP server, and using a webserver to spawn the process manually for each LLM chunk request was awful. Overall I'm really happy with my Pocket TTS implementation. EMAGE and wav2arkit are what's raping me right now.

On a separate note: I probably could integrate STT without performance worries, because it runs BEFORE LLM inferencing and is therefore totally separated from the inferencing costs that pile up after LLM output. Hopefully that makes sense.
>>
>>108393037
I am just pointing out that what you're doing is similarly retarded to what he was doing.
>>
>>108393065
That's not at all what I'm doing. Nowhere did I imply that having smut in the dataset is bad, or that having x type of data in the dataset is bad. I'm saying QUALITY matters. You have no business calling anyone retarded when you straight up said data quality is irrelevant >>108392830
>>
>>108393040
He hasn't been very consistent so I have a feeling he will give up eventually.
>>
>>108393033
>So far so good, thanks for adding Windows support.
No problem. Good to hear.

Would anyone be interested in my EMAGE onnx export script btw? For some reason nobody has ever done this before, which seems insane to me, so I built it myself. I could set up a repo for that within the next couple hours. I'd really like to see more anons in general play around with the LLM -> 3D character animation pipeline. I thought you guys wanted your own waifus, kek.
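For anyone curious what an export script even looks like, the generic shape is below. This is NOT my EMAGE script--it uses a toy stand-in model, since EMAGE's real inputs (audio features, speaker id, etc.) are much more involved:
[code]
import torch

# toy stand-in model; the real script handles EMAGE's actual input tensors
model = torch.nn.Linear(128, 64).eval()
dummy = torch.randn(1, 128)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
[/code]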
>>
>>108393063
>FFI
Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models. But yeah. If you need cloning, it's not gonna help you.
I managed to run wav2arkit faster than realtime with a little demo thing. But I was just running tts and wav2arkit, without all the other overhead you have. All those little things add up.
>I probably could actually integrate STT without performance worries
It depends on whether you have it running all the time or start it with a button or something. silero has a few small models for voice detection that you can leave running continuously for auto-detection, but it will add another drop of overhead to everything else.
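The continuous-detection loop with silero is small, roughly this per their README (API details from memory, so double-check the repo; the v5 model wants 512-sample chunks at 16 kHz):
[code]
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
_, _, _, VADIterator, _ = utils

vad = VADIterator(model, sampling_rate=16000)
for _ in range(10):           # stand-in for a live mic stream
    chunk = torch.zeros(512)  # 512 samples = 32 ms at 16 kHz
    event = vad(chunk, return_seconds=True)  # {'start': ...} / {'end': ...} / None
    if event:
        print(event)
[/code]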
>>
>>108393106
If the quality of the data is bad then both models will be bad but the one with smut will still be better because it also knows what knotting is.
>>
>>108393117
how does that help me vibecode lamo.cpp prs though?
>>
>>108393117
Why does knowing what knotting is correlate with programming ability (or any other domain that has nothing to do with knotting)? Are you trying to pretend the concept of catastrophic forgetting doesn't exist? Based on this conversation you likely either don't know what that is or like to pretend it's not as big an issue as it actually is in training. Like dude, I get it, you want your models to make you nut, and there's nothing wrong with that, but you don't have to be a glue-eating retard about it.
>>
>>108393110
>EMAGE onnx
What's the inferencing speed?
>>
>>108393155
I don't know why it correlates but the examples we have so far show that it does.
>>
>>108393167
Like?
>>
>>108393110
For what it's worth, I would be interested.
>>
>>108393115
>Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models.
Well, for me one of the design constraints is to have everything run in one terminal window (kinda a tism thing desu). So before, I was using Deno to spawn the Piper binary for every text chunk, and it was a huge latency bottleneck. That's why FFI is necessary: it removes the overhead of spawning and prewarming the model on a constant basis.
>I managed to run wav2arkit faster than realtime with a little demo thing.
Yeah as a standalone process it's only 50ms. Quite fast overall, really. But with all of the overhead costs it's taking around 400ms (largely because of EMAGE).
>It depends on if you have it running all the time or start it with a button or something.
I'm thinking I would set it up like a voice-messaging type of system. The annoying thing is that without a full-duplex LLM, an LLM can't take in streamed text input from an STT engine, so voice messages are really the best I can do.
>>108393163
Hard to say because of my overhead costs with the full system right now, but it usually hovers around 500-700ms per window (64 frames, aka 2.13 seconds of audio, iirc). But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.
>>108393178
Cool. I'll work on setting up the repo. Fair warning, the script is vibecoded dogshit right now, but it works fine.... so uh... yeah.
>>
>>108393191
>But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.
Actually this is probably because it has to wait for the LLM to finish a full sentence and the TTS engine to process it before it can even start working.
>>
>>108393191
>spawn the Piper binary for every text chunk
But you already know how to run onnx models. Just load the model and keep it in memory. You don't need piper. You just need the models. Again, doesn't matter if you're not gonna use it, but the whole approach seems wrong.
If I were to do it, I'd just load the model on a forked process/thread and send it text over pipes or something.
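The forked-process version is only a few lines in e.g. Python's multiprocessing (sketch; the fake synthesis string is where the warm model call would go):
[code]
from multiprocessing import Process, Pipe

def worker(conn):
    # load the TTS model ONCE here, then serve requests until shutdown
    while True:
        text = conn.recv()
        if text is None:
            break
        conn.send(f"<audio for: {text}>")  # replace with the real synthesis call

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    for chunk in ["Hello there.", "Second chunk."]:
        parent.send(chunk)
        print(parent.recv())
    parent.send(None)  # shut the worker down
    p.join()
[/code]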
>>
>>108393231
You're absolutely right!
Nah but seriously though. This was a long time ago before I knew the right approach to take. Piper was my first TTS implementation, then I switched to Kokoro, and then I started using Pocket TTS. All I'm doing is describing why it didn't work for me initially, not why it "couldn't work".
>>
>>108393231
If you don't care about voice cloning I wouldn't even use Piper anyways. KittenTTS is waaaayyy faster and has decent (for its size) generic cute anime voices.
>>
>>108390876
as a frequent devstral user, I found mistral 4 very, very disappointing. I hope it is a bug because holy kek.
>>
File: file.png (279 KB, 570x943)
279 KB
279 KB PNG
>>108393040
I'm of the opinion that petrus is a paid NovelAI troll, since the threads that are usually trolled are always related to local models: /ldg/, /sdg/, /lmg/, /hdg/. But aicg, dall-e and other cloud threads are never touched.
>>
shitzo alert
>>
>>108391946
Wait, ministral is just a pruned small? Nothing new added? What the fuck do I want with it then when I can just run small?
>>
>>108392305
If only Mistral was Italian, they could just lie about the compute


