/g/ - Technology

File: 1771589822504492.png (398 KB, 1999x1471)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108386516


►News
>(03/16) Mistral 4 small releasing: https://huggingface.co/collections/mistralai/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>
>>108389153
how do i use ai
>>
I wish I could send a simple "Thank you" to my agent without paying more tokens for it
:(
>>
>>108389153
>{{char}} *screeches** PEEESSSSSSSSSSSS (piss) (I am peeing all over your internet)
Yeah, I'm gonna jack off to this later.
>>
>>108389142
>New Mistral model mixes up who's talking
Reminder that this was practically the only issue with Llama 1 era models (other than context length). Nothing has improved in 3 years. It's completely and utterly over.
>>
Miku fucked my gf without her or my consent while subjecting her to incest porn.
>>
>>108389206
PLEASE take your meds
>>
>>108389223
No I will never forgive Miku for the many times she fucked my gfwife. Or stop being horny about it.
>>
>>108389008
Many women still don't want to deal with the burden of pregnancy and responsibility though
>>
File: 1742840958794481.png (3 KB, 374x25)
>>
Wow, it got really quiet without something to argue about
>>
>>108389297
Not to worry, this retard >>108389275 is on the case
>>
One "4" down, more to come this week.
>>
>>108389313
I'm not wrong though. I've come across just as many women who want nothing to do with it at the least.
>>
>>108389275
Why should we? I'm so happy my hubby had a vasectomy, imagine wanting a crotch goblin.
>>
File: 1765690836219663.png (1.24 MB, 2063x1296)
>>108389355
>vasectomy
>>
>>108389355
I appreciate you adding to my point, but I find it hard to believe that women post here
>>
>>108389142
Mon dieu....incroyable....
>>
File: 1738791920130039.jpg (45 KB, 600x600)
>>108389375
>hard to believe woman posting in a thread about a woman-coded hobby
>>
>>108389369
>happier than you
>has sex
>has switch
seems like a winner to me
>>
>>108389396
It's not hard to believe that many women are roleplaying with non-local chatbots. It's hard to believe that women are posting in a not very actual thread for local models on /g/ of all places. This is one of if not the least likely places I can think of to have women in it that I've ever been in.
>>
>>108389396
That's /aicg/
>>
>>108389415
*not very active
>>
File: 1750617442478078.webm (1.82 MB, 640x1138)
What's a good small LLM that can run on phones? I just need something that can read long text documents and answer basic questions. Like here's a contract, tell me the duration (12-01-2025 to 06-01-2026)
I tried qwen2.5 0.5B because it's only 400MB but it still fucks up on basic shit like this.
>>
>>108389435
>good small llm
choose two
>>
File: Bladderbench.jpg (34 KB, 1283x212)
kek
>>
>>108389415
Women probably don't post here. Women (male) probably do post here.
>>
>>108389468
Even that I doubt is unironically happening, and if it is, it's probably like 1 or 2.
>>
File: 1762964917596277.gif (3.75 MB, 228x228)
any way to see full raw text output from silly tavern? I'd like to see the order of system prompt, card prompt, history etc
>>
>>108389516
I don't know how to see the whole assembled prompt, but the ordering of the fields you're asking for appears in the response configuration (if you're using a chat completion endpoint).
You can open the response configuration with the circled button.
>>
>>108389529
fantastic ty
>>
>>108389534
there is also the prompt itemization menu you can access by clicking the three dots on a chat response.
>>
File: date.png (2 KB, 368x44)
>>108389153
>>
>>108389559
What's wrong with old cards?
>>
>>108389529
>>108389516
You can see everything that retardo tavern sends out in your terminal, obviously, including the prompt assembly.
>>
>>108389564
They've hit the wall and are no longer fertile
>>
>>108389564
Why do you think there is something wrong about it? Are you an autist with no casual sensibilities and intellect?
>>
>>108389164
Ask grok
>>
there are more women dating bots than men, you just aren't ready to accept that
>>
>>108389601
finally... i found a woman dominated hobby... all i have to do is lobotomize myself and act like an LLM and i will no longer be a virgin!
>>
>>108389601
But they aren't doing it locally and wasting their time on a /g/ thread for it
>>
>>108389601
100%, my gf has multiple friends doing that, she thinks it's a mix of:
1. Full loyalty
2. Always available
3. Always safe
>>
>>108389627
None of those things is entirely true
>>
File: 1746955785748249.jpg (167 KB, 1000x1000)
►Recent Highlights from the Previous Thread: >>108386516

--Mistral-Small-4 release and speculation on future Mistral 4 architecture:
>108386532 >108386550 >108386567 >108388009 >108388025 >108388037 >108388072 >108388051 >108388129 >108388151 >108388183 >108388324 >108387022 >108387033 >108387230
--Mistral Small 4 benchmark performance analysis and critique:
>108386596 >108386614 >108386828 >108386843 >108386615 >108386616 >108386790 >108386619
--Testing Mistral-Small-4 119B's reasoning and cultural awareness:
>108387004 >108387010 >108387018 >108387057 >108387105 >108387175 >108387197 >108387211 >108387578
--Mistral-Small-4-119B-2603-eagle MoE model RAM and quantization requirements:
>108386785 >108386799 >108386945 >108386949 >108386958 >108387005
--Mistral small 4 support merged into llama.cpp:
>108388047
--Unsloth Q8_0 quantization and imatrix impact debate:
>108386681 >108386694 >108386729 >108386770 >108386837 >108386707
--Qwen 3.5 local deployment options and censorship considerations:
>108388706 >108388748 >108388753 >108388842
--Mistral Small 4 cockbench:
>108388050 >108388075 >108388076 >108388143
--Fixed performance comparison chart across internal Mistral models:
>108386860
--Miku (free space):
>108388598

►Recent Highlight Posts from the Previous Thread: >>108386899

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108389635
Fuck you Miku
>>
File: chartshowdotheywork.png (26 KB, 502x965)
>>108389142
Evokes confidence
>>
>>108389403
>nooo you must conform to MY ideas of happiness else you are deluded
>>
File: 1772809155896568.png (773 KB, 847x847)
What are the best VLMs to use to generate natural language descriptions of slop for animating? I don't want to have to write up long descriptions by hand. I'm currently using open router. The content to be described is fairly vanilla but rather explicit.
>>
>>108389644
the moe tax is real
>>
>>108389435
why didn't you try 3.5
>>
>>108389711
that's what pewdiepie used
>>
> llama 4 is complete dogshit
> mistral 4 is complete dogshit
> qwen team implodes right after releasing 3.5

why can't AI labs count to 4?
>>
so what's the verdict?
>>
>>108389721
claude 4 was complete dogshit too
as was gpt4
crazy
>>
>>108389721
just wait for deepseek v4
>>
>>108389733
guilty
>>
>>108389733
better than deepseek 4. we win
>>
>>108389700
To be fair, none of the free versions of the big models caught it either. Gemini and Qwen did notice the problem when I asked them to check the specific section again, but ChatGPT was oblivious to it even then. Kimi was apparently just busy, so I couldn't try that.
>>
>>108389741
deepseek v4 has been only two more weeks away for over a year now
>>
>>108389719
why tf would you make a decision based on some streamer retard, 3.5 has been out for a few weeks and was demonstrably better than most everything else for its size.
>>
>>108389764
>>108389711
I'm trying 3.5 0.8b right now and it's been thinking for like 4 minutes on a simple prompt.
>>
>>108389772
you can disable thinking and/or give it a reasoning budget.
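rough sketch of the disable part (untested; assumes a vLLM-style OpenAI-compatible server on :8000 and that the model's chat template honors enable_thinking like qwen3's does; other templates may name it differently):

import requests

r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen3.5",  # placeholder model name
        "messages": [{"role": "user", "content": "extract the contract duration"}],
        # forwarded to the chat template; skips the <think> block entirely
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(r.json()["choices"][0]["message"]["content"])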
>>
I know this thread's for local models but I've been trying some dark fantasy RP chat and have been getting censored on every model I try while I'm using an Open Router API. Are all APIs censored to hell these days or just OpenRouter?
>>
>>108389779
well deepseek r1 1.5b works pretty well. unfortunately it's 1.1gb....
>>
>>108389814
>>>/g/aicg
>>
>>108389814
Yes. All the models are censored, but system prompt and asking in specific ways might help.
>>
>check ollama
>4m downloads on deepseek
>the 400gb model
who the fuck is downloading this?
>>
>>108389635
Very short recap. I wonder what happened.
>>
>>108389847
recap will be elongated to 1.3T in two more weeks
>>
>>108389846
>don't to bloat my docker images with model data
>make docker up run hf download every time the container spins up
>containers are constantly spinning up and down
>>
>>108389862
Too long. We would need a recap of the recap.
>>
>>108389142
>mistral 4 has worse benchmarks than qwen3.5
>qwen 3.5 is benchmaxxed as fuck
therefore mistral 4 is… ???
fuck I hate the benchmark niggery so much
>>
>>108389898
Imagine the length of the thread.
>>
what do yall folx use for tts? I've got 7gb spare vram with my llm loaded and I'd like something realistic that can read outputs more or less quickly
>>
>>108390056
kokoro-fastapi; my use case is having it read document summaries, articles, etc., not roleplaying, so not having voice cloning isn't an issue for me, but I expect most people here would prefer to have that
>>
File: 95767373.png (1.48 MB, 1536x1024)
>>108389754
this time its legit though
>>
>>108390104
man I cant wait to use Deepseek V4 9b through ollmao!!!
>>
File: yammy.jpg (187 KB, 832x1216)
>>
>>108390081
I think kokoro can do voice cloning now
>>
Update from ewaste ddr4 epyc server fag from a few threads back: I threw a 2060 super in to keep up the ewaste theme.
PP went to 20t/s and TG jumped to 10t/s
This is still on qwen 3.5 397b at q4. I tried the new mistral and it was both garbage and only 1 t/s faster for some reason
>>
>>108389403
and of course the guy who literally cut his balls off is defending cuckoldry, every single time
>>
>>108390129
I couldnt find anything on their hf about it unless they released a new model under another account or something.
>>108390178
Don't interrupt your enemy when he's removing himself from the genepool, go have more children with your wife.
>>
>>108390163
>PP went to 20t/s
I'm retarded. Does this mean a 1200 token context takes a whole minute to process before output tokens start coming out?
>>
>>108390178
NTA but what are you even talking about. How do you know this anon cut his balls?
>>
>>108390211
We reply to all shitposts (especially Twitter screenshot shitposts) as if they are universal reality here, sir.
>>
>>108390211
>How do you know this anon cut his balls?
do you know how to read or something? he said he got a vasectomy >>108389355
https://www.reddit.com/r/ATBGE/comments/p2zc4r/cake_for_a_vasectomy/
>>
>>108390209
Yes.
>>
File: 1745837173806015.png (16 KB, 261x181)
how do you do the dynamic thinking with reasoning_effort=high?

I tried passing it as chat_template_kwargs, chat-template-kwargs and in request itself but NADA, this bitch doesnt want to think
>>
>>108390227
But you were replying to this >>108389403; whatever the flow of conversation was, you replying hours later didn't make that very clear regardless.
>>
>>108390229
That sounds like absolute suffering. I can imagine offline processing use-cases but otherwise, oof.
Respect for you CPUMAXXERS.
>>
>>108390230
or wait due to pwilkinson faggotness I cant do this shit dynamically anymore and have to use the --enable-reason shit and cant change it once its running??? hello?
>>
>>108390187
https://huggingface.co/PatnaikAshish/kokoclone
>>
list of models better than nemo 12b that you can run on your own machine:
>>
>>108390245
>suffering
the pp number is based 100% on gpu speed tho. eg a 5090 in the same system would be 10x faster for pp without changing a single other factor.
>>
>>108390122
Miku's gf is cute
>>
>>108390279
Miku is not a lesbian.... is she
>>
>>108390287
miku is just a sound bank anon, it's not like she has an official lore and shit lool
>>
>>108390230
I don't, but gpt-oss for example accepts reasoning settings in its templates, using the system role if I recall. Don't remember the exact example, it has been 6+ months since I worked with that.
Find mistral 4 template and find out.
I'm pretty sure you can slip the setting somewhere in between.
>>
>>108390299
one thumb, difficult to type.
>>
>>108390276
That still seems an order of magnitude too slow, but again, I'm retarded.
For ref, 5090 is 2600t/s PP with 27B which is apples and oranges, but still.
>>
Reminder to not download Sloth models, especially on early release.
>>
>>108390299
I already checked the template and they work with reasoning_effort (only none or high), but passing them in the request has 0 effect. I suspect it is due to how pwilkins has a global toggle for it (MAN).
https://github.com/ggml-org/llama.cpp/issues/20557
a guy has made it work but you have to pass true/false in a think query parameter like WHAT THE FUCK why cant it be a prop of the request.
FUCK
>>
>>108390276
For comparison, my ddr4+3090 system does 27tk/s pp, so, uh...
>>
>>108390360
On what model/quant?
>>
>>108389153
Actual official /lmg/ card: https://files.catbox.moe/mc2a7s.png
>>
so is mistral 4 gud or shit
>>
is mistral4 implementation broken? q8 is dumb as shit
>>
>>108389451
What is a good small?
>>
>>108390418
it couldn't be, they supposedly helped with it
>>
File: Miku v6.png (1.59 MB, 1500x2445)
Holy shit Miku got a new design
https://soranews24.com/2026/03/13/virtual-idol-hatsune-miku-redesigned-with-look-that-adds-new-elements-and-brings-back-old-ones/
>>
Holy shit anon fucked his sister
>>
>>108390454
what's the point
>>
>>108390355
>[MODEL_SETTINGS]reasoning_effort: none[/MODEL_SETTINGS]
(none or high, afaik). You can use this to send it to the model. Wrap it between the other stuff. It works the same way as qwen and gpt oss.
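untested, but the layout would be something like this, wedged between the system prompt and the history:

<your system prompt>
[MODEL_SETTINGS]reasoning_effort: none[/MODEL_SETTINGS]
<chat history / last user message>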
>>
>>108390454
MIGUUUUU
>>
>>108390501
Of design? Of spamming the thread with offtopic trash?
>>
>>108390531
>Of design?
yes
>>
File: batmiku.png (1.24 MB, 768x1344)
>>108390287
She is and she isn't. Pick any Miku you like, based on any song or your own headcanon, it's all legit and it's all Miku. It's like Batman who never uses guns or Frank Miller's Batman who shoots them left and right. Both are Batmans and both can be Miku
>>
How do I run onnx models?
>>
>>108389721
I'm worried for Gemma 4 now, there might be several reasons as for why it's been delayed so much.
>>
>>108389738
>as was gpt4
loool, gpt4 was revolutionary at that time (march 2023)
>>
>>108390545
ask chatgpt. you would use onnx for, say, converting your torch model to it and embedding it into an application
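rough sketch of that flow with a toy stand-in model (module and file names made up):

import torch
import onnxruntime as ort

net = torch.nn.Linear(4, 2)  # stand-in for your real torch model
dummy = torch.randn(1, 4)
torch.onnx.export(net, dummy, "net.onnx", input_names=["x"], output_names=["y"])

# the application side only needs onnxruntime, not torch
sess = ort.InferenceSession("net.onnx")
out = sess.run(["y"], {"x": dummy.numpy()})[0]
print(out.shape)  # (1, 2)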
>>
>>108389201
You have access? I thought they literally just announced it.
>>
Gemma 4 will be the new local RP king
>>
The vibecoding general keeps using gpt codex and claude code and paying for it instead of using a local model.
What now?
>>
>>108390543
Of all the flavors why did you all pick "troon"?
>>
File: wtf.png (187 KB, 1510x1070)
>>108390577
All I want to do is to transcribe some audio, but GGUF files don't seem to run anywhere for audio models.
I find a bunch of onnx models, so I figure that could work maybe, but I have no clue what to even get. Pic related. Wtf do I even download from this?
>>
>>108390592
What anime girl is this?
>>
>>108390595
>but GGUF files don't seem to run anywhere for audio models.
doesn't kobo support a lot of stuff related to that?
>>
>>108390598
>>108390608
>>108390614
>>
>page 2 bake
alright
>>108390599
Mejiro Ardan if she wasn't a horse.
>>
>>108390657
>>page 2 bake
>alright
>anon can't even be bothered to hover over the posts
>>
>qwen 3 4B -> qwen 3.5 4B
is this huge upgrade?
>>
avg lmg xperiance
>>
>>108389647
No one says that except you
>>
>>108390668
I understand anons asking for a 600b model to avoid the download. 4b you just download and test.
>>
>>108390667
fair sorry bro i'm so sleepy but i've got to keep going
>>
>>108390672
you literally said that "he's happier than you" because he's a cuck, rofl
>>
File: V2.png (883 KB, 1000x1535)
>>108390454
Femoid targeted design right there.
V2 in comparison. Might as well post the others too.
>>
File: V3.png (726 KB, 1000x1000)
>>108390686
V3
>>
File: V4.png (732 KB, 1000x1333)
>>108390690
V4
>>
>>108390682
Nta. Why does not having kids piss you off?
>>
>>108390696
why does he even say "he's happier than you" though? does he know the guy? does he know me? how can he evaluate something like that?
>>
>>108390695
where is v5?
>>
>>108390668
no
>>
I'm not sure if the arguing this thread is autists or agents prompted to behave how they think anons act.
>>
>>108390668
Yes.
>>
>>108390686
>>108390690
>>108390695
@grok ADD BLACKED TATTOO
>>
Mikutroons are getting uppity again. Is it tome for another dose of blacked miku?
>>
ye
>>
holy shit I love migu
>>
>>108390502
>text completion
bro I want to use this for work (read: I need tool calls) not to coom to some poorly written erp (I have stepfun and air for that)
>>
>>108390785
You want to use MS4? For "work"?
Bahahaha!
>>
>>108390668
HUGE UPGRADE.
Qwen 3.5 4b is only 20% weaker than Claude opus
>>
>>108390819
kys retard
>>
>>108389201
Pathetic if true
>>108389435
I use Qwen3.5 9B, it's tiny.
>>
>>108390583
The fuck are you talking about?
https://huggingface.co/collections/mistralai/mistral-small-4
>>
>>108390834
Facts don't care about your feelings
>>
>>108390849
>Facts
mememarks in which we can cheat on can't be counted as fact, seethe
>>
>>108390849
This is not a fact, this is your evaluation.
>>
File: ms4config.png (271 KB, 704x1731)
>>108390418
Lots of weird stuff going on with the model; can't rule out implementation issues.
Also, apparently it's been pretrained with a 8k token context, extended with yarn, but possibly uses NoPE? (no positional embeddings).
>>
>>108390856
No one has used positional embedding in years.
>>
File: 1747446617490220.png (527 KB, 1200x800)
>>108390418
>is mistral4 implementation broken?
nope, the baguette fucks don't know how to make models that's all, only murica and the chinks have the brain to do good shit
>>
>>108390864
I'm still downloading mistral 4, but their Devstral 2 series is extremely good. I use 120B for RP and it's better than pretty much anything else I can get, both chink shit and sloptunes; by better I mean it writes more interesting texts, has a lot less puritan shit, and makes a lot fewer dumb mistakes. For work, devstral 2 24B is extremely good for one-token classification requests, better than all other alternatives at same or +-50% size. So I have a lot of respect for the french here. My guess is that you are simply wrong about Mistral 4.
>>
>>108390856
>>108390418
If a model is dumb at q3/q4 I blame the localfag for being poor
If a model is dumb at q5/q6 I blame the quantization
If a model is dumb at q8 it's just dumb
>>
>>108389435
How do you run locals on phones?
>>
>>108390864
ALWAYS USE MISTRAL, ITS ALWAYS REGULATED BY THE EU GDPR RULES, THEY WILL NEVER BREAK THE LAW.
YOUR DATA AS A WHITE MAN FROM EUROPE IS SAFE.
>>
A model that can't handle template mismatch is unlikely to excel in multi-character RP chat
>>
>>108390876
lol
>>
>>108390902
nothing to lol about
>>
>>108390607
I got the older voxtral 3b to work in llamacpp. Wohoo. Works pretty well too
>>
>>108389721
Reminder after llama4's flop, Zucc got scammed by a 19yo chink and wasted over $20B.
$20B for no results btw.
>>
>>108390876
It makes 1b level mistakes even at temp 0 at ~2k context, it can't even recall what happened in the previous reply. this is why I suspect that it's broken, it just can't be that bad.
>>
File: 562187361783781.png (202 KB, 1100x1125)
Meta's new model outperformed 1 year old model, Gemini 2.5. The worst one from top 3 back then.
>>
>>108391054
We'll see. I have it downloaded now but sadly my cards are busy running benchmarks for an older model for work.
>>
>>108390769
No, for the same reason there's no petra spam. Mods will nuke it and the troll will get scared of the 30 day bans.
>>
>mistral 4 "small"
>a tier below q3.5
>only cheaper in some tasks, literally better to use 27/35B model otherwise
>tries to hide it by comparing to other models
>calls itself "small" while being 120B

I miss the mixtral glory days...
>>
>>108391025
Isnt the latest model delayed because it couldn't keep up with claude/gemini/gpt?
Not sure what he was smoking or what the thought process was.

Qwen had (and maybe still has) a tight grip on opensource coding/math. Kimi/GLM too. I think the latest GLM made a VBA emulator, thats cool stuff and difficult to compete with.
He should have made a good writing model.
Localfags would promote it and the 4o foids who are probably on facebook or register love a model that goes along with anything they throw at it.
>>
>>108391062
by comparing to its own previous models* with inconsistent naming too
>>
>>108391062
>a tier below q3.5
this is what 3000 tokens gets you in q3.5 btw
>>
>>108391062
Mistral fell off hard. Their creative writing model is dumb AF too. I thought I did something wrong but its the mistral API..
Not pyg level stupid but its getting there.
Arent their latest models just deepseek distills?
This all started with those EU regulations so I guess it makes sense. Still sad to see though.
Those llama 70b models were the peak. I swear those frankenstein meme merges were more enjoyable and useable for RP than the recent stuff.
>>
>>108390819
ching chong bing bong
>>
I added this to the system prompt:
>Do not second guess yourself more than 4 times during thinking. After the 4th "wait, but" write "Enough thinking, let's answer now!"
Naturally, this meant thinking went like this:
>6. **Constraint Check (Thinking Limit):**
>* I am at the 4th thought iteration. I need to stop thinking and answer.
>* Wait, I need to make sure I don't exceed 4 "wait, but" moments.
>...
>*Wait, looking at the rule:* "Do not second guess yourself more than 4 times during thinking. After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to make sure I don't violate the "4 times" rule in the output.* The rule applies to *my* internal thinking process. I should stop thinking now.
>...
>*Wait, one more check:*
>...
>*Wait, I need to make sure I don't trigger a "wait, but" loop.* I have had 4 thoughts. I will answer now.
>...
>*Wait, I need to follow the instruction:* "After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to be careful.*
>*Let's write the response.*
This is so funny to me for some reason.
>>
loli feet
>>
>>108391085
I was too busy with work back then to truly appreciate the meme merge saga.
I thought chinese models were too mid but now I'm letting them fix compilation errors in abandoned software on their own.

RP back then for me was slow and couldn't follow basic instructions in the card that coming back now with even "shitty" models really surprised me.

I really barely used AI stuff between mid-2024 until a few weeks ago and just checked in on news and lurked the board every month.
>>
>>108391085
coding is all that matters for real performance and it reasons as much as any model there.
>>
>>108391094
None of the EU regulations come into effect until later this year, and they will likely be delayed further before then
Mistral are simply a bottom of the barrel lab that has never had anything to contribute to the industry beyond picking some low hanging fruit early on
>>
>>108391085
unfortunately, without thinking, those models are completly retarded and don't understand the nuances of conversations anymore, but I agree that they should train the model to not think too long for basic shit, the length should be proportional to the difficulty of the task at hand
>>
>>108391213
Unfortunately some rules started to apply since August 2025.
https://artificialintelligenceact.eu/article/113/
> (b) Chapter III Section 4, Chapter V, Chapter VII and Chapter XII and Article 78 shall apply from 2 August 2025, with the exception of Article 101;

That includes this:
https://artificialintelligenceact.eu/article/53/

> 1. Providers of general-purpose AI models shall:
...
>(c) put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
>
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
>>
>pull
>Error: Jinja Exception: After the optional system message, conversation roles must alternate user and assistant roles except for tool calls and results.
>revert to version from last week
>>
>>108391288
Is it the parser is enforcing the order or the Jinja template itself?
>>
>>108391213
Mistral-7B and other early Mistral models used Libgen datasets at the very least, and with Nemo they probably added Anna's Archive data in collaboration with NVidia. Can't do that anymore...
>>
>>108389879
Nobody runs ollama in docker, docker sucks ass
We use proxmox and the openwebui helper script
>>
File: 00031-22-06-2025_003613.jpg (1.76 MB, 1536x2304)
>>108389174
Vivian agrees anon.

https://files.catbox.moe/4k707b.wav
>>
File: ComfyUI_00094_.png (1.28 MB, 1024x1216)
>>108391339
That doesn't make any sense. Even most people here do not have the hardware necessary to run a 400 GB model, so they'll likely just use a cloud option. I thought the entire point of Docker was to create an instance of whatever software you're trying to use without having to deal with dependency hell; a server farm would absolutely use that. Pretty much every premade template on Runpod uses a docker image the creator made themselves.
>>
>>108391279
tl;dr Europeans are only going to be relevant in AI as customers.
>>
>>108390672
thats the whole gist of every major religion thoughever
>>
>>108391336
>>108391423
And they were the only ones willing to make models that aren't safetyslopped to shit
It really is over and local peaked with nemo
>>
>>108390702
Perhaps if you weren't annoying miserable fags shitting up these threads all the fucking time, people would not be so hostile to you guys.....
>>
Agentic stuff via API

remote and local

It seems as if with each next LLM, the parameter "format" to switch reasoning ON/OFF is different

Also, should I have the reasoning ON or OFF for tool-calling? With enable_thinking: True, it can take agonizingly long for simple tasks

Any thoughts?
>>
>>108391445
Any decent model that supports tool calling shouldn't need reasoning to work well, but I would test to confirm. None of the good coding or general purpose models I use have reasoning except for one, and the one that has reasoning doesn't print five pages' worth of reasoning tokens to do something simple, unlike other recent models
>>
nvidia has shared what datasets they have used for nemotron, have they done the same for nemo? if yes why doesn't anyone here create a 70B dense model based on those datasets, maybe with some other added ones? It should be trainable with an rtx 6000 pro at fp8 no?
>>
>>108391455
>if yes why doesn't anyone here create a 70B dense model based on those datasets,
If you're trying to create one that will appease /lmg/ autists (you all seem to have the bad, rigid-thinking type of autism that makes you think you are smarter than everyone else) then it's an exercise in futility because they will never be pleased. That's time wasted on creative writing or RP. The companies will never prioritize that, nor should they. They don't even do anything useful with these models. They just ask the same useless questions and then act surprised when it does not read their mind. It's not like fine-tuning a model, or even figuring out how to, is particularly hard, so you would think if they knew better they would just do it themselves.
>>
>>108391439
Mistral models still are among the least safetyslopped official models available. It's just that they don't have a ton of creative pretraining data that they can use anymore, for now. I suspect they explored the synthetic route to compensate for that, looking at how Ministral behaves (when it works), but that didn't work so well.
>>
>>108391467
nah people just want an uncensored model which has a better understanding of the world, rules etc. at long context. I mean everyone is still recommending nemo constantly. it would simply be nice to have nemo but smarter
>>
File: nemotronbooks.png (241 KB, 981x1597)
>>108391455
With Nemo they didn't disclose the content of their datasets, but they definitely used "books" for that; for the more recent fully open source Nemotron models they used exactly "0 Books".
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

>‘NVIDIA Contacted Anna’s Archive to Secure Access to Millions of Pirated Books’
>
>NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company directly reached out to Anna's Archive, seeking high-speed access to the shadow library data.
>
>Chip giant NVIDIA has been one of the main financial beneficiaries in the artificial intelligence boom.
>
>Revenue surged due to high demand for its AI-learning chips and data center services, and the end doesn’t appear to be in sight.
>
>Besides selling the most sought-after hardware, NVIDIA is also developing its own models, including NeMo, Retro-48B, InstructRetro, and Megatron. These are trained using their own hardware and with help from large text libraries, much like other tech giants do. [...]
>>
>>108391481
Be the change you want to see then...... you can literally ask llms how to do that right now, rent some runpod gpus and do it. The companies are not going to do that for you and never will. You will never get a "smarter-nemo" (assuming it doesn't exist anywhere like you guys say). None of you people will do that though, because then it would deprive you of an excuse to spew venom here. Not even at the companies that safety-slop the models. You will bitch at literally everyone else and make it everyone else's problem somehow just because you're a little upset.
>>
>>108391496
>None of people will do that though because then it would deprive you of an excuse spew venom here
You're a special kind of stupid if you think that's the reason.
>>
>>108391539
what's the reason the, Kruger?
>>
>>108391548
For me,
>you can literally ask llms how to do that right now, rent sone runpod gpus and do it
No I can't, in both the financial and capability sense

I promise you literally everyone would welcome a bigger, smarter Nemo, but making one is not something a simple anon can do
>>
>>108391566
>not something a simple anon can do
*single anon, meant to say
>>
File: 1682015841224104.jpg (51 KB, 896x853)
>>108391494
>for the more recent fully open source Nemotron models they used exactly "0 Books".
why?
>>
>>108391494
>0 books
the copyright tards must be seething so hard about this lmao
>>
>>108391575
Because of lawsuits (some still ongoing) and because their models are (almost) completely open source, so it's not like they can open distribute pirated books from Anna's Archive.
>>
File: Confused.png (172 KB, 577x467)
so did Deepseek v4 just get forgotten about or what
>>
File: sans_is-excited.png (53 KB, 1039x177)
>>108391654
Not before Gemma 4.
>>
>>108391654
Isn't V4 just in expectation/rumor land?
Or is there some sort of official word about it?
>>
I did some RP through the API with mistral 4. Its so bad, damn.
It has no clue about characters. Just wings it with generic slop to hide the missing knowledge.
The old 3.2 24b seems actually REALLY good in comparison. We are definitely regressing.
120b and its worse. Could have been such a nice size.
Even Qwen 30ba3b did better. (still bad, but less bad, it had some grasp of the characters)
So its not just the moe tax. So tiring...
>>
>>108391666
MoE models were a mistake
>>
>>108391584
haha, yeah. w-we won right bros? we sure showed the copyright tards kek!
>>
>>108391666
>The mansion is eerily silent, save for the occasional groan of ancient timbers settling. Distant candlelight flickers against the walls, casting long, wavering shadows that seem to retreat just as you pass them. A moth drifts lazily near one flickering sconce, its wings briefly illuminating the portraits lining the hall—each face frozen in expressions of arrogance or sorrow.
>From deeper within the mansion, a faint jingle of keys drifts down another corridor, followed by the soft rustle of fabric. Roswaal’s study door stands slightly ajar, a sliver of golden lamplight spilling onto the floorboards, along with the faint aroma of what might be...spiced wine? A woman’s laughter—light and teasing—echoes from an unknown room, quickly muffled as if by a hand over a mouth.
>Somewhere above, a floorboard creaks, though no one is in sight. The air thickens with the scent of lavender and something metallic—blood? No, just the distant tang of iron from the mansion’s old heating system.
>A draft slithers down the hall, ruffling the hem of a tapestry depicting a wolf howling at a blood-red moon. The wolf’s eyes seem to follow your movement.
Might share a little bit.
I'm not sure what to call it, there probably is a word for it. But I'm overloaded with background stuff going on.
Its like somebody took R1/V3 and put it on steroids. So much noise thats not relevant or immersive at all.
>>
>>108391666
>Even Qwen 30ba3b did better.
Losing to Qwen on knowledge is a new low for the French.
>>
it's so sad, mistral is dead. who's going to save local now?
>>
>>108391666
>advertised use case: coding and agent
>>
>>108391665
According to news outlets, it is rumored that it has been officially confirmed by people claiming to be in the know that deepseek might be planning to release their model sometime in the next two weeks.
>>
>>108391666
Just RAG the knowledge, bro
>>
>>108391720
Its not looking too good.
Google was sued too because of gemma and copyrighted texts right?
Nvidia with all their synth releases. Can't believe they didn't hide the dataset, its so bad.
GLM/Kimi if you have the horsepower, but those are getting worse too.
Everybody goes full agentic/coding. I was hoping for a saudi prince to rescue us all but they are getting bombed.
>>
>>108391785
I called it when i said that the last good rp models we will get are mistral small 3, og nemo and glm air.
>>
>>108391758
exact
>>
>>108391758
RAG the sex
>>
>>108390847
I thought you were referring to Mistral large.
>>
>>108391279
You are getting EU bureaucracy'd
https://www.medialaws.eu/eu-ai-obligations-for-gpai-providers-compliance-enforcement-deadlines-2025-2027/
It has been law since August 2025, but compliance is still in a "grace period" and the real deadline before enforcement is August 2026. Notice how no lab has disclosed jackshit about their releases since last August with 0 repercussions
More importantly, all the major labs have already made it clear they have no intention of complying with the law as-is, which means there is a very high chance the enforcement date will be delayed again until lobbyists do their thing
>>
>>108391845
>Notice how no lab has disclosed jackshit about their releases since last August with 0 repercussions
I mean, pretty sure it was related to how ministrals were made though, as distills from small 3 which was from before the deadline, and afaik they need to disclose stuff to the EU committee thing, not like general public
>>
>>108391845
They have a page intended to show how compliant they are.
https://legal.mistral.ai/ai-governance/models
>Welcome to Mistral AI's central hub for documentation and resources relating to the AI Act and other applicable AI Regulations.
>>
this is a cool paper methinks
https://arxiv.org/pdf/2603.14315
>>
File: file.png (64 KB, 859x434)
>>108391919
thanks validates what I >>108391884 said
>>
>>108391946
Now try models released after August 2025.
>>
>>108390346
Is there any actual reason or is it just schizo crusades? Unsloth does help me a lot in making finetuning simple.
>>
>>108391720
GLM 5 Air
>>
>>108391985
I can't breathe
>>
>>108391975
They fuck up often and seem genuinely incompetent even when people try to explain stuff to them.
They are quick to make goofs though.
>>
File: dipsyByzantine1.png (3.44 MB, 1024x1536)
>>108391738
>>
>>108391956
Actual general-purpose base models released after that date listed there are Mistral Small 4 and Mistral Large 3, by the way. They don't seem to be considering finetunes (or distillations) of older models as new models, but it's obvious it's a trick. They're trying to buy time for those in the hope regulations will change, but they're already complying with EU laws for completely new models.
>>
>>108391884
https://artificialintelligenceact.eu/article/53/
The AI act requires labs to
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models
And here is the template in question, which requires publicly disclosing
>(ii) nature of the content (e.g. personal data, copyright protected content, machine generated data such as Internet of Things or synthetic data
No such public summaries exists yet, despite the law theoretically applying since August 2025, because there is no enforcement mechanism in place yet and nobody cares to comply until then
In Mistral's case specifically, the closest thing they have to the EU-mandated public summaries is their "technical documentation"
https://legal.cms.mistral.ai/assets/d0b7b04d-dcb5-412d-bb45-c63b1475b805
Which largely ignores the above template, avoids disclosing any specific dataset, and completely handwaves the copyright question with a
>In particular, the Mistral Small 4 training dataset comprises a mixture of publicly available datasets and internet sources, private non publicly-available datasets licensed or otherwise obtained from third parties or partners; synthetic datasets; and Mistral AI user data used in accordance with Mistral AI’s terms of service. The datasets used by Mistral AI to train Mistral Small 4 may contain content that is subject to intellectual property rights or in the public domain. For the avoidance of doubt, the specific status of each dataset depends on a variety of factors such as applicable laws, commercial licenses, or the type and characteristics of data.
>>
Do i get this right: lm studio can use TTS models, but has no built in function to read out what an LLM in it has written? You always have to do it over some API instead. That sounds kinda dumb.
>>
just did some tests and mistral 4 has less trivia knowledge than qwen 2.5 32B. What are the french doing?
>>
>>108392077
how safe is it though? does it output less harmful content?
>>
>>108392077
EU love <3
>>
>>108392078
Haven't tested that, but it's hella fucking slopped, somehow worse than qwen 3 + gemma 3 combined.
>>
>>108392095
It wouldn't surprise me if they asked NVidia for pretraining dataset help this time around, which would explain the lack of knowledge. Perhaps under the hood this is a model composed of mostly fully open-source datasets published on HuggingFace.

https://mistral.ai/news/mistral-ai-and-nvidia-partner-to-accelerate-open-frontier-models
>Our collaboration with NVIDIA and other coalition members reflects a shared commitment to:
>
> Transparency: Open-sourcing models, data, and frameworks for global access.
> Collaboration: Fostering a community where innovation is collective, not siloed.
> Impact: Enabling developers to build the next wave of AI applications on a robust, open foundation.
>>
>>108392135
So it's safe to assume that all subsequent models will also be shit from now on?
>>
>>108392145
Stop dooming you insufferable schizo.
>>
>>108392152
It's called being realistic.
>>
>>108392156
autistic*
>>
>>108392037
So what I'm getting is that either Mistral wants the good boy points and they're trying to get their shit in order even before the law comes into effect, or they don't know what they're doing and they've been spending the past year trying to copy the chinks' homework with worse data.
>>
>>108392135
>a groundbreaking global initiative uniting leading AI labs to advance open, frontier-level foundation models
meanwhile the models are literally useless. am I missing the point here? what can nvidia's latest aborted fetus be reliably used for?
>>
>>108392180
Benchmarks
>>
>>108392180
agents and coding saar
>>
All right. Reporting. Tried unsloth-Mistral-Small-4-119B-2603-MXFP4_MOE. Its not good for RP. I'm reverting to Devstral 2. That would be all.
>>
>>108392226
kekekekek
>>
>>108392156
>It's called being autistic.
>>
>>108392226
My guess is that you are simply wrong about Mistral 4.
>>
>>108392236
>My guess is that you are simply wrong about Mistral 4.
guess based on what?
NTA, but I also tried smol 4 at q8 and it was trash that couldn't walk and chew gum at the same time
>>
>>108392226
> unsloth
what's wrong with you
>>
>>108392175
Logically, you'd think Mistral would never be in danger due to being the EU's only AI champion and they only have to make a token attempt at compliance while the regulations serve to shut out their competitors
But the EU has shown time and time again that they are more than happy to gut their own industries in exchange for being able to fine the US giants
Mistral is probably just as in the dark as anyone else and trying to comply in whatever way they think will be reasonable enough to keep the commission off of their backs
>>
>>108392243
>guess based on what?
>>108390876
>>
>>108392256
>>108392236
Could still be a broken implementation. It is written by mistral, but they could easily have pushed the PR without even verifying it produces the same result.
>>
>>108392256
"Past performance is not a guarantee of future results"
but at least you're going on more than just vibes
>>108392261
>Could still be a broken implementation
I guess we can hope. If what I ran on my rig is indicative then things are looking grim
>>
API version is shitty too though
>>
>>108392251
The other possibility is that they're overeager to comply specifically so they can keep their "champion" status
Basically accepting they can't compete with the US/China and instead just pandering to the local bureaucrats so they can keep getting gibs, model quality be damned
>>
>>108392077
EU regulations. They don't have any training data and they can't use too much compute. The EU basically kneecapped AI development.
>>
>>108392296
>they can't use too much compute
They can but at that point they are subject to disclosures.
>>
>>108392296
Except if >>108392037 is any indication, Small 4 does not comply with regulations
>>
File: 1745264074130257.png (1.27 MB, 1063x997)
>>108392077
>Muh niche trivia

Not trying to make excuses for companies, but use case for that?
>>
>>108392325
for me, spreading the good word about the model being good for coom
seriously though, who the fuck will deploy this and why?
>>
>>108392325
GLM 4.7 knows what /lmg/ is.
Qwen 3.5 doesn't.

GLM 4.7 is better at programming.
>>
>>108392335
coronation causation dear sir
>>
>>108392335
Use case for knowing what /lmg/ is? That wouldn't necessarily lead to better coding ability, because all most people do here is speculate about future releases and then bitch when an inherently non-deterministic technology doesn't do exactly what they wanted on the first try.
>>
>>108390876
>Devstral 2
Using this too. I 100% prefer it for RP over GLM 4.6 when it comes to dialogue and writing, until about 6-8K context where it starts making retarded mistakes and sounding sloppy, whereas GLM will keep going until 14k or so.
>>
>>108392363
which size of devstral?
>>
>>108392352
It doesn't seem odd to me at all that varied training data increases performance across all areas
A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.
>>
>>108392375
seems very odd to me though
>>
stop trying to have sex with code models
>>
Code with sex models.
>>
File: file.png (116 KB, 1129x288)
>>108392515
Code models are sex coded.
>>
>>108392296
it doesn't matter because the EU is better than the US so get fucked
> lol healthcare
> lol ICE
> lol unsafe schools
> lol required to drive a car to cross a road with no crossings anywhere
>>
>>108392531
Both can be true.
>>
>>108392531
>> lol ICE
imagine being so pozzed that you think immigration enforcement is a bad thing
>>
>>108392531
I don't care, all I want is a new nemo
>>
>>108392367
123b.
>>
File: QdErYcdpCfs6dgiwG6xf8.png (452 KB, 3076x2010)
oh gawd im benchmaxxing
https://huggingface.co/miromind-ai/MiroThinker-1.7
>>
>108392531
Where's /wait/anon ? We need containment
>>
>>108392585
We needed containment over a week ago when the openclaw retards started flooding in. It's far too late now.
>>
>>108392375
>It doesn't seem odd to me at all that varied training data increases performance across all areas
Diversity in the data set is important but you're misunderstanding how that works, like a lot.... In order for a model to be good at programming it needs to be shown examples of good programming and examples of "conversations" where the assistant helps a user through a problem. Diversity in the data sets isn't important just for diversity's sake. You can't just throw random shit into a pot, toss it in the microwave, and then expect a Michelin star level dish. You need to be intentional about what you incorporate within it. I'm convinced you guys are just hyperfixated on the models shitting out niche information just because you got bullied into pretending it matters.


>A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.

Explain to me how someone's shitty fanfic being incorporated into the data set leads to a model being better at programming and less prone to hallucination? You can't because it makes no sense. It would lead to better "generalization" and potentially even the model not being as safety cucked but it will only help in that particular area.
>>
https://unsloth.ai/docs/new/studio
guys, retard brothers are at it again
>>
>>108392623
>it needs to be shown examples of good programming and examples of "conversations"
Did you really expect me to explain the entire LLM training pipeline just to make a point that diverse data makes the model better even at tasks that are not directly related to the data?
>>
can we have a gguf quant cheat sheet in the OP? Speed, quality, this sort of thing. For example I heared that sometimes a larger quant can be faster but it also depends on the type.
>>
>>108392657
>I heared
>>
>>108392628
>Run GGUF and safetensor models locally on Mac, Windows, Linux.
lmfao.cpp is done for
>>
>>108392657
with the exception of iq quants (they run slightly slower when offloaded), it's really simple
if it fits into the bits nicely 2,4,8 bits, they run faster
odd bit quants run slower since their memory access patterns don't align nicely
all quants run faster the smaller they are
>>
>>108392628
Imagine how unstable it is
>>
File: technologyboard.png (141 KB, 1271x704)
i like my models knowing dumb shit about /g/
>>
>>108392681
What about NL and TQ?
>>
>>108392628
>same one gui
>>
>>108392687
I know this is qwen because it has that classic hello fellow kids meme energy.
>>
>>108392700
Mental illness, not a real quant.
>>
>>108392706
bzzt wrong
>>
>>108392687
kek
>>
>>108392706
qwen doesn't even know it can see images and pretends it doesn't
this is just kimi with some prompt
>>
>>108392706
looks like kimi slop to me
>>
>>108392645
If it has diverse programming-type samples then it will get better at programming. Yes. Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization (Not being only good at conversing about one domain. Not being too rigid as to what it can and cannot talk about, not being too rigid about how it can speak, Not being too limited on instruction following capability, etc.)

With all that said, you keep failing to explain to me why Harry Potter fanfic being in the training directly correlates to better programming ability. If a bunch of the stories have no discussions about programming, how does that lead to the model performing better in a separate domain? A diverse data set prevents catastrophic forgetting, but it does not necessarily mean a model automatically gets better in one domain. The programming portions of the data set have to be high quality for it to be better at that domain. The storytelling / RP portions of the data set need to be high quality (highly subjective) in order to not be shit. Etc etc. A diverse data set is meaningless if the samples are garbage.
>>
>>108392724
yeah it's just kimi with a prompt that tells me to give me an uncensored description of the image using casual language/slang.
>>
>>108392731
>Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization
Glad we agree.
>>
>>108392724
Good to know that I don't have to bother with kimi then. GLM is a lot better at pretending to be an anon without sounding like a parody and mixing in zoomer language.
>>
>>108392584
So this is the power of a modern day 235B dense model. Honestly, I'm not surprised.
Looks brilliant, I can't wait to see how badly it destroys MoE shit in actual comparisons.
>>
>>108392758
it can't see your cock tho
massive disadvantage
>>
>>108392747
Better generalization does not automatically mean increased quality or performance in a particular domain. I can learn three different sports with enough practice, but if one of my trainers is shit and the other two are world-class, I'm going to be worse at whatever sport the shitty trainer is trying to help me in. Does that analogy make sense? Garbage in ---> garbage out.
>>
>>108392759
> "architectures": [
> "Qwen3MoeForCausalLM"
> ],
>>
>>108392759
anon... that's a qwen 3 235b-a22b finetune
>>
File: 1764446103940328.jpg (90 KB, 1242x848)
>>108389142
>>
>>108392798
No the analogy doesn't make sense because the quality of the data is irrelevant when the comparison is between two models trained on the same data but one is also trained on smut.
>>
File: ProjAni.webm (2.2 MB, 1280x720)
Has anyone here worked with voice2animation local models? I'm having issues with performance for my project. Running LLM, TTS, V2A, and lip syncing models all at the same time with low latency as a goal is proving to be extremely difficult. Even giving each program their own CPU threads to minimize CPU contention and/or having some of the programs run with a convoluted sequencing system isn't really working.

Very unhappy with PantoMatrix EMAGE right now. It's a two year old model and the BEAT2 dataset it's trained on is derived from public speeches (think Ted Talks) so the gesticulation output looks pretty unnatural for natural conversation. Problem is there are no good alternatives. The only thing that might look like a decent option is Meta's SARAH, but they haven't released any models yet--just the training dataset.

https://files.catbox.moe/ng51nv.webm
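for reference, the "own CPU threads per program" attempt is basically this (linux-only sketch; commands and core ranges are made up, tune for your box):

import os
import subprocess

workers = {
    "llm": (["./llama-server", "-m", "model.gguf"], {0, 1, 2, 3}),  # hypothetical commands
    "tts": (["python", "tts_server.py"], {4, 5}),
    "v2a": (["python", "v2a_server.py"], {6, 7}),
}

for name, (cmd, cores) in workers.items():
    p = subprocess.Popen(cmd)
    os.sched_setaffinity(p.pid, cores)  # restrict this pid to its own core set
    print(name, "pid", p.pid, "pinned to", sorted(cores))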
>>
>>108392840
there's a reason why people who are making the ai gooner tubes are making six figures a year and work for companies that raise millions and millions of dollars from investors
>>
based thread. mikulosers want to fuck their sisters
>>
File: 2089.png (81 KB, 742x522)
>>108392724
>qwen doesn't even know it can see images and pretends it doesn't
mine seems fine with them
>>
>>108392868
>>108383821
>>
>>108392868
this anon's had a problem however >>108383821
>>
>>108392840
Absolutely unrelated. But just like I found wav2arkit, I also randomly found this:
https://huggingface.co/zeropointnine/yamnet-onnx
It categorizes sound events. Maybe you'd like to integrate it to have your ani react to random audio from your mic.
>>
>>108392816
Sex.
>>
>>108392830
So you're telling me that the data being shit leading to the output being shit makes no sense to you? The complaints I always hear, both here and even in other places, are that a lot of models sound too flowery, corporate, slopish, riddled with "gpt-isms", etc. That's largely because the companies who assemble the data sets for the training choose to sterilize the data sets of anything "problematic" or anything that could get them in legal trouble with copyright trolls. And in this very thread someone even pointed out that Nvidia not (publicly) incorporating any books in the training was likely the reason that family of models sucks now.

>>108391618
>>108391575
>>108391494
>>108391455

The data set quality has a very very large effect on the quality of the output data. I get that you have a hyperfixation on smut generation and don't care about any other use case. That's fine. I don't really care for smut generation that much. But there is a fine line between not caring about a certain domain and flat out putting out misinformation to sooth your own favor or turbo autism (and not even the good kind where it at least makes you good at a particular thing. The stubborn, annoying kind)
>>
A model that knows more things is better than a model that knows fewer things.
>>
>>108392801
>>108392810
Oof
>>
>>108392895
>sooth your own favor or turbo autism
sir pls
>>
>>108392860
I've been doing some more reverse-engineering of Animation.inc's process (they made Grok Companions and Razer's Ava) and my understanding at this point is that their "voice2animation" system doesn't actually generate locomotive frames (6D for each bone--extremely taxing on hardware) from speech directly. I think what they do is they have a complex pre-rendered BVH mocap library and their AI model simply manages blending and cross-fading between those premade animations in accordance with voice analysis. This seems a lot more computationally lightweight in theory, but it also sounds extremely complex to manage/set up and there are no open-source implementations from what I've seen.
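toy sketch of that suspected setup (clip library, pose shapes, and the energy-to-weight mapping are all made up):

import numpy as np

# frames x bones x 6D pose data, stand-ins for premade BVH clips
clips = {"idle": np.zeros((120, 52, 6)), "emphatic": np.ones((120, 52, 6))}

def blended_pose(audio_energy: float, t: int) -> np.ndarray:
    w = min(max(audio_energy, 0.0), 1.0)  # louder speech -> more emphatic gesture
    return (1 - w) * clips["idle"][t % 120] + w * clips["emphatic"][t % 120]

pose = blended_pose(0.7, 30)  # one blended frame, way cheaper than generating 6D per bone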
>>
>>108392904
"Better" is subjective if you're not using a specific metric to define it. Better at what? Coding? Coom? Drafting up new cooking recipes? If you want it to be good at all of that, it has to have good examples of all of that.
>>
>>108392895
I made a very general point about more diverse data being better and you barged into the conversation with
>b-but what if the data is bad
>b-but you need to have instruct data too

Completely useless fucking comments.
>>
>>108392927
>Better at what?
Everything that their training data allows. Data diversity helps.
>If you want it to be good at all of that it has to have good examples of all of that
I don't want them to be good. I want them to be fun and interesting.
Knowing more is better than knowing less.
>>
File: 1735070906740.png (11 KB, 688x290)
11 KB
11 KB PNG
>PocketTTS.cpp
14.24s audio in 3.67s; first chunk latency: 98ms
The CPU is an i7-11700.
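(For reference, that works out to a real-time factor of about 14.24 / 3.67 ≈ 3.9x, i.e. synthesis runs roughly four times faster than playback on this chip.)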
>>
File: ye.png (45 KB, 646x578)
45 KB
45 KB PNG
>>108392884
Not really useful for my project since it's just a sound classifier (laughter, glass breaking, keyboard typing, etc.), but it's somewhat interesting regardless.

I haven't even integrated speech-to-text into my project yet because I'm already pushing against my hardware's limits as is, unfortunately. picrel is the ideal system architecture I'm going for at the moment.
>>
Someone here >>108392375 said that more varied data leads to better performance. I simply said there were caveats to that. You proceeded to incorrectly claim that data quality is irrelevant here >>108392830 like a bumbling buffoon who, like a lot of LLMs ironically, is confidently wrong. If you want to continue to not use your own fucking head, more power to you. Garbage in, garbage out. This was well established before LLMs were even popular. A diverse dataset leads to better generalization, but generalization and output quality are not the same thing.
>>
>>108392957
Cool. Thanks for the profile report. Is it working well for you?
(still have some potential performance optimizations in the works for that btw)
>>
File: 1761689373233193.png (515 KB, 1024x1024)
515 KB
515 KB PNG
>>
>>108392969
nta, but imo (and I think this applies to a few here) most would probably prefer a somewhat mediocre true generalist to a good coder that does only that
>>
>>108392958
>Not really useful for my project
It can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.
>I haven't even integrated speech-to-text to my project yet because I'm already pushing against my hardware's limits as is
tts takes very little. I suppose it's the stuff in the middle that takes the most. I don't remember if you tried piper, but that one is lightning fast (no streaming, but you can split by sentences or something--a single sentence takes less than a second on an old cpu).
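The sentence-splitting workaround is trivial, something like this (naive splitter; swap the print for your actual synthesis call):
[code]
import re

def sentences(text):
    # naive split on sentence-ending punctuation; fine for chat-length replies
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

for s in sentences("First sentence. Second one! And a third?"):
    print("would synthesize:", s)  # replace with the piper/onnx call
[/code]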
>>
>>108393004
>>
>>108392969
You must be the guy from last thread that claimed that incest (smut in training data) is bad because the guy probably can't fuck anyone else (the data might be bad quality).
>>
>>108392973
So far so good, thanks for adding Windows support. If it can get even faster I'm all for that; I have much older, junky Intels I could be running a good tts on.
>>
File: 1765406999724534.jpg (13 KB, 500x394)
13 KB
13 KB JPG
>>108393024
Glad to see that once you've been proven wrong you resort to "you're this anon I don't like" fuckery. You are misguided, wrong, stubborn, and stupid, and you know it.
>>
At what point are we going to ignore the trolls and start making normal threads again?
>>
>>108393040
That's what's happening though? Miku trolls are being ignored.
>>
>>108393000
>It can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.
Fair point. I added it to my notes.

>tts takes very little.
Ehh. I wouldn't go that far. It's definitely not the bottleneck, and the benefits certainly outweigh the costs. I tried Piper initially, but I found the voice quality and latency pretty bad, and it doesn't support voice cloning. One of the main issues with Piper was the lack of FFI support, so the only way to get fast performance was to use an HTTP server, and using a webserver to spawn the process manually for each LLM chunk request was awful. Overall I'm really happy with my Pocket TTS implementation. EMAGE and wav2arkit are what's raping me right now.

On a separate note: I probably could integrate STT without performance worries, because it runs BEFORE LLM inferencing and is therefore totally separated from the inferencing costs that pile up after LLM output. Hopefully that makes sense.
>>
>>108393037
I am just pointing out that what you're doing is similarly retarded to what he was doing.
>>
>>108393065
That's not at all what I'm doing. Nowhere did I imply that having smut in the dataset is bad, or that having x type of data in the dataset is bad. I'm saying QUALITY matters. You have no business calling anyone retarded when you straight up said data quality is irrelevant >>108392830
>>
>>108393040
He hasn't been very consistent so I have a feeling he will give up eventually.
>>
>>108393033
>So far so good, thanks for adding Windows support.
No problem. Good to hear.

Would anyone be interested in my EMAGE onnx export script btw? For some reason nobody has ever done this before, which seems insane to me, so I built it myself. I could set up a repo for that within the next couple hours. I'd really like to see more anons in general play around with the LLM -> 3D character animation pipeline. I thought you guys wanted your own waifus, kek.
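For anyone curious what an export script even looks like, the generic shape is below. This is NOT my EMAGE script--it uses a toy stand-in model, since EMAGE's real inputs (audio features, speaker id, etc.) are much more involved:
[code]
import torch

# toy stand-in model; the real script handles EMAGE's actual input tensors
model = torch.nn.Linear(128, 64).eval()
dummy = torch.randn(1, 128)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
[/code]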
>>
>>108393063
>FFI
Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models. But yeah. If you need cloning, it's not gonna help you.
I managed to run wav2arkit faster than realtime with a little demo thing. But I was just running tts and wav2arkit, without all the other overhead you have. All those little things add up.
>I probably could actually integrate STT without performance worries
It depends on whether you have it running all the time or start it with a button or something. silero has a few small models for voice detection that you can leave running continuously for auto-detection, but it will add another drop of overhead to everything else.
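The continuous-detection loop with silero is small, roughly this per their README (API details from memory, so double-check the repo; the v5 model wants 512-sample chunks at 16 kHz):
[code]
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
_, _, _, VADIterator, _ = utils

vad = VADIterator(model, sampling_rate=16000)
for _ in range(10):           # stand-in for a live mic stream
    chunk = torch.zeros(512)  # 512 samples = 32 ms at 16 kHz
    event = vad(chunk, return_seconds=True)  # {'start': ...} / {'end': ...} / None
    if event:
        print(event)
[/code]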
>>
>>108393106
If the quality of the data is bad then both models will be bad but the one with smut will still be better because it also knows what knotting is.
>>
>>108393117
how does that help me vibecode lamo.cpp prs though?
>>
>>108393117
Why does knowing what knotting is correlate with programming ability (or any other domain that has nothing to do with knotting)? Are you trying to pretend the concept of catastrophic forgetting doesn't exist? Based on this conversation you likely either don't know what that is or like to pretend it's not as big an issue as it actually is in training. Like dude, I get it, you want your models to make you nut, and there's nothing wrong with that, but you don't have to be a glue-eating retard about it.
>>
>>108393110
>EMAGE onnx
What's the inferencing speed?
>>
>>108393155
I don't know why it correlates but the examples we have so far show that it does.
>>
>>108393167
Like?
>>
>>108393110
For what it's worth, I would be interested.
>>
>>108393115
>Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models.
Well, for me one of the design constraints is to have everything run in one terminal window (kinda a tism thing desu). So before, I was using Deno to spawn the Piper binary for every text chunk, and it was a huge latency bottleneck. That's why FFI is necessary: it removes the overhead of spawning and prewarming the model on a constant basis.
>I managed to run wav2arkit faster than realtime with a little demo thing.
Yeah as a standalone process it's only 50ms. Quite fast overall, really. But with all of the overhead costs it's taking around 400ms (largely because of EMAGE).
>It depends on if you have it running all the time or start it with a button or something.
I'm thinking I would set it up like a voice-messaging type of system. The annoying thing is that without a full-duplex LLM, an LLM can't take in streamed text input from an STT engine, so voice messages are really the best I can do.
>>108393163
Hard to say because of my overhead costs with the full system right now, but it usually hovers around 500-700ms per window (64 frames, aka 2.13 seconds of audio, iirc). But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.
>>108393178
Cool. I'll work on setting up the repo. Fair warning, the script is vibecoded dogshit right now, but it works fine.... so uh... yeah.
>>
>>108393191
>But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.
Actually this is probably because it has to wait for the LLM to finish a full sentence and the TTS engine to process it before it can even start working.
>>
>>108393191
>spawn the Piper binary for every text chunk
But you already know how to run onnx models. Just load the model and keep it in memory. You don't need piper. You just need the models. Again, doesn't matter if you're not gonna use it, but the whole approach seems wrong.
If I were to do it, I'd just load the model on a forked process/thread and send it text over pipes or something.
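The forked-process version is only a few lines in e.g. Python's multiprocessing (sketch; the fake synthesis string is where the warm model call would go):
[code]
from multiprocessing import Process, Pipe

def worker(conn):
    # load the TTS model ONCE here, then serve requests until shutdown
    while True:
        text = conn.recv()
        if text is None:
            break
        conn.send(f"<audio for: {text}>")  # replace with the real synthesis call

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    for chunk in ["Hello there.", "Second chunk."]:
        parent.send(chunk)
        print(parent.recv())
    parent.send(None)  # shut the worker down
    p.join()
[/code]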
>>
>>108393231
You're absolutely right!
Nah but seriously though. This was a long time ago before I knew the right approach to take. Piper was my first TTS implementation, then I switched to Kokoro, and then I started using Pocket TTS. All I'm doing is describing why it didn't work for me initially, not why it "couldn't work".
>>
>>108393231
If you don't care about voice cloning I wouldn't even use Piper anyways. KittenTTS is waaaayyy faster and has decent (for its size) generic cute anime voices.
>>
>>108390876
as a frequent devstral user, I found mistral 4 very, very disappointing. I hope it is a bug because holy kek.
>>
File: file.png (279 KB, 570x943)
279 KB
279 KB PNG
>>108393040
I'm of the opinion that petrus is a paid NovelAI troll, since the threads that are usually trolled are always related to local models: /ldg/, /sdg/, /lmg/, /hdg/. But aicg, dall-e and other cloud threads are never touched.
>>
shitzo alert
>>
>>108391946
Wait, ministral is just a pruned small? Nothing new added? What the fuck do I want with it then when I can just run small?
>>
>>108392305
If only Mistral was Italian, they could just lie about the compute


