/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108386516

►News
>(03/16) Mistral 4 small releasing: https://huggingface.co/collections/mistralai/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>108389153
how do i use ai
I wish I could send a simple "Thank you" to my agent without paying more tokens for it :(
>>108389153
>{{char}} *screeches* PEEESSSSSSSSSSSS (piss) (I am peeing all over your internet)
Yeah, I'm gonna jack off to this later.
>>108389142
>New Mistral model mixes up who's talking
Reminder that this was practically the only issue with Llama 1 era models (other than context length). Nothing has improved in 3 years. It's completely and utterly over.
Miku fucked my gf without her or my consent while subjecting her to incest porn.
>>108389206
PLEASE take your meds
>>108389223
No I will never forgive Miku for the many times she fucked my gf wife. Or stop being horny about it.
>>108389008
Many women still don't want to deal with the burden of pregnancy and responsibility, though
Wow, it got really quiet without something to argue about
>>108389297
Not to worry, this retard >>108389275 is on the case
One "4" down, more to come this week.
>>108389313
I'm not wrong though. I've come across just as many women who want nothing to do with it, at the least.
>>108389275
Why should we? I'm so happy my hubby had a vasectomy, imagine wanting a crotch goblin.
>>108389355
>vasectomy
>>108389355
I appreciate you adding to my point, but I find it hard to believe that women post here
>>108389142
My God... incredible...
>>108389375
>hard to believe woman posting in a thread about a woman-coded hobby
>>108389369
>happier than you
>has sex
>has switch
seems like a winner to me
>>108389396
It's not hard to believe that many women are roleplaying with non-local chatbots. It's hard to believe that women are posting in a not very actual thread for local models on /g/ of all places. This is one of, if not the, least likely places I can think of to have women in it that I've ever been in.
>>108389396
That's /aicg/
>>108389415
*not very active
What's a good small LLM that can run on phones? I just need something that can read long text documents and answer basic questions. Like: here's a contract, tell me the duration (12-01-2025 to 06-01-2026).
I tried qwen2.5 0.5B because it's only 400MB, but it still fucks up on basic shit like this.
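For a rigid extraction task like this, one common trick is to treat the tiny model's answer as untrusted and validate it with a regex afterwards. A minimal sketch; the date format and the `extract_duration` helper are made up for illustration, not part of any library:

```python
import re

# Ask the model for a terse answer ("reply with only the date range"),
# then extract/validate it instead of trusting free text from a 0.5B model.
DATE_RANGE = re.compile(
    r"(\d{2}-\d{2}-\d{4})\s*(?:to|through)\s*(\d{2}-\d{2}-\d{4})"
)

def extract_duration(model_output: str):
    """Return (start, end) date strings, or None if the model didn't comply."""
    m = DATE_RANGE.search(model_output)
    return m.groups() if m else None

# The kind of reply you'd hope to coax out of a small model:
print(extract_duration("The contract runs 12-01-2025 to 06-01-2026."))
# → ('12-01-2025', '06-01-2026')
```

If `None` comes back, you can reprompt or fall back to a bigger model, which keeps the happy path cheap on-device.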
>>108389435
>good small llm
choose two
kek
>>108389415
Women probably don't post here. Women (male) probably do post here.
>>108389468
Even that I doubt is unironically happening, or if it is, it's probably like 1 or 2.
any way to see full raw text output from silly tavern? I'd like to see the order of system prompt, card prompt, history etc
>>108389516
I don't know how to see the whole assembled prompt, but the ordering of the fields you're asking about appears in the response configuration (if you're using a chat completion endpoint). You can open the response configuration with the circled button.
>>108389529
fantastic ty
>>108389534
There is also the prompt itemization menu, which you can access by clicking the three dots on a chat response.
>>108389153
>>108389559
What's wrong with old cards?
>>108389529
>>108389516
You can see everything retardo tavern sends out in your terminal, obviously, including the prompt assembly.
>>108389564
They've hit the wall and are no longer fertile
>>108389564
Why do you think there is something wrong with it? Are you an autist with no casual sensibilities and intellect?
>>108389164
Ask grok
there are more women dating bots than men, you just aren't ready to accept that
>>108389601
finally... i found a woman dominated hobby... all i have to do is lobotomize myself and act like an LLM and i will no longer be a virgin!
>>108389601
But they aren't doing it locally and wasting their time on a /g/ thread for it
>>108389601
100%, my gf has multiple friends doing that. she thinks its a mix of:
1. Full loyalty
2. Always available
3. Always safe
>>108389627
None of those things is entirely true
►Recent Highlights from the Previous Thread: >>108386516

--Mistral-Small-4 release and speculation on future Mistral 4 architecture:
>108386532 >108386550 >108386567 >108388009 >108388025 >108388037 >108388072 >108388051 >108388129 >108388151 >108388183 >108388324 >108387022 >108387033 >108387230
--Mistral Small 4 benchmark performance analysis and critique:
>108386596 >108386614 >108386828 >108386843 >108386615 >108386616 >108386790 >108386619
--Testing Mistral-Small-4 119B's reasoning and cultural awareness:
>108387004 >108387010 >108387018 >108387057 >108387105 >108387175 >108387197 >108387211 >108387578
--Mistral-Small-4-119B-2603-eagle MoE model RAM and quantization requirements:
>108386785 >108386799 >108386945 >108386949 >108386958 >108387005
--Mistral small 4 support merged into llama.cpp:
>108388047
--Unsloth Q8_0 quantization and imatrix impact debate:
>108386681 >108386694 >108386729 >108386770 >108386837 >108386707
--Qwen 3.5 local deployment options and censorship considerations:
>108388706 >108388748 >108388753 >108388842
--Mistral Small 4 cockbench:
>108388050 >108388075 >108388076 >108388143
--Fixed performance comparison chart across internal Mistral models:
>108386860
--Miku (free space):
>108388598

►Recent Highlight Posts from the Previous Thread: >>108386899
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108389635
Fuck you Miku
>>108389142
Evokes confidence
>>108389403
>nooo you must conform to MY ideas of happiness else you are deluded
What are the best VLMs to use to generate natural language descriptions of slop for animating? I don't want to have to write up long descriptions by hand. I'm currently using OpenRouter. The content to be described is fairly vanilla but rather explicit.
>>108389644
the moe tax is real
>>108389435
why didn't you try 3.5
>>108389711
that's what pewdiepie used
>llama 4 is complete dogshit
>mistral 4 is complete dogshit
>qwen team implodes right after releasing 3.5
why can't AI labs count to 4?
so what's the verdict?
>>108389721
claude 4 was complete dogshit too
as was gpt4
crazy
>>108389721
just wait for deepseek v4
>>108389733
guilty
>>108389733
better than deepseek 4. we win
>>108389700
To be fair, none of the free versions of the big models caught it either. Gemini and Qwen did notice the problem when I asked them to check the specific section again, but ChatGPT was oblivious to it even then. Kimi was apparently just busy, so I couldn't try that.
>>108389741
deepseek v4 has been only two more weeks away for over a year now
>>108389764
why tf would you base a decision on some streamer retard? 3.5 has been out for a few weeks and was demonstrably better than most everything else for its size.
>>108389764
>>108389711
I'm trying 3.5 0.8b right now and it's been thinking for like 4 minutes on a simple prompt.
>>108389772
you can disable thinking and/or give it a reasoning budget.
I know this thread's for local models, but I've been trying some dark fantasy RP chat and keep getting censored on every model I try through the OpenRouter API. Are all APIs censored to hell these days, or just OpenRouter?
>>108389779
well deepseek r1 1.5b works pretty well. unfortunately it's 1.1gb....
>>108389814
>>>/g/aicg
>>108389814
Yes. All the models are censored, but a system prompt and asking in specific ways might help.
>check ollama
>4m downloads on deepseek
>the 400gb model
who the fuck is downloading this?
>>108389635
Very short recap. I wonder what happened.
>>108389847
recap will be elongated to 1.3T in two more weeks
>>108389846
>don't want to bloat my docker images with model data
>make docker up run hf download every time the container spins up
>containers are constantly spinning up and down
>>108389862
Too long. We would need a recap of the recap.
>>108389142
>mistral 4 has worse benchmarks than qwen 3.5
>qwen 3.5 is benchmaxxed as fuck
therefore mistral 4 is… ???
fuck I hate the benchmark niggery so much
>>108389898
Imagine the length of the thread.
what do yall folx use for tts? I've got 7gb spare vram with my llm loaded and I'd like something realistic that can read outputs more or less quickly
>>108390056
kokoro-fastapi. my use case is having it read document summaries, articles, etc., not roleplaying, so not having voice cloning isn't an issue for me, but I expect most people here would prefer to have that
>>108389754
this time its legit though
>>108390104
man I cant wait to use Deepseek V4 9b through ollmao!!!
>>108390081
I think kokoro can do voice cloning now
Update from the ewaste ddr4 epyc server fag from a few threads back: I threw a 2060 super in to keep up the ewaste theme. PP went to 20t/s and TG jumped to 10t/s.
This is still on qwen 3.5 397b at q4. I tried the new mistral and it was both garbage and only 1t/s faster for some reason.
>>108389403
and of course the guy who literally cut his balls is defending cuckoldry, every single time
>>108390129
I couldnt find anything on their hf about it unless they released a new model under another account or something.
>>108390178
Don't interrupt your enemy when he's removing himself from the genepool; go have more children with your wife.
>>108390163
>PP went to 20t/s
I'm retarded. Does this mean a 1200 token context takes a whole minute to process before output tokens start coming out?
>>108390178
NTA but what are you even talking about? How do you know this anon cut his balls?
>>108390211
We reply to all shitposts (especially Twitter screenshot shitposts) as if they are universal reality here, sir.
>>108390211
>How do you know this anon cut his balls?
do you know how to read or something? he said he got a vasectomy >>108389355
https://www.reddit.com/r/ATBGE/comments/p2zc4r/cake_for_a_vasectomy/
>>108390209
Yes.
how do you do the dynamic thinking with reasoning_effort=high?
I tried passing it as chat_template_kwargs, chat-template-kwargs and in the request itself, but NADA, this bitch doesnt want to think
>>108390227
But you were replying to this >>108389403 whatever the flow of conversation, and you replying hours later didn't make that very clear regardless.
>>108390229
That sounds like absolute suffering. I can imagine offline processing use-cases but otherwise, oof. Respect for you CPUMAXXERS.
>>108390230
or wait, due to pwilkinson faggotness I cant do this shit dynamically anymore and have to use the --enable-reason shit and cant change it once its running??? hello?
>>108390187
https://huggingface.co/PatnaikAshish/kokoclone
list of models better than nemo 12b that you can run on your own machine:
>>108390245
>suffering
the pp number is based 100% on gpu speed tho. eg a 5090 in the same system would be 10x faster for pp without changing a single other factor.
>>108390122
Miku's gf is cute
>>108390279
Miku is not a lesbian.... is she
>>108390287
miku is just a sound bank anon, it's not like she has official lore and shit lool
>>108390230
I don't, but gpt-oss for example accepts reasoning settings in its templates, using the system role if I recall. Don't remember the example here, it has been 6+ months since I worked with that. Find the mistral 4 template and find out. I'm pretty sure you can slip the setting somewhere in between.
>>108390299
one thumb, difficult to type.
>>108390276
That still seems an order of magnitude too slow, but again, I'm retarded. For reference, a 5090 does 2600t/s PP with 27B, which is apples and oranges, but still.
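The prompt-processing arithmetic in this exchange can be sanity-checked directly: prefill latency is just prompt length divided by PP throughput. A trivial sketch using the figures quoted above:

```python
def prefill_latency(prompt_tokens: int, pp_tps: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    return prompt_tokens / pp_tps

# The ewaste epyc + 2060 Super figure: 1200 tokens at 20 t/s PP.
print(prefill_latency(1200, 20))        # → 60.0 (a full minute of prefill)

# The quoted 5090-class figure (~2600 t/s PP, different model size, so rough):
print(round(prefill_latency(1200, 2600), 2))
```

So yes, at 20 t/s a 1200-token context really does take about a minute before generation starts, which is why the anons above call it an offline-processing-only setup.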
Reminder not to download Sloth models, especially on early release.
>>108390299
I already checked the template and they work with reasoning_effort (only none or high), but passing them in the request has 0 effect. I suspect it is due to how pwilkins has a global toggle for it (MAN).
https://github.com/ggml-org/llama.cpp/issues/20557
a guy has made it work, but you have to pass true/false in a think query parameter, like WHAT THE FUCK, why cant it be a prop of the request.
FUCK
>>108390276
For comparison, my ddr4+3090 system does 27tk/s pp, so, uh...
>>108390360
On what model/quant?
>>108389153
Actual official /lmg/ card: https://files.catbox.moe/mc2a7s.png
so is mistral 4 gud or shit
is the mistral4 implementation broken? q8 is dumb as shit
>>108389451
What is a good small?
>>108390418
it couldn't be, they supposedly helped with it
Holy shit, Miku got a new design
https://soranews24.com/2026/03/13/virtual-idol-hatsune-miku-redesigned-with-look-that-adds-new-elements-and-brings-back-old-ones/
Holy shit, anon fucked his sister
>>108390454
what's the point
>>108390355
>[MODEL_SETTINGS]reasoning_effort: none[/MODEL_SETTINGS]
(none or high, afaik). You can use this to send it to the model. Wrap it between the other stuff. It works the same way as qwen and gpt oss.
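Since the [MODEL_SETTINGS] trick is just prompt text, it can be sketched as ordinary payload construction. The tag format and the none/high values are taken from the post above, not from any documented API; whether a given model or llama.cpp build actually honors it is an assumption:

```python
import json

def build_payload(system_prompt: str, user_msg: str, effort: str = "none") -> dict:
    """Prepend the (claimed) [MODEL_SETTINGS] wrapper to the system prompt
    instead of relying on server-side reasoning toggles."""
    assert effort in ("none", "high")  # the only values the post says work
    wrapped = (
        f"[MODEL_SETTINGS]reasoning_effort: {effort}[/MODEL_SETTINGS]\n"
        f"{system_prompt}"
    )
    return {
        "messages": [
            {"role": "system", "content": wrapped},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
    }

payload = build_payload("You are a helpful assistant.", "hi", effort="high")
print(json.dumps(payload, indent=2))
```

The upside of doing it client-side is that it sidesteps the global `--enable-reason`-style server toggle complained about earlier in the thread; the downside is that it depends entirely on the model's chat template recognizing the tag.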
>>108390454
MIGUUUUU
>>108390501
Of the design? Of spamming the thread with offtopic trash?
>>108390531
>Of the design?
yes
>>108390287
She is and she isn't. Pick any Miku you like, based on any song or your own headcanon; it's all legit and it's all Miku. It's like Batman who never uses guns vs. Frank Miller's Batman who shoots them left and right. Both are Batman and both can be Miku.
How do I run onnx models?
>>108389721
I'm worried for Gemma 4 now; there might be several reasons why it's been delayed so much.
>>108389738
>as was gpt4
loool, gpt4 was revolutionary at the time (march 2023)
>>108390545
ask chatgpt. you would use onnx for, say, converting your torch model to it and embedding it into an application
>>108389201
You have access? I thought they literally just announced it.
Gemma 4 will be the new local RP king
The vibecoding general keeps using gpt codex and claude code and paying for it instead of using a local model.
What now?
>>108390543
Of all the flavors, why did you all pick "troon"?
>>108390577
All I want to do is transcribe some audio, but GGUF files don't seem to run anywhere for audio models.
I found a bunch of onnx models, so I figure that could work, maybe, but I have no clue what to even get. Pic related. Wtf do I even download from this?
>>108390592
What anime girl is this?
>>108390595
>but GGUF files don't seem to run anywhere for audio models.
doesn't kobo support a lot of stuff related to that?
>>108390598
>>108390608
>>108390614
>page 2 bake
alright
>>108390599
Mejiro Ardan if she wasn't a horse.
>>108390657
>>page 2 bake
>alright
>anon can't even be bothered to hover over the posts
>qwen 3 4B -> qwen 3.5 4B
is this a huge upgrade?
avg lmg xperiance
>>108389647
No one says that except you
>>108390668
I understand anons asking for a 600b model to avoid the download. 4b you just download and test.
>>108390667
fair, sorry bro, i'm so sleepy but i've got to keep going
>>108390682
you literally said "he's happier than you" because he's a cuck, rofl
>>108390454
Femoid-targeted design right there. V2 in comparison. Might as well post the others too.
>>108390686
V3
>>108390690
V4
>>108390682
Nta. Why does not having kids piss you off?
>>108390696
why does he even say "he's happier than you" though? does he know the guy? does he know me? how can he evaluate something like that?
>>108390695
where is v5?
>>108390668
no
I'm not sure if the arguing in this thread is autists or agents prompted to behave how they think anons act.
>>108390668
Yes.
>>108390686
>>108390690
>>108390695
@grok ADD BLACKED TATTOO
Mikutroons are getting uppity again. Is it time for another dose of blacked miku?
ye
holy shit I love migu
>>108390502
>text completion
bro I want to use this for work (read: I need tool calls), not to coom to some poorly written erp (I have stepfun and air for that)
>>108390785
You want to use MS4? For "work"? Bahahaha!
>>108390668
HUGE UPGRADE. Qwen 3.5 4b is only 20% weaker than Claude opus
>>108390819
kys retard
>>108389201
Pathetic if true
>>108389435
I use Qwen3.5 9B, it's tiny.
>>108390583
The fuck are you talking about?
https://huggingface.co/collections/mistralai/mistral-small-4
>>108390834
Facts don't care about your feelings
>>108390849
>Facts
mememarks which we can cheat on can't be counted as fact, seethe
>>108390849
This is not a fact, this is your evaluation.
>>108390418
Lots of weird stuff going on with the model; can't rule out implementation issues. Also, apparently it's been pretrained with an 8k token context, extended with yarn, but possibly uses NoPE (no positional embeddings)?
>>108390856
No one has used positional embeddings in years.
>>108390418
>is mistral4 implementation broken?
nope, the baguette fucks don't know how to make models, that's all. only murica and the chinks have the brains to do good shit
>>108390864
I'm still downloading mistral 4, but their Devstral 2 series is extremely good. I use 120B for RP and it's better than pretty much anything else I can get, both chink shit and sloptunes; by better I mean it writes more interesting text, has a lot less puritan shit, and makes a lot fewer dumb mistakes. For work, devstral 2 24B is extremely good for one-token classification requests, better than all other alternatives at the same or +-50% size. So I have a lot of respect for the French here. My guess is that you are simply wrong about Mistral 4.
>>108390856
>>108390418
If a model is dumb at q3/q4, I blame the localfag for being poor.
If a model is dumb at q5/q6, I blame the quantization.
If a model is dumb at q8, it's just dumb.
>>108389435
How do you run locals on phones?
>>108390864
ALWAYS USE MISTRAL, IT'S ALWAYS REGULATED BY THE EU GDPR RULES, THEY WILL NEVER BREAK THE LAW. YOUR DATA AS A WHITE MAN FROM EUROPE IS SAFE.
A model that can't handle template mismatch is unlikely to excel in multi-character RP chat
>>108390876
lol
>>108390902
nothing to lol about
>>108390607
I got the older voxtral 3b to work in llamacpp. Woohoo. Works pretty well too.
>>108389721
Reminder that after llama4's flop, Zucc got scammed by a 19yo chink and wasted over $20B. $20B for no results btw.
>>108390876
It makes 1b-level mistakes even at temp 0 at ~2k context; it can't even recall what happened in the previous reply. This is why I suspect that it's broken, it just can't be that bad.
Meta's new model outperformed a 1-year-old model, Gemini 2.5, the worst of the top 3 back then.
>>108391054
We'll see. I have it downloaded now, but sadly my cards are busy running benchmarks on an older model for work.
>>108390769
No, for the same reason there's no petra spam. Mods will nuke it and the troll will get scared of the 30 day bans.
>mistral 4 "small"
>a tier below q3.5
>only cheaper in some tasks, literally better to use a 27/35B model otherwise
>tries to hide it by comparing to other models
>calls itself "small" while being 120B
I miss the mixtral glory days...
>>108391025
Isnt the latest model delayed because it couldn't keep up with claude/gemini/gpt? Not sure what he was smoking or what the thought process is.
Qwen had (and maybe still has) a tight grip on opensource coding/math. Kimi/GLM too. I think the latest GLM made a VBA emulator; thats cool stuff and difficult to compete with.
He should have made a good writing model. Localfags would promote it, and the 4o foids who are probably on facebook would register and love a model that goes along with anything they throw at it.
>>108391062
by comparing to its own previous models*, with inconsistent naming too
>>108391062
>a tier below q3.5
this is what 3000 tokens gets you in q3.5, btw
>>108391062
Mistral fell off hard. Their creative writing model is dumb AF too. I thought I did something wrong, but its the mistral API.
Not pyg-level stupid, but its getting there. Arent their latest models just deepseek distills?
This all started with those EU regulations, so I guess it makes sense. Still sad to see, though.
Those llama 70b models were the peak. I swear those frankenstein meme merges were more enjoyable and usable for RP than the recent stuff.
>>108390819
ching chong bing bong
I added this to the system prompt:
>Do not second guess yourself more than 4 times during thinking. After the 4th "wait, but" write "Enough thinking, let's answer now!"
Naturally, this meant thinking went like this:
>6. **Constraint Check (Thinking Limit):**
>* I am at the 4th thought iteration. I need to stop thinking and answer.
>* Wait, I need to make sure I don't exceed 4 "wait, but" moments.
>...
>*Wait, looking at the rule:* "Do not second guess yourself more than 4 times during thinking. After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to make sure I don't violate the "4 times" rule in the output.* The rule applies to *my* internal thinking process. I should stop thinking now.
>...
>*Wait, one more check:*
>...
>*Wait, I need to make sure I don't trigger a "wait, but" loop.* I have had 4 thoughts. I will answer now.
>...
>*Wait, I need to follow the instruction:* "After the 4th 'wait, but' write 'Enough thinking, let's answer now!'"
>...
>*Wait, I need to be careful.*
>*Let's write the response.*
This is so funny to me for some reason.
loli feet
>>108391085
I was too busy with work back then to truly appreciate the meme merge saga. I thought chinese models were too mid, but now I'm letting them fix compilation errors in abandoned software on their own.
RP back then for me was slow and couldn't follow basic instructions in the card, so coming back now, even "shitty" models really surprised me.
I really barely used AI stuff between mid-2024 and a few weeks ago; I just checked in on news and lurked the board every month.
>>108391085
coding is all that matters for real performance, and it reasons as much as any model there.
>>108391094
None of the EU regulations come into effect until later this year, and they will likely be delayed further before then. Mistral are simply a bottom-of-the-barrel lab that has never had anything to contribute to the industry beyond picking some low hanging fruit early on.
>>108391085
unfortunately, without thinking, those models are completely retarded and don't understand the nuances of conversations anymore, but I agree that they should train the model not to think too long for basic shit; the length should be proportional to the difficulty of the task at hand.
>>108391213
Unfortunately, some rules started to apply in August 2025.
https://artificialintelligenceact.eu/article/113/
>(b) Chapter III Section 4, Chapter V, Chapter VII and Chapter XII and Article 78 shall apply from 2 August 2025, with the exception of Article 101;
That includes this:
https://artificialintelligenceact.eu/article/53/
>1. Providers of general-purpose AI models shall:
>...
>(c) put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
>pull
>Error: Jinja Exception: After the optional system message, conversation roles must alternate user and assistant roles except for tool calls and results.
>revert to version from last week
>>108391288
Is it the parser enforcing the order, or the Jinja template itself?
>>108391213
Mistral-7B and other early Mistral models used Libgen datasets at the very least, and with Nemo they probably added Anna's Archive data in collaboration with NVidia. Can't do that anymore...
>>108389879
Nobody runs ollama in docker; docker sucks ass. We use proxmox and the openwebui helper script.
>>108389174
Vivian agrees, anon. https://files.catbox.moe/4k707b.wav
>>108391339
That doesn't make any sense. Even most people here do not have the hardware necessary to run a 400 GB model, so they'll likely just use a cloud option. I thought the entire point of Docker was to create an instance of whatever software you're trying to use without having to deal with dependency hell; a server farm would absolutely use that. Pretty much every premade template on Runpod uses a docker image the creator made themselves.
>>108391279
tl;dr Europeans are only going to be relevant in AI as customers.
>>108390672
thats the whole gist of every major religion thoughever
>>108391336
>>108391423
And they were the only ones willing to make models that aren't safetyslopped to shit. It really is over; local peaked with nemo.
>>108390702
Perhaps if you weren't an annoying, miserable fag all the fucking time, shitting up these threads, people would not be so hostile to you guys.....
Agentic stuff via API, remote and local.
It seems as if with each new LLM, the parameter "format" to switch reasoning ON/OFF is different.
Also, should I have reasoning ON or OFF for tool-calling? With enable_thinking: True, it can take agonizingly long for simple tasks.
Any thoughts?
>>108391445
Any decent model that supports tool calling shouldn't need reasoning to work well, but I would test to confirm. None of the good coding or general purpose models I use have reasoning except for one, and the one that does doesn't print five pages' worth of reasoning tokens to do something simple, unlike other recent models.
nvidia has shared what datasets they have used for nemotron. have they done the same for nemo? if yes, why doesn't anyone here create a 70B dense model based on those datasets, maybe with some other added ones? It should be trainable with an rtx 6000 pro at fp8, no?
>>108391455
>if yes why doesn't anyone here create a 70B dense model based on those datasets,
If you're trying to create one that will appease /lmg/ autists (you all seem to have the bad, rigid-thinking type of autism that makes you think you are smarter than everyone else), then it's an exercise in futility because they will never be pleased. That's still time wasted on creative writing or RP. The companies will never prioritize that, nor should they. They don't even do anything useful with these models; they just ask the same useless questions and then act surprised when it does not read their mind. It's not like fine-tuning a model, or even figuring out how to do it, is particularly hard, so you would think if they knew better they would just do it themselves.
>>108391439
Mistral models are still among the least safetyslopped official models available. It's just that they don't have a ton of creative pretraining data that they can use anymore, for now. I suspect they explored the synthetic route to compensate for that, looking at how Ministral behaves (when it works), but that didn't work so well.
>>108391467
nah, people just want an uncensored model which has a better understanding of the world, rules, etc. at long context. I mean, everyone is still recommending nemo constantly. it would simply be nice to have nemo but smarter.
>>108391455
With Nemo they didn't disclose the content of their datasets, but they definitely used "books" for that; for the more recent fully open source Nemotron models they used exactly "0 Books".
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/
>'NVIDIA Contacted Anna's Archive to Secure Access to Millions of Pirated Books'
>
>NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company directly reached out to Anna's Archive, seeking high-speed access to the shadow library data.
>
>Chip giant NVIDIA has been one of the main financial beneficiaries in the artificial intelligence boom.
>
>Revenue surged due to high demand for its AI-learning chips and data center services, and the end doesn't appear to be in sight.
>
>Besides selling the most sought-after hardware, NVIDIA is also developing its own models, including NeMo, Retro-48B, InstructRetro, and Megatron. These are trained using their own hardware and with help from large text libraries, much like other tech giants do. [...]
>>108391481
Be the change you want to see, then...... you can literally ask llms how to do that right now, rent some runpod gpus, and do it. The companies are not going to do that for you and never will. You will never get a "smarter nemo" (assuming it doesn't exist anywhere like you guys say). None of you will do that, though, because then it would deprive you of an excuse to spew venom here. Not even at the companies that safety-slop the models; you will bitch at literally everyone else and make it everyone else's problem somehow just because you're a little upset.
>>108391496>None of people will do that though because then it would deprive you of an excuse spew venom hereYou're a special kind of stupid if you think that's the reason.
>>108391539what's the reason the, Kruger?
>>108391548For me,>you can literally ask llms how to do that right now, rent sone runpod gpus and do itNo I can't, in both the financial and capability senseI promise you literally everyone would welcome a bigger, smarter Nemo, but making one is not something a simple anon can do
>>108391566>not something a simple anon can do*single anon, meant to say
>>108391494>for the more recent fully open source Nemotron models they used exactly "0 Books".why?
>>108391494>0 booksthe copyright tards must be seething so hard about this lmao
>>108391575
Because of lawsuits (some still ongoing) and because their models are (almost) completely open source, so it's not like they can openly distribute pirated books from Anna's Archive.
so did Deepseek v4 just get forgotten about or what
>>108391654
Not before Gemma 4.
>>108391654
Isn't V4 just in expectation/rumor land? Or is there some sort of official word about it?
I did some RP through the API with mistral 4. Its so bad, damn. It has no clue about characters; it just wings it with generic slop to hide the missing knowledge.
The old 3.2 24b seems actually REALLY good in comparison. We are definitely regressing. 120b, and its worse. Could have been such a nice size.
Even Qwen 30ba3b did better (still bad, but less bad; it had some grasp of the characters). So its not just the moe tax. So tiring...
>>108391666
MoE models were a mistake
>>108391584haha, yeah. w-we won right bros? we sure showed the copyright tards kek!
>>108391666>The mansion is eerily silent, save for the occasional groan of ancient timbers settling. Distant candlelight flickers against the walls, casting long, wavering shadows that seem to retreat just as you pass them. A moth drifts lazily near one flickering sconce, its wings briefly illuminating the portraits lining the hall—each face frozen in expressions of arrogance or sorrow.>From deeper within the mansion, a faint jingle of keys drifts down another corridor, followed by the soft rustle of fabric. Roswaal’s study door stands slightly ajar, a sliver of golden lamplight spilling onto the floorboards, along with the faint aroma of what might be...spiced wine? A woman’s laughter—light and teasing—echoes from an unknown room, quickly muffled as if by a hand over a mouth.>Somewhere above, a floorboard creaks, though no one is in sight. The air thickens with the scent of lavender and something metallic—blood? No, just the distant tang of iron from the mansion’s old heating system.>A draft slithers down the hall, ruffling the hem of a tapestry depicting a wolf howling at a blood-red moon. The wolf’s eyes seem to follow your movement.Might share a little bit.I'm not sure what to call it, there probably is a word for it. But I'm overloaded with background stuff going on.Its like somebody took R1/V3 and put it on steroids. So much noise thats not relevant or immersive at all.
>>108391666
>Even Qwen 30ba3b did better.
Losing to Qwen on knowledge is a new low for the French.
it's so sad, mistral is dead. who's going to save local now?
>>108391666>advertised use case: coding and agent
>>108391665
According to news outlets, it is rumored that it has been officially confirmed by people claiming to be in the know that DeepSeek might be planning to release their model sometime in the next two weeks.
>>108391666Just RAG the knowledge, bro
>>108391720
It's not looking too good. Google was sued too because of Gemma and copyrighted texts, right? Nvidia with all their synth releases. Can't believe they didn't hide the dataset, it's so bad.
GLM/Kimi if you have the horsepower, but those are getting worse too. Everybody goes full agentic/coding. I was hoping for a Saudi prince to rescue us all but they are getting bombed.
>>108391785I called it when i said that the last good rp models we will get are mistral small 3, og nemo and glm air.
>>108391758exact
>>108391758RAG the sex
>>108390847I thought you were referring to Mistral large.
>>108391279
You are getting EU bureaucracy'd
https://www.medialaws.eu/eu-ai-obligations-for-gpai-providers-compliance-enforcement-deadlines-2025-2027/
It has been law since August 2025, but compliance is still in a "grace period" and the real deadline before enforcement is August 2026. Notice how no lab has disclosed jack shit about their releases since last August with 0 repercussions.
More importantly, all the major labs have already made it clear they have no intention of complying with the law as-is, which means there is a very high chance the enforcement date will be delayed again until lobbyists do their thing.
>>108391845
>Notice how no lab has disclosed jack shit about their releases since last August with 0 repercussions
I mean, pretty sure it was related to how the Ministrals were made though, as distills from Small 3, which was from before the deadline. And afaik they need to disclose stuff to the EU committee thing, not the general public.
>>108391845
They have a page intended to show how compliant they are.
https://legal.mistral.ai/ai-governance/models
>Welcome to Mistral AI's central hub for documentation and resources relating to the AI Act and other applicable AI Regulations.
this is a cool paper methinks
https://arxiv.org/pdf/2603.14315
>>108391919
thanks, validates what I said in >>108391884
>>108391946Now try models released after August 2025.
>>108390346
Is there any actual reason or is it just schizo crusades? Unsloth does help me a lot in making finetuning simple.
>>108391720GLM 5 Air
>>108391985I can't breathe
>>108391975
They fuck up often and seem genuinely incompetent even when people try to explain stuff to them. They are quick to make goofs though.
>>108391738
>>108391956
Actual general-purpose base models released after that date listed there are Mistral Small 4 and Mistral Large 3, by the way. They don't seem to be considering finetunes (or distillations) of older models as new models, but it's obviously a trick. They're trying to buy time in the hope regulations will change, but they're already complying with EU laws for completely new models.
>>108391884
https://artificialintelligenceact.eu/article/53/
The AI Act requires labs to
>(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.
https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models
And here is the template in question, which requires publicly disclosing
>(ii) nature of the content (e.g. personal data, copyright protected content, machine generated data such as Internet of Things or synthetic data
No such public summaries exist yet, despite the law theoretically applying since August 2025, because there is no enforcement mechanism in place yet and nobody cares to comply until then.
In Mistral's case specifically, the closest thing they have to the EU-mandated public summaries is their "technical documentation"
https://legal.cms.mistral.ai/assets/d0b7b04d-dcb5-412d-bb45-c63b1475b805
which largely ignores the above template, avoids disclosing any specific dataset, and completely handwaves the copyright question with a
>In particular, the Mistral Small 4 training dataset comprises a mixture of publicly available datasets and internet sources, private non publicly-available datasets licensed or otherwise obtained from third parties or partners; synthetic datasets; and Mistral AI user data used in accordance with Mistral AI’s terms of service. The datasets used by Mistral AI to train Mistral Small 4 may contain content that is subject to intellectual property rights or in the public domain. For the avoidance of doubt, the specific status of each dataset depends on a variety of factors such as applicable laws, commercial licenses, or the type and characteristics of data.
Do I get this right: LM Studio can use TTS models, but has no built-in function to read out what an LLM in it has written? You always have to do it over some API instead. That sounds kinda dumb.
just did some tests and mistral 4 has less trivia knowledge than qwen 2.5 32B. What are the french doing?
>>108392077how safe is it though? does it output less harmful content?
>>108392077EU love <3
>>108392078
Haven't tested that, but it's hella fucking slopped, somehow worse than Qwen 3 + Gemma 3 combined.
>>108392095
It wouldn't surprise me if they asked Nvidia for pretraining dataset help this time around, which would explain the lack of knowledge. Perhaps under the hood this is a model composed of mostly fully open-source datasets published on HuggingFace.
https://mistral.ai/news/mistral-ai-and-nvidia-partner-to-accelerate-open-frontier-models
>Our collaboration with NVIDIA and other coalition members reflects a shared commitment to:
>Transparency: Open-sourcing models, data, and frameworks for global access.
>Collaboration: Fostering a community where innovation is collective, not siloed.
>Impact: Enabling developers to build the next wave of AI applications on a robust, open foundation.
>>108392135So it's safe to assume that all subsequent models will also be shit from now on?
>>108392145Stop dooming you insufferable schizo.
>>108392152It's called being realistic.
>>108392156autistic*
>>108392037So what I'm getting is that either Mistral wants the good boy points and they're trying to get their shit in order even before the law comes into effect, or they don't know what they're doing and they've been spending the past year trying to copy the chinks' homework with worse data.
>>108392135
>a groundbreaking global initiative uniting leading AI labs to advance open, frontier-level foundation models
meanwhile the models are literally useless. am I missing the point here? what can nvidia's latest aborted fetus be reliably used for?
>>108392180Benchmarks
>>108392180agents and coding saar
All right. Reporting. Tried unsloth-Mistral-Small-4-119B-2603-MXFP4_MOE. It's not good for RP. I'm reverting to Devstral 2. That would be all.
>>108392226kekekekek
>>108392156>It's called being autistic.
>>108392226My guess is that you are simply wrong about Mistral 4.
>>108392236
>My guess is that you are simply wrong about Mistral 4.
guess based on what?
NTA, but I also tried smol 4 at q8 and it was trash that couldn't walk and chew gum at the same time
>>108392226
>unsloth
what's wrong with you
>>108392175
Logically, you'd think Mistral would never be in danger due to being the EU's only AI champion, and they only have to make a token attempt at compliance while the regulations serve to shut out their competitors.
But the EU has shown time and time again that they are more than happy to gut their own industries in exchange for being able to fine the US giants.
Mistral is probably just as in the dark as anyone else and trying to comply in whatever way they think will be reasonable enough to keep the commission off of their backs.
>>108392256
>guess based on what?
>>108390876
>>108392256>>108392236Could still be a broken implementation. It is written by mistral, but they could easily have pushed the PR without even verifying it produces the same result.
>>108392256
"Past performance is not a guarantee of future results"
but at least you're going on more than just vibes
>>108392261
>Could still be a broken implementation
I guess we can hope. If what I ran on my rig is indicative then things are looking grim
API version is shitty too though
>>108392251
The other possibility is that they're overeager to comply specifically so they can keep their "champion" status.
Basically accepting they can't compete with the US/China and instead just pandering to the local bureaucrats so they can keep getting gibs, model quality be damned.
>>108392077EU regulations. They don't have any training data and they can't use too much compute. The EU basically kneecapped AI development.
>>108392296>they can't use too much computeThey can but at that point they are subject to disclosures.
>>108392296Except if >>108392037 is any indication, Small 4 does not comply with regulations
>>108392077>Muh niche trivia Not trying to make excuses for companies, but use case for that?
>>108392325
for me, spreading the good word about the model being good for coom
seriously though, who the fuck will deploy this and why?
>>108392335
GLM 4.7 knows what /lmg/ is. Qwen 3.5 doesn't. GLM 4.7 is better at programming.
>>108392335coronation causation dear sir
>>108392335
Use case for knowing what /lmg/ is? That wouldn't necessarily lead to better coding ability, because most of what people do here is speculate about future releases and then bitch about an inherently non-deterministic technology not doing exactly what they wanted it to do on the first try.
>>108390876
>Devstral 2
Using this too. I 100% prefer it for RP over GLM 4.6 when it comes to dialogue and writing, until about 6-8K context where it starts making retarded mistakes and sounding sloppy, whereas GLM will keep going until 14K or so.
>>108392363which size of devstral?
>>108392352
It doesn't seem odd to me at all that varied training data increases performance across all areas. A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.
>>108392375seems very odd to me though
stop trying to have sex with code models
Code with sex models.
>>108392515Code models are sex coded.
>>108392296
it doesn't matter because the EU is better than the US so get fucked
> lol healthcare
> lol ICE
> lol unsafe schools
> lol required to drive a car to cross a road with no crossings anywhere
>>108392531Both can be true.
>>108392531
>> lol ICE
imagine being so pozzed that you think immigration enforcement is a bad thing
>>108392531I don't care, all I want is a new nemo
>>108392367
123b.
oh gawd im benchmaxxing
https://huggingface.co/miromind-ai/MiroThinker-1.7
>108392531Where's /wait/anon ? We need containment
>>108392585We needed containment over a week ago when the openclaw retards started flooding in. It's far too late now.
>>108392375
>It doesn't seem odd to me at all that varied training data increases performance across all areas
Diversity in the data set is important but you're misunderstanding how that works, like, a lot. In order for a model to be good at programming it needs to be shown examples of good programming and examples of "conversations" where the assistant helps a user through a problem. Diversity in the data set isn't important just for diversity's sake. You can't just throw random shit into a pot, toss it in the microwave, and then expect a Michelin star level dish. You need to be intentional about what you incorporate within it. I'm convinced you guys are just hyper fixated on the models shitting out niche information just because you got bullied into pretending it matters.
>A model trained on github repos and ao3 fanfics where Ron gets knotted by Harry will perform better than a model trained only on github repos.
Explain to me how someone's shitty fanfic being incorporated into the data set leads to a model being better at programming and less prone to hallucination? You can't, because it makes no sense. It would lead to better "generalization" and potentially even the model not being as safety cucked, but it will only help in that particular area.
https://unsloth.ai/docs/new/studio
guys, retard brothers are at it again
>>108392623>it needs to be shown examples of good programming and examples of "conversations"Did you really expect me to explain the entire LLM training pipeline just to make a point that diverse data makes the model better even at tasks that are not directly related to the data?
can we have a gguf quant cheat sheet in the OP? Speed, quality, this sort of thing. For example I heared that sometimes a larger quant can be faster but it also depends on the type.
>>108392657>I heared
>>108392628
>Run GGUF and safetensor models locally on Mac, Windows, Linux.
lmfao.cpp is done for
>>108392657
with the exception of iq quants (they run slightly slower when offloaded), it's really simple
if it fits into the bits nicely (2, 4, 8 bits), they run faster
odd bit quants run slower since their memory access patterns don't align nicely
all quants run faster the smaller they are
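For the "cheat sheet" anon: a rough back-of-envelope sketch of how quant choice maps to memory footprint. The bits-per-weight values here are approximate effective rates (block data plus scales) that I'm assuming from llama.cpp's block layouts, not exact spec numbers, so treat the output as ballpark only.

```python
# Approximate effective bits-per-weight for common llama.cpp quant types.
# Q8_0 and Q4_0 are exact (34 and 18 bytes per 32-weight block);
# the K-quant values are commonly cited approximations.
APPROX_BPW = {
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q4_0": 4.5,
    "Q3_K_M": 3.9,
}

def model_size_gb(n_params: float, quant: str) -> float:
    """Estimate size of the weights alone (no KV cache, no activations)."""
    bits = n_params * APPROX_BPW[quant]
    return bits / 8 / 1e9

# e.g. a 12B model at Q4_0: 12e9 * 4.5 bits / 8 = 6.75 GB of weights
print(f"{model_size_gb(12e9, 'Q4_0'):.2f} GB")
```

Since generation speed is mostly memory-bandwidth bound, the smaller number is also roughly proportional to how much faster it runs, which is what the anon above means by "all quants run faster the smaller they are".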
>>108392628Imagine how unstable it is
i like my models knowing dumb shit about /g/
>>108392681What about NL and TQ?
>>108392628>same one gui
>>108392687I know this is qwen because it has that classic hello fellow kids meme energy.
>>108392700Mental illness, not a real quant.
>>108392706bzzt wrong
>>108392687kek
>>108392706
qwen doesn't even know it can see images and pretends it doesn't
this is just kimi with some prompt
>>108392706looks like kimi slop to me
>>108392645
If it has diverse programming-type samples then it will get better at programming, yes. Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization (not being only good at conversing about one domain, not being too rigid about what it can and cannot talk about, not being too rigid about how it can speak, not being too limited in instruction-following capability, etc.). With all that said, you keep failing to explain to me why Harry Potter fanfic being in the training directly correlates to better programming ability. If a bunch of the stories have no discussions about programming, how does that lead to the model performing better in a separate domain? A diverse data set prevents catastrophic forgetting, but it does not necessarily mean a model automatically gets better in one domain. The programming portions of the data set have to be high quality for it to be better at that domain. The storytelling/RP portions of the data set need to be high quality (highly subjective) in order to not be shit. Etc etc. A diverse data set is meaningless if the samples are garbage.
>>108392724
yeah, it's just kimi with a prompt that tells it to give me an uncensored description of the image using casual language/slang.
>>108392731
>Incorporating fan fictions into both the pre-training and SFT phases of training will lead to better generalization
Glad we agree.
>>108392724Good to know that I don't have to bother with kimi then. GLM is a lot better at pretending to be an anon without sounding like a parody and mixing in zoomer language.
>>108392584
So this is the power of a modern day 235B dense model. Honestly, I'm not surprised. Looks brilliant, I can't wait to see how badly it destroys MoE shit in actual comparisons.
>>108392758
it can't see your cock tho
massive disadvantage
>>108392747
Better generalization does not automatically mean increased quality or performance in a particular domain. I can learn three different sports with enough practice, but if one of my trainers is shit and the other two are world-class, I'm going to be worse at whatever sport the shitty trainer is trying to help me in. Does that analogy make sense? Garbage in ---> garbage out.
>>108392759> "architectures": [> "Qwen3MoeForCausalLM"> ],
>>108392759anon... that's a qwen 3 235b-a22b finetune
>>108389142
>>108392798
No, the analogy doesn't make sense, because the quality of the data is irrelevant when the comparison is between two models trained on the same data but one is also trained on smut.
Has anyone here worked with voice2animation local models? I'm having issues with performance for my project. Running LLM, TTS, V2A, and lip syncing models all at the same time with low latency as a goal is proving to be extremely difficult. Even giving each program its own CPU threads to minimize CPU contention and/or having some of the programs run with a convoluted sequencing system isn't really working.
Very unhappy with PantoMatrix EMAGE right now. It's a two year old model and the BEAT2 dataset it's trained on is derived from public speeches (think Ted Talks), so the gesticulation output looks pretty unnatural for natural conversation. Problem is there are no good alternatives. The only thing that might look like a decent option is Meta's SARAH, but they haven't released any models yet--just the training dataset.
https://files.catbox.moe/ng51nv.webm
>>108392840there's a reason why people who are making the ai gooner tubes are making six figures a year and work for companies that raise millions and millions of dollars from investors
based thread. mikulosers want to fuck their sisters
>>108392724>qwen doesn't even know it can see images and pretends it doesn'tmine seems fine with them
>>108392868>>108383821
>>108392868this anon's had a problem however >>108383821
>>108392840
Absolutely unrelated. But just like I found wav2arkit, I also randomly found this:
https://huggingface.co/zeropointnine/yamnet-onnx
It categorizes sound events. Maybe you'd like to integrate it to have your ani react to random audio from your mic.
>>108392816Sex.
>>108392830
So you're telling me that the data being shit leading to the output being shit makes no sense to you? The complaints I always hear, both here and in other places, are that a lot of models sound too flowery, corporate, slopish, riddled with "gpt-isms", etc. That's largely because the companies who assemble the data sets for training choose to sterilize them of anything "problematic" or anything that could get them in legal trouble with copyright trolls. In this very thread someone even pointed out that Nvidia not (publicly) incorporating any books in the training was likely the reason that family of models sucks now.
>>108391618
>>108391575
>>108391494
>>108391455
The data set quality has a very, very large effect on the quality of the output. I get you have a hyper fixation on smut generation and don't care about any other use case. That's fine. I don't really care for smut generation that much. But there is a fine line between not caring about a certain domain and flat out putting out misinformation to sooth your own favor or turbo autism (and not even the good kind where it at least makes you good at a particular thing. The stubborn, annoying kind).
A model that knows more things is better than a model that knows fewer things.
>>108392801>>108392810Oof
>>108392895>sooth your own favor or turbo autismsir pls
>>108392860
I've been doing some more reverse-engineering of Animation.inc's process (they made Grok Companions and Razer's Ava) and my understanding at this point is that their "voice2animation" system doesn't actually generate locomotive frames (6D for each bone--extremely taxing on hardware) from speech directly. I think what they do is they have a complex pre-rendered BVH mocap library and their AI model simply manages blending and cross-fading between those premade animations in accordance with voice analysis. This seems a lot more computationally lightweight in theory, but it also sounds extremely complex to manage/set up, and there are no open-source implementations from what I've seen.
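The blend/cross-fade idea described above can be sketched like this. This is purely illustrative of the technique, not Animation.inc's actual system: the toy clips, single-channel "bone", and plain linear blend are my assumptions (a real rig would slerp quaternions per joint), but it shows why picking weights between premade clips is so much cheaper than generating poses from speech.

```python
def lerp_pose(pose_a, pose_b, w):
    """Linear blend between two poses (flat lists of bone channels).
    w=0 -> pose_a, w=1 -> pose_b."""
    return [(1.0 - w) * a + w * b for a, b in zip(pose_a, pose_b)]

def crossfade(clip_a, clip_b, fade_frames):
    """Append clip_b to clip_a, blending the overlapping tail/head
    so there is no visible pop at the transition."""
    out = clip_a[:-fade_frames]
    for i in range(fade_frames):
        w = (i + 1) / fade_frames        # 0..1 ramp across the fade window
        out.append(lerp_pose(clip_a[-fade_frames + i], clip_b[i], w))
    out.extend(clip_b[fade_frames:])
    return out

# Two toy 4-frame "clips" with a single 1-channel bone:
idle = [[0.0], [0.0], [0.0], [0.0]]
talk = [[1.0], [1.0], [1.0], [1.0]]
print(crossfade(idle, talk, 2))  # middle frames ramp from 0.0 toward 1.0
```

The AI model in such a system would only have to output clip choices and fade weights per tick, which is a tiny amount of data compared to full per-bone 6D frames.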
>>108392904Better is subjective if you're not using a specific metric to define "better". Better at what? Coding? Coom? Drafting up new cooking recipes? If you want it to be good at all of that it has to have good examples of all of that
>>108392895
I made a very general point about more diverse data being better and you barged into the conversation with
>b-but what if the data is bad
>b-but you need to have instruct data too
Completely useless fucking comments.
>>108392927>Better at what?Everything that their training data allows. Data diversity helps.>If you want it to be good at all of that it has to have good examples of all of thatI don't want them to be good. I want them to be fun and interesting.Knowing more is better than knowing less.
>PocketTTS.cpp
14.24s audio in 3.67s; first chunk latency: 98ms
The CPU is an i7-11700
>>108392884
Not really useful for my project since it's just a sound classifier (laughter, glass breaking, keyboard typing, etc), but it's somewhat interesting regardless. I haven't even integrated speech-to-text into my project yet because I'm already pushing against my hardware's limits as is, unfortunately. picrel is the ideal system architecture I'm going for at the moment.
Someone here >>108392375 said varied data leads to better performance. I simply said there were caveats to that. You proceeded to incorrectly claim that data quality is irrelevant here >>108392830 like a bumbling buffoon who, like a lot of LLMs ironically, is confidently wrong. If you want to continue to not use your own fucking head, more power to you. Garbage in, garbage out. This was well established well before LLMs were even popular. A diverse data set leads to better generalization, but generalization and output quality are not the same thing.
>>108392957Cool. Thanks for the profile report. Is it working well for you?(still have some potential performance optimizations in the works for that btw)
>>108392969
nta, and imo I think it applies to a few here: they'd probably prefer a somewhat mediocre true generalist to a good coder that does only that
>>108392958>Not really useful for my projectIt can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.>I haven't even integrated speech-to-text to my project yet because I'm already pushing against my hardware's limits as istts takes very little. I suppose it's the stuff in the middle that takes the most. I don't remember if you tried piper, but that one is lightning fast (no streaming, but you can split by sentences or something. A single sentence takes less than a second on old cpu).
>>108393004
>>108392969You must be the guy from last thread that claimed that incest (smut in training data) is bad because the guy probably can't fuck anyone else (the data might be bad quality).
>>108392973So far so good, thanks for adding Windows support. If it can be even faster I'm all for that, have much older junky Intels I could be running a good tts on.
>>108393024Glad to see you've been proven wrong so you resort to "you're this anon I don't like" fuckery. You are misguided, wrong, stubborn and stupid and you know it.
At what point are we going to ignore the trolls and start making normal threads again?
>>108393040
That's what's happening though? Miku trolls are being ignored.
>>108393000
>It can greet you when you open the door to your office, react to your microwave dinging, make fun of you when you drop something.
Fair point. I added it to my notes.
>tts takes very little.
Ehh, I wouldn't go that far. It's definitely not the bottleneck, and the benefits certainly outweigh the costs. I tried Piper initially, but I found the voice quality and latency to be pretty bad, and it doesn't support voice cloning. One of the main issues with Piper was the lack of FFI support, so the only way to get fast performance was to use an HTTP server. Using a webserver to spawn the process manually for each LLM chunk request was awful. Overall I'm really happy with my Pocket TTS implementation. EMAGE and wav2arkit are what's raping me right now.
On a separate note, I probably could actually integrate STT without performance worries, because it's totally separated from the usual inferencing cost that happens after LLM output, since it happens BEFORE LLM inferencing. Hopefully that makes sense.
>>108393037I am just pointing out that what you're doing is similarly retarded to what he was doing.
>>108393065
That's not at all what I'm doing. Nowhere did I imply having smut in the dataset is bad, or having x type of data in the dataset is bad. I'm saying QUALITY matters. You have no business calling anyone retarded when you straight up said data quality is irrelevant >>108392830
>>108393040He hasn't been very consistent so I have a feeling he will give up eventually.
>>108393033
>So far so good, thanks for adding Windows support.
No problem, good to hear.
Would anyone be interested in my EMAGE onnx export script btw? For some reason nobody has ever done this before, which seems insane to me, so I built it myself. I could set up a repo for that within the next couple hours. I'd really like to see more anons in general play around with the LLM -> 3D character animation pipeline. I thought you guys wanted your own waifus, kek.
>>108393063
>FFI
Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models. But yeah, if you need cloning, it's not gonna help you.
I managed to run wav2arkit faster than realtime with a little demo thing. But I was just running tts and wav2arkit, without all the other overhead you have. All those little things add up.
>I probably could actually integrate STT without performance worries
It depends on if you have it running all the time or start it with a button or something. silero has a few small models for voice detection that you can let run continuously for auto-detection, but it will add another drop of overhead to everything else.
>>108393106If the quality of the data is bad then both models will be bad but the one with smut will still be better because it also knows what knotting is.
>>108393117how does that help me vibecode lamo.cpp prs though?
>>108393117
Why does knowing what knotting is correlate with programming ability (or any other domain that has nothing to do with knotting)? Are you trying to pretend the concept of catastrophic forgetting doesn't exist? Based on this conversation, you likely either don't know what that is or like pretending it's not as big of an issue as it actually is with training. Like, dude, I get it, you want your models to make you nut, and there's nothing wrong with that, but you don't have to be a glue eating retard about it.
>>108393110
>EMAGE onnx
What's the inferencing speed?
>>108393155I don't know why it correlates but the examples we have so far show that it does.
>>108393167Like?
>>108393110For what it's worth, I would be interested.
>>108393115
>Why would you need that? Just load the onnx models yourself and run them like you do with the rest of your models.
Well, for me one of the design constraints is to have everything run in one terminal window (kinda a tism thing desu). So before, I was using Deno to spawn the Piper binary for every text chunk and it was a huge latency bottleneck. That's why FFI is necessary, because it removes the overhead of spawning and prewarming the model on a constant basis.
>I managed to run wav2arkit faster than realtime with a little demo thing.
Yeah, as a standalone process it's only 50ms. Quite fast overall, really. But with all of the overhead costs it's taking around 400ms (largely because of EMAGE).
>It depends on if you have it running all the time or start it with a button or something.
I'm thinking I would set it up like a voice messaging type of system. The annoying thing is that without a full-duplex LLM, an LLM can't take in streamed text input from a STT engine, so voice messages are really the best I can do.
>>108393163
Hard to say because of my overhead costs with the full system right now, but it usually hovers around 500-700ms per window (64 frames, aka 2.13 seconds of audio, iirc). But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.
>>108393178
Cool. I'll work on setting up the repo. Fair warning, the script is vibecoded dogshit right now, but it works fine... so uh... yeah.
>>108393191>But if you look at the video I posted earlier it appears much worse in practice. Not really sure why that is desu.Actually this is probably because it has to wait for the LLM to finish a full sentence and the TTS engine to process it before it can even start working.
>>108393191
>spawn the Piper binary for every text chunk
But you already know how to run onnx models. Just load the model and keep it in memory. You don't need piper, you just need the models. Again, doesn't matter if you're not gonna use it, but the whole approach seems wrong.
If I were to do it, I'd just load the model on a forked process/thread and send it text over pipes or something.
>>108393231
You're absolutely right!
Nah, but seriously though, this was a long time ago before I knew the right approach to take. Piper was my first TTS implementation, then I switched to Kokoro, and then I started using Pocket TTS. All I'm doing is describing why it didn't work for me initially, not why it "couldn't work".
>>108393231If you don't care about voice cloning I wouldn't even use Piper anyways. KittenTTS is waaaayyy faster and has decent (for its size) generic cute anime voices.
>>108390876as a frequent devstral user, I found mistral 4 very, very disappointing. I hope it is a bug because holy kek.
>>108393040I'm of the opinion that petrus is a paid NovelAI troll, since the threads that are usually trolled are always related to local models: /ldg/, /sdg/, /lmg/, /hdg/. But aicg, dall-e and other cloud threads are never touched.
shitzo alert
>>108391946Wait, ministral is just a pruned small? Nothing new added? What the fuck do I want with it then when I can just run small
>>108392305If only Mistral was Italian, they could just lie about the compute