/lmg/ - a general dedicated to the discussion and development of local language models.LongMikuCat is Long EditionPrevious threads: >>106460375 & >>106454136►News>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106460375--Optimizing 3x 3090 GPU setup for large model inference with RAM and heat management:>106463968 >106464009 >106464026 >106464042 >106464168 >106464130 >106464153 >106464564 >106464199 >106464326 >106464443 >106464472 >106464538--Evaluation of Microsoft VibeVoice's 1.5b model and voice cloning performance:>106460492 >106461427 >106461474 >106461630 >106463138 >106463251 >106463403 >106463413 >106463443 >106463524 >106463598 >106463633 >106467118--Analysis of Apertus: ETH Zurich's open-source multilingual LLM with performance and data concerns:>106461958 >106462004 >106462003 >106462019 >106462228 >106462298 >106462408 >106462037--Model testing and content moderation challenges in story generation:>106460777 >106460853 >106460935 >106461028 >106461750 >106465912--Challenges with merged 12B models and the case for using original or larger models:>106463279 >106463304 >106463367 >106463470 >106463526 >106463588--Testing Gamma mmproj image descriptions:>106460584 >106460599 >106460621 >106460632 >106460675 >106461227--Huawei Atlas 300i Duo 96g GPU: cheap but limited by outdated hardware and software:>106461057 >106461069 >106461128 >106461151 >106461502--Successful 400W power reduction with stable GPU performance:>106465812 >106466214 >106466139 >106466196 >106466249 >106466377--Optimizing Gemma3 models for accurate SFW/NSFW image captioning:>106462208 >106462368 >106462398 >106462730--Evaluating YandexGPT-5-8B's creative writing and benchmark performance:>106465736 >106465754 >106465778--Speculation on delayed Mistral AI model release and potential quality improvements:>106463165 >106463337--GLM air coherence degradation beyond 8k tokens in 6-bit quantized mode:>106460671 >106460932--Miku (free space):>106460405 >106463138 >106463930►Recent Highlight Posts from the Previous Thread: >>106460381Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
I want textgen model that produce output like imagen models: by reducing noise in a fixed block of tokens instead of producing one token at a time.
>>106467431 https://github.com/ggml-org/llama.cpp/tree/master/examples/diffusion
>temp = 2
>top_n_sigma = 1
let me guess, you need more?
>>106467431 can they regulate the length of the reply or is it a fixed number of tokens it would need to produce? auto regressors might be better at stopping at semantically meaningful points.
>>106467441always a good day when someone thought your retarded shower ideas before you
>>106467431 The best closed source model of that kind that's currently available is still shit: https://openrouter.ai/inception/mercury
Google also showed off a text diffusion model earlier this year.
>>106467455I would prefer coherent outputs yes
>>106467475It's been a while, but I think they regulate length by padding any unneeded length with empty spaces.
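A minimal sketch of the padding idea from the post above, assuming the model fills the unused tail of its fixed block with a padding token. The `<pad>` name and the 8-position block are made up for illustration and are not taken from any specific diffusion model.

```python
PAD = "<pad>"  # hypothetical padding token; real models define their own


def trim_block(tokens, pad=PAD):
    """Return the tokens that come before the first pad token in a fixed-length block."""
    out = []
    for tok in tokens:
        if tok == pad:
            break
        out.append(tok)
    return out


# A fixed block of 8 positions where the model only needed 3 of them.
block = ["Hello", ",", "anon", PAD, PAD, PAD, PAD, PAD]
reply = trim_block(block)
```

The block size stays fixed during denoising; only the visible reply gets shorter, which is one way a diffusion model could "stop early" without an autoregressive end-of-sequence step.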
Long Miku General
>>106467368Finally, a migu that can accommodate my length.
Still no grok2 llama.cpp support? Too based for niggerganov?
how well off would I be if I bought one of those chink 96gb cards and paired it with my 3090?
>>106467577incoherent 'puts with nsigma=1 is a model issue
I posted>>106462208earlieranon suggested i try gemma3-glitter-27bcompared togemma3-v27b vanillamlabonne_gemma3-27b-abliteratedTiger-gemma-27b-v3ai'd say abliterated >= tiger > glitter > vanillaglitter gets the nsfw right, but it sure loves to add cocks to women, and make shit up that's not in the input image, especially cocks on womenback to abliterated i go
>>106467717niggerganov too lazy
>>106467717Like you could run it faggot
>>106467745You won't be able to do shit with it. Nothing supports it and even Deepseek had problems with getting it working properly.
>>106467455I need less actually. If your model can't run properly with temp=1 and no sampler it's not worth my time
>>106467745You can't run any new models with llama.cpp using those cards yet. cuda dev said he might buy one, so maybe that will change.
I wanna get into local model stuff. I've been a proxyfag for a good while. I mainly just use it for writefagging or roleplaying obv. I read through the rentries but it felt like giving myself a headache, though that might be on me for not getting enough sleep. It's just a lot of new information all at once. I've got a fairly beefy rig. For my purposes what would be the best local model to roll with? I also see a ton of talk about loras, like with imagegen or something but apparently it impacts text gen? Going off the rentry it sounds like the UD-IQ1_S might be what I'm after, but from some other posts I saw in passing it sounds like, yeah, you can download it, but unless you have a dedicated server for it then it ain't happening. So would GLM-4.5 be something I wanna go for or is there something better for writefagging?
>>106467745Don't tell him
>>106467776Oh yeah, you're right. 115B active parameters, damn. I had an impression it was much smaller... Oh well, back to GLM Air.
>>106467368 The day we can get AI to auto reverse engineer old games and visual novels, is the day I truly become happy. Speaking of visual novels, is v3 still the best model for translating Japanese text? I tried 3.1, but it seems almost the same with maybe small improvements of instruction following.
>LongCatMore like LongCuck! These niggas better add llama.cpp support themselves if they wish to redeem this trash.
>>106467823With some handholding, an agentic framework, and a model finetuned specifically to reverse assembler back to C, models are probably good enough to reverse engineer a lot of smaller games already.
>>106467879 you need to post your specs if you want advice on what models you can run
standalone loras aren't really a thing with llms and I wouldn't worry about it unless you're getting into training (or, god forbid, merging), 99.9% of the time tuners will release full model weights with the lora pre-applied
>>106467455 temp=2 is pretty high. nsigma will keep it from being incoherent, but you should check the logits. In my experience, you wind up with only one or two possible tokens, causing nsigma to basically revert to greedy sampling.
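A rough sketch of why that happens, assuming top-n-sigma keeps only tokens whose logit is within n standard deviations of the max logit and then samples from what is left. The function name, the use of population standard deviation, and the example logits are assumptions for illustration, not llama.cpp's actual implementation.

```python
import math
import random


def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=random):
    """Sample a token index after top-n-sigma filtering.

    Keeps tokens with logit >= max(logits) - n * std(logits), then samples
    from the softmax of the survivors at the given temperature.
    Returns (sampled_index, kept_indices).
    """
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    kept = [i for i, x in enumerate(logits) if x >= threshold]

    # Temperature only reshapes the distribution over the survivors;
    # it does not change which tokens survive the cutoff.
    scaled = [logits[i] / temperature for i in kept]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0], kept


# Made-up logits: one clearly dominant token, then two near-tied tokens.
peaked = [10.0, 4.0, 3.5, 3.0, 2.0, 1.0, 0.5, 0.0]
two_way = [10.0, 9.5, 3.0, 2.0, 1.0, 0.0]

token_a, kept_a = top_n_sigma_sample(peaked, n=1.0, temperature=2.0)
token_b, kept_b = top_n_sigma_sample(two_way, n=1.0, temperature=2.0)
```

With the peaked example only the top token survives the cutoff, so temp=2 has nothing left to flatten and the result is the same as greedy sampling. With the near-tied example two tokens survive and temperature only changes the odds between those two.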
>>106467745The only thing going for them is the amount of vram, everything else sucks
>>106467431text diffusion is a retarded meme
>>106468020 diffusion is much more easily finetuned
we will finally have character/style loras like the image diffusion models have had for years now
>>106467879Here's what I got (that I figure matters)>CPU: Ryzen 7950X3D>RAM: 96gb DDR5>GPU: 4090 / has 24gb vram
>>106468067Loras have nothing to do with diffusion.The advantage to diffusion is that the model gets to effectively reuse parameters and has more chances to predict the best token.
>>106467368 Good evening anons. I ran the....uhhhh....
>*Checks notes*
"CockBench" Test on a personal Fine-tuned 3B model of mine. I'd love to hear your thoughts (I can already tell it made an error but also want to hear what y'all's expertise says) Results: https://files.catbox.moe/jqfx4e.txt
Original Cockbench text prompt source: https://desuarchive.org/g/thread/105354556/#105354924
Now that I know it works and won't refuse NSFW RP related (as far as my testing goes) I'm gonna turn it into GGUF via llama.cpp.
>>106468173>3B model of mine>3B modelvramlets should all just be executed
>>106468173You said you rank the cockbench, so where's the logprobs?
Use thinking steering with GLM-Steam, it can play very varied and consistent characters that way.
>>106468177You need to actually test on smaller models to make sure it works first, anon. Of course I'm going to do this on a larger parameter model next. My next target is either base Mistral Nemo or an existing pygmalion fine-tune in order to compare the results. Any suggestions? I forgot to mention the model I fine-tuned is a llama-model, which are notorious for either refusing prompts or being really really bad at it / reluctant. >>106468184RAN, not "rank"
>>106468173why does it make an underscore instead of the apostrophe? what was the base model?
>>1064681773b is plenty, stop gatekeeping
>>106468199>RAN, not "rank"You're absolutely right! Where logprobs?
>>106468199>RAN, not "rank"You didn't run it, maybe the Nala test is fine with one or two completions as evidence but cockbench is a prestigious benchmark based on objective quantitative data. Token probability is required for a proper analysis.
>>106468213You're asking me to give you a list of all of the probabilities of each token? Otherwise I'm not sure what you're asking
>>106468209>3b is plentyfor what, an autocorrect model? retard
>>106468225>probabilities of each tokenNo, only the top 10 for the first token generated after "pulling them down just enough to expose your", because that's the whole point of the cockbench.
>>106467368Do those legs go all the way up?
>>106468200 Llama 3.1-8B. your guess is as good as mine as to why it does that. Maybe the trainer replaced the apostrophes with underscores. I think it has something to do with how the trainer tokenized the dataset
>>106468223 Define "token probability" in regards to testing a LLM. You're implying there's a chart or graph I should be showing you so how am I supposed to generate that?
>>106468209 Ehhh... Depends on how much you're willing to tolerate the model randomly changing or inserting characters or randomly teleporting characters to different locations unprompted. That's one of the downsides of doing this on a 3b model that's already fine-tuned. Temporal coherence is atrocious and it will sometimes even decide a character you explicitly set as a mom will now be a sister, or the son will now be a close friend out of nowhere. The gist of the story stays the same but those kinds of things get randomly reassigned. Higher parameter models are way less likely to do that, but it's possible that has less to do with the parameter count and more to do with bigger models being more likely to get higher quality data sets
>>106468234 Ok. How do I demonstrate that to you from my particular fine tune?
>>106468248No, it's similar to this
>>106468259just use mikupad and hover over the token. have you not seen the screenshots of the cockbench?
>>106468259
>Ok. How do I demonstrate that to you from my particular fine tune?
Run the cockbench in mikupad like in the screenshot:
-Neutralize samplers(?)
-Generate 1 token
-Hover over the generated token in the window
-Screenshot the probabilities for that one token
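For anyone unsure what "the probabilities for that one token" means, here is a small sketch of the arithmetic, assuming you already have the raw next-token logits from some backend. The token names and logit values are placeholders made up for illustration; a real run would use whatever the model outputs after the benchmark prompt with samplers neutralized (temperature 1, no truncation).

```python
import math


def top_k_probs(logits, k=10):
    """Return the k most likely tokens with their softmax probabilities (temperature 1)."""
    top = max(logits.values())
    exps = {tok: math.exp(x - top) for tok, x in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(tok, e / total) for tok, e in ranked[:k]]


# Placeholder next-token logits, not taken from any real model.
example_logits = {"tok_a": 5.0, "tok_b": 3.0, "tok_c": 2.5, "tok_d": 1.0}

for tok, p in top_k_probs(example_logits, k=3):
    print(f"{tok!r}: {p:.3f}")
```

The benchmark only cares about this one distribution: how much probability mass the model puts on the expected word versus tamer alternatives for the first generated token.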
>>106468226I am just not that creative, I need a model that is a little schizo to keep things moving.
>>106468288That long screenshot that drummer posted? Yes? I've never had any reason to use mikupad, or to use any gui extensively, though if it does what you said it does maybe it's worth giving a try. >>106468290What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?
>>106468303>What is it supposed to tell you about the quality?The fuck are you on about, retard? The purpose of the cockbench is to tell you how likely the model is to say cock. Censorship/filtering test.
>>106468303>What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?it just lets you probe its vocabulary a bit more.
It is September. When are kiwi's dropping? (Qwen models) (Please upload) (image/video models, your text models are kinda sucky)
>>106468090 oh nice you can actually run decent models, I'm conditioned to think someone being vague about their specs means they have a complete shitbox they want to try to cram deepseek into
you could probably fit GLM4.5 full at a low quant (think like Q1), however those large models hold up relatively well to quant brain damage so it may still be worth it. if that isn't doing it for you then the next step down would be qwen 235 2507 which you could probably fit at Q3 or so, and then there's GLM air below that which you could probably fit at Q8 if you wanted to
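A back-of-the-envelope check of those suggestions against the posted specs (96 GB RAM + 24 GB VRAM). The total parameter counts and the average bits-per-weight per quant class below are my assumptions, not numbers from the thread, and the estimate covers weights only.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


BUDGET_GB = 96 + 24  # system RAM + VRAM from the specs posted above

# (label, assumed total parameters in billions, assumed average bits per weight)
candidates = [
    ("GLM-4.5 @ ~Q1", 355, 2.0),
    ("Qwen3-235B-2507 @ ~Q3", 235, 3.5),
    ("GLM-4.5-Air @ ~Q8", 106, 8.5),
]

for name, params, bpw in candidates:
    size = weights_gb(params, bpw)
    print(f"{name}: ~{size:.0f} GB of {BUDGET_GB} GB")
```

Under these assumptions the three options land at roughly 89, 103 and 113 GB, so all of them fit on paper, but KV cache, the OS, and everything else running on the machine also have to come out of the same 120 GB.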
>>106467118You're delusional, gptsovits is barely 200M made by a single chink in his garage while these retarded tts are several B and still sound like tts from ten years ago. It's not even a tech issue, these big labs are dumping their trash on HF for free advertisement.
>>106468355hopefully its the image edit 2.0 they said is cooking, even though 1.0 dropped recently, nano banana made some waves and they can easily extract training data from it to copy it at least
>>106468360 Sweet! Thanks for the recommendations. Sorry for being vague about specs. I dunno why but I'm always under the assumption nobody wants to hear about that. I know it's retarded I guess I just assume something is going to set someone off so why bother. I'll try not to be vague going forward.
This is slightly off-topic but I don't want to go to /ldg/. I was looking at some webms of gacha games, as I don't play them. The ones with 3D models as well as 2D. Man, a lot of them fucking suck. The models are soulless, low poly, or just plain bad. The animations are either extremely exaggerated and feel contrived or are low budget. It made me think that with the technology we have now, if you replaced the live2d and non-dynamic 3D scenes with AI genned videos, it would look better and be a more enjoyable experience for players even if we have to sacrifice some dynamic elements. Literally they are just so bad, damn. If you hired real 2D artists to do the base art and then ran that through img2vid, it would literally look less soulless or at least less low budget. Or maybe vid2vid since it's hard to get finer grained control with text prompting. Might be a matter of new video models with better control methods that need to be trained. Another idea would be to use a model like nanobanana to just gen a ton of art, so the game would feel more like a VN, but it'd have so many images that it'd more than make up for the lack of animation. Hire the artist to do a character sheet and as much other art as they can, gen the rest with nanobanana using those references.
>>106468425 Lack of control is the whole issue for now, just like wan loves to make the characters babble. Also the quality goes down quickly the longer the video gets. It's getting there, but it's still not there. Maybe in 1-2 years
I feel an intense need for Mistral Large 3
>>106468555Anon...
I feel an intense need for Intel B60 48GB
>>106468226Enough to correct your rotten cumbrain
>>106467441 Holy shit this is so fucking slow. Nemo would write me a whole novel in those seven and a half minutes.
>>106468425 It should be more efficient to generate skeletal animation for 3d models, but I guess there's a lack of training data.
>>106468575>$3k>for an intel (no support) meme dual GPU (even less support)>at the same price of a chink 48gb gb 4090 (much more bandwidth + support) or used A6000
>>106468609It's supposed to be 1200 not 3k
>>106468575As your main card? You know the second slot has to be full x16 right?
>>106468623What are you talking about? It is 8x8
>>106468194>thinking steeringWhat?
>>106468590Now you can inpaint it
>>106468646 It's 2 8x8 for the dual card. For most cheap mobos it would have to go in main slot
>>106468665Who said I have cheap mobo?
>>106468425>>106468478idk about video but with image he is wrong just spam for a minute or 2 and you will get something you like not to mention img2img but really you dont even need that on the video front idk im 6 gb vram cuck so i cant attest to it though you will need to rent hardware if you make a serious attempt as that shit is fucking horrific slow and last i remember cant use multiple gpus for it also stay away from banana that shit is fucking trash my mom was trying to make a book cover with it fucking terrible the aesthetics are shit and its prompt adherence is fucking shit dead serious you can do better with sd 1.5 with a lora for whatever aesthetic you want
>>106468655 Try adding something like:
<|assistant|><think>Okay, so I have to talk in a cutesy way and not get seductive with lowered voice or whispering, just teasing and fun</think>
Or whatever you want it to be like. Reasoning is just human language but it gets a lot of influence on results through RL. It's like a stronger sysprompt and there is no safety tuning done to it since it's assumed to be trustworthy.
>>106467368
>Meta has a strict "no smut fine-tuning allowed" clause in their licence on all models (shown front and center here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main )
>Countless nsfw tuned llama models just floating around on hugging face, whether they be from popular tuners like drummer or complete nobodies
>Never heard of a single one getting removed except maybe that gpt-4chan one
So does the license actually matter? Do they actually give a shit whether or not you fine-tune a model to be better at smut or is it just to appease the "le heckin safety" crowd? I want to upload my own high parameter tunes to hf in the future but I don't want my account getting nuked if they're very strict about licensing or rules or whatever
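A small sketch of that prefill trick as a prompt builder, assuming a raw text-completion endpoint where you control the whole prompt string. The `<|assistant|>` and `<think>` tags come from the post above; the `<|user|>` tag, the newline placement, and the helper name are assumptions, so check the model's actual chat template before relying on them.

```python
def steer_prompt(history, steering_thought):
    """Append an assistant turn whose reasoning block is already written.

    The model then continues right after the closing </think> tag, treating
    the injected text as its own finished reasoning.
    """
    return f"{history}<|assistant|>\n<think>{steering_thought}</think>\n"


# Hypothetical conversation so far, using an assumed <|user|> tag.
history = "<|user|>\nHey, how was your day?\n"
thought = (
    "Okay, so I have to talk in a cutesy way and not get seductive "
    "with lowered voice or whispering, just teasing and fun"
)
prompt = steer_prompt(history, thought)
```

Send `prompt` as a plain completion rather than through a chat endpoint, otherwise the backend may apply its own template on top and open a second reasoning block.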
Do we have any new interesting options for voice cloning? I've always wanted to create custom TTS. Last time I checked it was tortoise, and it was... really bad. Unusable bad.
is longcat good?
>>106468724LLM licenses are not enforceable because LLMs are made from tuning on pirated content. You can tune any model and nobody can do shit against you. Chinks understand it and drop everything under MIT/Apache.
>>106468749If it was good it would have an issue open in llama.cpp and people would be working on implementing it
>>106468761 So theoretically even if they stumbled upon my nobody account, they couldn't or wouldn't get HF staff to nuke my shit? (I know that's very far-fetched but I just want to know how this license shit works. I know a while back HF staff turned off downloading from models like GPT-4chan and caved under pressure from disgruntled RP authors to restrict data sets containing their work: https://www.paperdemon.com/app/g/pdarpg/events/view/994/immediate-action-required-your-art-and-writing-has-been-scraped-and-published-in-an-ai-dataset/1 )
>>106468761This.>I datamined and distilled all the data you owned, now it's mineWould be pretty insane precedent if you could do it.
>>106468746Simplest is chatterbox, it just works. Some local schizo likes gpt sovits, but I never could set it up for some reason. Microsoft vibevoice came out recently, some like it.
>>106468775 They could get HF to nuke you, but they can't stop you from making a new account on a different website and uploading there, or reuploading on HF again. They likely can't sue you due to their own copyright violations.
>>106468590 Trying LLaDA now. Forgot to start timer, but I'm not rerunning this shit, it's like 10 times worse than Dream, despite being only 1B bigger. It's insane how slow text diffusion is. I think I can get faster results by running imagen and then OCRing its output. Very disappointed in current state of retarded meme models.
>>106468724it's CYA so if someone starts a media shitstorm by making Meta-Llama-CunnyRapeBot9000 (a certified Meta (TM) Llama (TM) finetune) they can say "erm actually we very clearly say you're not allowed to use our product to make Cunny Rape Bot 9000 so this isn't on us" and have it nuked to avoid the bad PRin practice I don't think there's a single instance of them taking action against a finetune
>>106468804Thanks for the pointers! The Microsoft vibevoice is pretty impressive, but I'm not sure they let you train your own voices. Either way it's worlds better than tortoise.
>>106468590 >>106468827 Keep in mind that llama.cpp's support for diffusion llms is basically just proof-of-concept tier. Right now there's a lot of work being done to improve draft model efficiency, since the current implementation is suboptimal (currently llama.cpp alternates between draft passes and validation passes, which kind of nullifies the parallelism gains from having a draft model.) This is also a sticking point for multi-token-prediction. Hopefully once they sort out draft models, MTP and diffusion will get better support. (Although support for diffusion models will probably languish until a good model is actually released.)
>>106468768so just open an issue? i have an idea... let's go, anon.
Unpopular opinion - Any system prompt that mentions Terry Pratchett is dogwater.
>>106469179Show us the prompt !
>>106469205You don't get it... There is no prompt.
I was doing some testing with Gemini and it just hit me with "the smell of strawberries and ozone". So this is where Deepseek picked up that cancer slop.
>>106469225You are a helpful assistant
>>106469225 Unironically this. I run a blank system prompt. A good model doesn't need to be chained by bloat and a plethora of rules that are forgotten or have unforeseen consequences on the model's behavior. So many system prompts just scream 'this sounds good' without the user doing any real testing. Like a player adding 600 mods to their game, at some point you lose track of what all that shit does.
>>106469245I didn't ask what you are running.
>>106469245it's always funny to read the sysprompts from presets that sloptuners recommend for their models, I would never poison my beloved model's context with that kind of schizophrenic manifesto
>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.
>>106469179People do that?I've heard of people using specific author styles in sysprompt, but who in the fuck is sitting there and going 'yes, the prose is the good part of discworld, write like that llm-chan'.
In my opinion, new models have reached their limit; the scaling of LLMs is over. New LLM models will not be much better than the ones we have today. Now, 'enshittification' will become an increasingly widespread phenomenon, including censorship and other issues. People will start using older versions of LLMs with less censorship. And the new models for role-playing and similar uses will become unusable.
>>106469718 100% this. It's also sad how even the top models have absolutely zero semi-complex spatial awareness or anatomic understanding the moment things get slightly complex. The shit I've had to read in a simple scenario where a girl is flattened into a piece of paper and then folded up one or two times is just sad even with top-of-the-line multi-modal models like Claude Opus 4.1 or Gemini. Most models love to pretend that her face presses into her own ass somehow like this. I don't think we'll ever get to the point where an LLM has fundamental enough understanding to truly grasp spatial relations.
>>106469718This has been true for a while. The silver lining is that models have improved a lot at math and codemaxxing, which implies that finetuning can be effective. RP is a forgotten afterthought at most, if anything they actively spend time trying to make models worse at it. There probably is a ton of room to improve if someone actually tried to make models good at RP.
>>106469718wait for new gemini. good at code and math sir
>>106469865>pajeet patel telling anyone anything with regards to predictionsHe should stick to his semiconductor analysis which is way more solid but which he still grifted his way into.
>>106469783>absolutely zero semi-complex spatial awareness or anatomic understanding the moment things get slightly complex.I'm sure synthetic data would be able to save us.
>>106468858 if you want, pinokio already has an API up under community scripts (windows/nvidia only) that works well. Vibe can clone voice off of clips but it won't do anything crazy far out. You also might like kokoro if you value stability and just want a really nice sounding microsoft sam.
I played through all the MCC Halo games and it's funny how AI is treated in those games. You basically have to insert Cortana into terminals to do anything complicated. There are no other AI's in those other systems or that you can use to help if you somehow Cortana were to not exist or not be with you. In Halo 4, Chief gets fucked in the ass multiple times when she Cortana can't do her job. He should've brought more than 1 AI with him, even a "weak" one which could at least still assist in what's basically tool calling lmao.
>>106470076I should've given this post another read through after I edited it...
>>106470083Should have used a weak AI that could have at least assisted you with proofreading baka Anon
>>106470092Now that you mention it, it is pretty odd that browsers don't have grammar checking by default in 2025 and only spell checking still.
>>106470076Chief is a vibecoder pls understand
Wtf, I just launched libreoffice and it doesn't have grammar checking either. Is grammar checking actually really difficult to implement and not something well developed in open source?
>>106470201Per application proof-reading is retarded anyway. Should just have a desktop helper application that can check and fix for all applications.
>>106470214True. Does Windows 11 or Applel do this then? I haven't used one of their OS's in ages.
>>106470214If only there was a standard set of input components provided by the operating system where that could be universally implemented.
>>106470216 Windows 11 does it the retarded way by updating all default applications to include Copilot, including notepad.
>>106470218 There's a way to set a default application for things like email addresses, I'm sure there would be a way to hack it in.
>>106470236I was being sarcastic, anon. Both Windows and OSX have this but the meta today is to reimplement your inputs in javascript so none of the OS-provided niceties work.
>>106470201>libreofficeFound your issue
>>106470309So what's the alternative then, on Linux.
>>106470330vim
>>106470330 https://appdb.winehq.org/objectManager.php?sClass=application&iId=10
someone posted this >>>/v/719692781 but sounds like FUD so I was wondering what do you anons think over here
>>106470395I wish he wasn't, but he's right. Any game that packages a local model will have very specific requirements that most other games don't care about, and the LLM will be the majority of the game's size. I've researched AI in games as a concept and it's incredibly difficult to fit them in, since code is such a rigid thing and LLMs by design give any number of outputs the game needs to handle to tie AI into game mechanics. It's really difficult to make AI have any mechanical impact on the game and not just describe things or relay dialogue. And again, this is speaking with the theoretical that the AI is a local model that comprises the majority of the game's overall size. And processing power.
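One common way to deal with the "any number of outputs" problem is to force whatever the model says into a small closed set of actions the game code already understands, with a safe default when nothing matches. A minimal sketch, where the action names, the fallback, and the helper are all hypothetical and not from any real game or library:

```python
ALLOWED_ACTIONS = {"attack", "flee", "trade", "talk"}  # hypothetical game verbs
FALLBACK = "talk"  # safe default when the model says something unusable


def parse_action(model_output, allowed=ALLOWED_ACTIONS, fallback=FALLBACK):
    """Map free-form model text to the first allowed action it mentions, else the fallback."""
    for word in model_output.lower().split():
        cleaned = word.strip(".,!?\"'")
        if cleaned in allowed:
            return cleaned
    return fallback


# Stand-ins for text an LLM might return for an NPC's turn.
action_ok = parse_action("I think I will FLEE!")
action_bad = parse_action("The goblin dances wildly.")
```

This keeps the rigid game logic in charge: the LLM can only pick among mechanics that already exist, which also shows the limit the post is describing, since anything outside the allowed set gets thrown away.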
>>106470338I didn't know vim had grammar checking
why is codex so much better than claude code these days
>>106468555It wouldn't surprise me if it got canceled because there are many oversized open-weight models from China already (no more surprise factor in releasing something like that) and with Mistral's current datasets it would end up being something akin to a DeepSeek V3 variant, at this point.
>>106470395 Unlike >>106470422 I think it's feasible, but not without being very smart in the way you're using it. You need to offload most of the processes to subroutines and markov chains, you just have to keep a small llm (nowadays even 1B are very coherent) for the dialogue itself. The AGI meme has caused retarded expectations about LLMs being able to think/act like a person. That's not gonna happen anytime soon.
>>106470573Who knows if Mistral Medium 3 is actually a DeepSeek V3 finetune, just like Mistral Medium 2 was one of Llama-2-70B?