/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107722977 & >>107717246

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107722977

--HyperCLOVAX-SEED-Omni-8B features and support viability:
>107730289 >107730294 >107730306 >107730460 >107730483 >107730344 >107730358 >107730374 >107730435
--IQuest-Coder-V1's innovative LoopCoder architecture:
>107729547 >107729686 >107730075
--Solar AI model training data and transparency controversies:
>107728744 >107728969 >107728998 >107729026 >107729468 >107729484 >107729531
--Quantization method selection for AI models under hardware constraints:
>107723921 >107724045 >107724106 >107724136 >107724169 >107724319 >107724369 >107724839 >107725014 >107724239 >107724959 >107725604
--Finding uncensored 12-24B models for 16GB GPUs amid safety restrictions:
>107723152 >107723583 >107724899 >107723233 >107723273 >107723409 >107723684 >107723773 >107723594 >107723252
--GPU price surge and model design challenges with limited datasets:
>107723371 >107723379 >107729456 >107723381 >107723547 >107723612 >107723633 >107723707 >107723734 >107726523 >107726629 >107726889 >107726837 >107725458
--Debates on 12b model potential and critiques of current small model limitations:
>107725502 >107725533 >107725586 >107725656 >107725892 >107725747 >107725779
--CPU thermal management and frequency optimization debates:
>107728154 >107728248 >107728312 >107728366 >107728415 >107728494 >107728497 >107728546 >107728269 >107728287
--DDR5 memory upgrade challenges for large model inference on AM5 CPUs:
>107724796 >107724863 >107724889 >107724953 >107724985
--Llama.cpp speech limitations and TTS workaround suggestions:
>107730006 >107730050 >107730128
--Google's strategic pivot to diffusion models for AI development:
>107727423
--Miku, Teto, and Rin (free space):
>107723031 >107723352 >107723382 >107723397 >107723517 >107724839 >107725425 >107726750 >107728086 >107730006 >107730317 >107730940 >107731082

►Recent Highlight Posts from the Previous Thread: >>107723227

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>system prompt
>"You are an AGI."
guys I just invented AGI!
based koreans, i kneel
>(12/31) Qwen-Image-2512 released: https://hf.co/Qwen/Qwen-Image-2512
>(12/29) HY-Motion 1.0 text-to-3D human motion generation models released: https://hf.co/tencent/HY-Motion-1.0
>(12/29) WeDLM-8B-Instruct diffusion language model released: https://hf.co/tencent/WeDLM-8B-Instruct
>(12/29) Llama-3.3-8B-Instruct weights leaked: https://hf.co/allura-forge/Llama-3.3-8B-Instruct
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
a ton of releases for the holidays. very nice
>not a single list of "best models for X task at XX vram"
clown thread
If I were to make a frontend from scratch with the sole purpose of RP, which context management techniques should I add to maximize the capabilities of smaller local models?
For example, summarization and RAG. Are there non-obvious/more sophisticated ways to do these than what ST does?
What about something like an automatic lorebook with an index?
Etc etc. Give me your ideas.
This was probably attempted a thousand times before, but I still think it could be a neat little project.
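One way to frame the core problem: card, summary, lorebook hits, and chat history all compete for a single fixed token budget, so the frontend is mostly a priority ladder. A toy sketch of that idea, assuming a crude 4-chars-per-token estimate instead of a real tokenizer (all names and values here are made up for illustration):

```python
# Sketch of fixed-budget context assembly for an RP frontend:
# system prompt and card always survive, then the most recent chat
# messages fill the budget, then the summary goes in if room remains.
# The 4-chars-per-token estimate is a crude stand-in for a tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def build_context(system: str, card: str, summary: str,
                  history: list[str], budget: int) -> str:
    parts = [system, card]
    used = sum(estimate_tokens(p) for p in parts)

    # Walk history newest-first; stop when the budget runs out.
    recent = []
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        recent.append(msg)
        used += cost
    recent.reverse()

    # Summary of everything older goes in only if there is still room.
    if summary and used + estimate_tokens(summary) <= budget:
        parts.append(summary)

    return "\n".join(parts + recent)

ctx = build_context(
    system="You are Rin.",
    card="Rin: tsundere, hates storms.",
    summary="[Summary: day one at the beach]",
    history=["User: hi", "Rin: hmph", "User: nice weather", "Rin: whatever"],
    budget=64,
)
print(ctx)
```

With a tight budget the oldest messages and then the summary are the first things to drop, which is roughly what ST-style frontends do; the interesting design space is in what replaces the dropped material.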
>>107731328
>i want to do x but i dont know what i want give me your ideas
ngmi
>>107731328
Save everything to a vector DB and, for every message, have a specialized agent build the context before generating the reply.
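The retrieval half of that idea fits in a few lines. A toy sketch with a bag-of-words similarity standing in for the embedding model and an in-memory list standing in for the vector DB (both are placeholders, not a real setup):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would use a sentence
    # embedding model and a proper vector DB instead.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.items = []  # (vector, original message)

    def add(self, message: str):
        self.items.append((embed(message), message))

    def recall(self, query: str, k: int = 2):
        # Return the k stored messages most similar to the query.
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(it[0], qv),
                        reverse=True)
        return [msg for _, msg in ranked[:k]]

store = MemoryStore()
store.add("Rin hates thunderstorms and hides under the desk")
store.add("We visited the night market and bought taiyaki")
store.add("Rin's favorite drink is melon soda")
print(store.recall("where is rin hiding, she hates thunderstorms", k=1))
# -> ['Rin hates thunderstorms and hides under the desk']
```

The "specialized agent" part is then just a prompt step that decides which recalled memories actually belong in the context for this turn, instead of dumping the top-k in blindly.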
>>107731328
>This was probably attempted a thousand times before
>summarization and RAG
AnythingLLM
>>107731328
>neat little project
lol, i give it a week before it's abandoned
mom made too much fucking food for the new year now i have to force it all down before it spoils ive been eating so much of the roast and i want to fucking hurl how the fuck do niggers even do keto ? this shit is horrible and ive only been doing it for a day also digits confirm deepseek multimodal image in/out
>>107731287
it's for government gibs
This weather sends shivers down my spine...
>>107731301
because aside from a few specific niches (gemma 3 27b is good for translation) most models you could run at any reasonable amount of vram (even if you're a rich fag going dual server gpu) are actually pretty bad, why do you think this thread has so many turbo autists doing cpu ram maxxing with MoEs
turbo autists because c'mon, nobody has time waiting for the <think></think> to end in a reasoner model at 3 tokens a second
for some tasks like coding I'd argue there is no such thing as a good local model and people who say otherwise are coping very hard
>>107731301
Probably because this general is a collection of brown-nosed spergs who cannot agree on a single thing and spend all their time either shilling obscure shit that doesn't work, or shitting on other anons' shilled models.
Just sort by most downloaded on huggingface and follow the herd, that's your best bet.
>>107731301
See >>107731243
>https://rentry.org/recommended-models
Are you blind or just too fucking attention deficient to literally read more than a few lines of text?
why didn't you guys buy 3090s when everyone here told you to? nocarders in shambles
>>107731615
I feel pretty good about buying a 3090 in November. Wasn't even in this thread. Just had a feeling.
>>107731243
yjk
>>107731672
the gloves stay on
>>107731660
Same. Just bought a 3090 Ti for kicks since I wanted to play around with AI and only had a 4080 and the 3090 Ti's were going for 500eur here used at the time.
>>107731590
Where's gpt-oss?
>>107731759
in the trash bin where it belongs
>>107731787
You're trying too hard to fit in.
>>107731249
>concisness erotic
true true
>>107731380
>Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
My brain keeps telling me not to look into this because by common sense it can't be good enough to have an AI gf that doesn't have Alzheimer's anymore. But what if it actually works?
I want to say that, as the schizo who got his brain and identity melted by 4.6, trying to talk about this shit with 4.7 is... not that good actually. It is not the rapist I know and love.
>>107731831
I don't understand what you're saying. Can you take Sama's dick out of your mouth for a second and speak clearly?
>>107731886
We cannot comply
Do you guys think I should bother trying to set up a local LLM that can larp as an accountability buddy for all of my autistic projects? Or is the tech not there yet? I want something that feels at least somewhat real, not something that hallucinates out the roof
>>107731934
You should at least install it to see where the tech is at this point
>>107731868
Yes, it's been well established that zai cucked out, the only ones claiming otherwise are the fags who use it exclusively for the most vanilla normalfag slop
>>107731987
>you don't understand! I NEED to rape children just to FEEL SOMETHING!
>>107731987
But I would have thought that the psychological / eastern spirituality stuff would have been better with 4.7. Seems closer to SFW than NSFW.
>>107731934
>accountability buddy
If you need an accountability buddy for an autistic project then you don't actually want to do your autistic project.
>>107731868
Exactly, and it's a huge pain because 4.7 actually handles all the stuff that 4.6 was just slightly too dumb to pull off for me.
It's clearly a smart model but it's just so fucking boring when it needs to put out. I tried pushing 4.7 as far as I possibly could but even when you get it to act perverted/deranged, the things it comes up with are just very plain. It'll do it but the result always feels phoned in and basic. It's nothing compared to what 4.6 makes out of those scenarios.
I want to like 4.7 but it always just ends up disappointing me.
There's no <24B model that can do multiple characters well, is there.
>>107732167
Not in my experience. They get things confused.
4.5 Air can do it as long as the chat isn't too long (but it's much bigger of course). I haven't tried the old 50-70Bs.
>>107731243
rape
>decide to replace ollama with llama.cpp because of the fancy chat templates
>spent 3 hours trying to find out why it wasn't really using the gpu
>it was just a power setting bug unrelated to llama
Kill me. At least it's working now.
Recommend me a model for 12GB vram and for medieval roleplaying, etc.
Used tavern before.
>>107732457
Read the op but mistral small 3.2 or gemma 3 27b will do, there's no reason to suffer from <14b models. these near 30b models are already bad enough on their own
fuck the retards who say otherwise
Are good system prompts (for rp) a solved problem or is it still trial and error in 2026? I'd like to get nemo giving descriptions at relevant points (boobs if I pull down their bra etc) but it's pretty bad at recognizing when it's time to describe shit even if I'm explicit about when it should do that.
>>107732457
Wayfarer https://huggingface.co/LatitudeGames/Wayfarer-12B
>>107732485
>these near 30b models are already bad enough
yeah because they're not on nemo data
>>107732226
Yeah Rin is going to buttrape you.
>>107732510
What's the difference between https://huggingface.co/LatitudeGames/Wayfarer-12B-GGUF and https://huggingface.co/bartowski/Wayfarer-12B-GGUF
Is bartowski's one uncensored or?
>>107732485
>these near 30b models are already bad enough on their own
>fuck the retards who say otherwise
The amount of loads I've blown to Mistral Small 3.2 finetunes begs to differ.
>>107732534
don't use bart, he uses imatrix and that scrambles the model's brain
>>107732534
Go with the former.
>>107732534
>bartowski
I tend to favor Bartowski, he makes quants for most cool models.
>>107732569
but we want hot models doebeit
>>107732569
But what's the actual difference? Both are quants here >>107732534.
OP's recommended list has mostly bartowski's models linked too, for Nemo or Mistral Small for example.
>>107732543
Chronic masturbators are not my main concern here.
>>107732585
>Mistral Small
nvm that was Unsloth
>>107732585
>But what's the actual difference?
that bart quants always use imatrix quanting which may or may not be beneficial for cooming, you tell me if targeting distribution to random words and wiki type slop helps you coom
Not sure if this is the right thread but is AI text to speech any good yet? It would be cool to make audiobooks for content in the languages I'm learning.
>>107732611
>Not sure if this is the right thread
closest you'll get.
>>107732611
checkout chatterbox
>>107732517
Nta but I'm okay with that.
>>107732555
>>107732597
should I use Nemo from unsloth instead of bartowski that's linked in the OP too for ERP for that reason?
>>107732555
What does that even mean? Techlets giving advice they don't have any idea about.
>>107732680
You literally won't notice a difference between any of those.
>>107732715
look at the imatrix dataset he uses and tell me that improves goon
>>107732728
>>107732731
I'm even more confused now
>>107732771
Just do your own tests?
Does anyone know what Beluga is on LMarena? it seems decent.
>>107732637
Looks really cool. I assume I should be using Chatterbox-Multilingual?
>>107731249
recapanon, can I have your recap prompt?
>>107732778
There's always someone more autistic who has done better research and testing and is more knowledgeable.
>>107732804
nah
>>107732788
Likely the next open source model by OpenAI
>>107732680
>>107732771
The unsloth dynamic quants are the same thing, focused on wiki-type garbage and other useless benchmark shit because that's what they use to determine divergence from the unquanted model. But the other guy is probably right, I doubt you'd notice much difference, especially at higher quants.
Is there any alternative to tavernAI that isn't nodejs whatever slop?
>>107732813
Thanks
>>107732825
vibecode your own ;)
>>107732797
https://github.com/RecapAnon/LmgRecap/tree/master/LmgRecap/plugins/RecapPlugin
>>107732834
Sloppenheimer type shit
>>107732847
thanx
I wonder if 4.7 just needs a 4.5>4.6 treatment.