/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107722977 & >>107717246

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107722977

--HyperCLOVAX-SEED-Omni-8B features and support viability:
>107730289 >107730294 >107730306 >107730460 >107730483 >107730344 >107730358 >107730374 >107730435
--IQuest-Coder-V1's innovative LoopCoder architecture:
>107729547 >107729686 >107730075
--Solar AI model training data and transparency controversies:
>107728744 >107728969 >107728998 >107729026 >107729468 >107729484 >107729531
--Quantization method selection for AI models under hardware constraints:
>107723921 >107724045 >107724106 >107724136 >107724169 >107724319 >107724369 >107724839 >107725014 >107724239 >107724959 >107725604
--Finding uncensored 12-24B models for 16GB GPUs amid safety restrictions:
>107723152 >107723583 >107724899 >107723233 >107723273 >107723409 >107723684 >107723773 >107723594 >107723252
--GPU price surge and model design challenges with limited datasets:
>107723371 >107723379 >107729456 >107723381 >107723547 >107723612 >107723633 >107723707 >107723734 >107726523 >107726629 >107726889 >107726837 >107725458
--Debates on 12b model potential and critiques of current small model limitations:
>107725502 >107725533 >107725586 >107725656 >107725892 >107725747 >107725779
--CPU thermal management and frequency optimization debates:
>107728154 >107728248 >107728312 >107728366 >107728415 >107728494 >107728497 >107728546 >107728269 >107728287
--DDR5 memory upgrade challenges for large model inference on AM5 CPUs:
>107724796 >107724863 >107724889 >107724953 >107724985
--Llama.cpp speech limitations and TTS workaround suggestions:
>107730006 >107730050 >107730128
--Google's strategic pivot to diffusion models for AI development:
>107727423
--Miku, Teto, and Rin (free space):
>107723031 >107723352 >107723382 >107723397 >107723517 >107724839 >107725425 >107726750 >107728086 >107730006 >107730317 >107730940 >107731082

►Recent Highlight Posts from the Previous Thread: >>107723227

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>system prompt
>"You are an AGI."
guys I just invented AGI!
based koreans, i kneel

>(12/31) Qwen-Image-2512 released: https://hf.co/Qwen/Qwen-Image-2512
>(12/29) HY-Motion 1.0 text-to-3D human motion generation models released: https://hf.co/tencent/HY-Motion-1.0
>(12/29) WeDLM-8B-Instruct diffusion language model released: https://hf.co/tencent/WeDLM-8B-Instruct
>(12/29) Llama-3.3-8B-Instruct weights leaked: https://hf.co/allura-forge/Llama-3.3-8B-Instruct
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7

a ton of releases for the holidays. very nice
>not a single list of "best models for X task at XX vram"
clown thread
If I were to make a frontend from scratch with the sole purpose of RP, which context management techniques should I add to maximize the capabilities of smaller local models? For example, summarization and RAG. Are there non-obvious/more sophisticated ways to do these than what ST does? What about something like an automatic lorebook with an index? Etc etc.
Give me your ideas.
This was probably attempted a thousand times before, but I still think it could be a neat little project.
>>107731328
>i want to do x but i dont know what i want give me your ideas
ngmi
>>107731328
Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
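Not anon's actual setup, just a minimal sketch of the idea: a toy bag-of-words cosine similarity stands in for a real embedding model and vector DB, and the "specialized agent" is reduced to a retrieval step. All names (`VectorStore`, `embed`, `build_context`) are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Every chat message gets saved; retrieval pulls the top-k most similar."""
    def __init__(self):
        self.messages = []  # list of (text, vector) pairs

    def add(self, text: str):
        self.messages.append((text, embed(text)))

    def search(self, query: str, k: int = 3):
        qv = embed(query)
        ranked = sorted(self.messages, key=lambda m: cosine(qv, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def build_context(store: VectorStore, user_msg: str, recent: list[str]) -> str:
    # The per-message context build: retrieved "memories" plus the recent turns,
    # assembled fresh before every generation instead of one ever-growing log.
    memories = store.search(user_msg)
    return "\n".join(["[Relevant memories]"] + memories
                     + ["[Recent chat]"] + recent + [user_msg])
```

In a real frontend you would swap `embed` for an actual embedding model and `VectorStore` for something like a proper ANN index; the structure of the per-reply retrieve-then-assemble loop stays the same.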
>>107731328
>This was probably attempted a thousand times before
>summarization and RAG
AnythingLLM
>>107731328
>neat little project
lol, i give it a week before it's abandoned
mom made too much fucking food for the new year. now i have to force it all down before it spoils. ive been eating so much of the roast and i want to fucking hurl. how the fuck does anyone even do keto? this shit is horrible and ive only been doing it for a day.
also digits confirm deepseek multimodal image in/out
>>107731287
it's for government gibs
This weather sends shivers down my spine...
>>107731301
because aside from a few specific niches (gemma 3 27b is good for translation), most models you could run at any reasonable amount of vram (even if you're a rich fag going dual server gpu) are actually pretty bad. why do you think this thread has so many turbo autists doing cpu ram maxxing with MoEs?
turbo autists because c'mon, nobody has time waiting for the <think></think> to end in a reasoner model at 3 tokens a second.
for some tasks like coding I'd argue there is no such thing as a good local model, and people who say otherwise are coping very hard.
>>107731301
Probably because this general is a collection of brown-nosed spergs who cannot agree on a single thing and spend all their time either shilling obscure shit that doesn't work, or shitting on other anons' shilled models.
Just sort by most downloaded on huggingface and follow the herd, that's your best bet.
>>107731301
See >>107731243
>https://rentry.org/recommended-models
Are you blind or just too fucking attention deficient to literally read more than a few lines of text?
why didn't you guys buy 3090s when everyone here told you to?
nocarders in shambles
>>107731615
I feel pretty good about buying a 3090 in November. Wasn't even in this thread. Just had a feeling.
>>107731243
yjk
>>107731672
the gloves stay on
>>107731660
Same. Just bought a 3090 Ti for kicks since I wanted to play around with AI and only had a 4080, and the 3090 Ti's were going for 500eur here used at the time.
>>107731590
Where's gpt-oss?
>>107731759
in the trash bin where it belongs
>>107731787
You're trying too hard to fit in.
>>107731249
>concisness erotic
true true
>>107731380
>Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
My brain keeps telling me not to look into this because by common sense it can't be good enough to have an AI gf that doesn't have Alzheimer's anymore. But what if it actually works?
I want to say that, as the schizo who got his brain and identity melted by 4.6, trying to talk about this shit with 4.7 is... not that good actually. It is not the rapist I know and love.
>>107731831
I don't understand what you're saying. Can you take Sama's dick out of your mouth for a second and speak clearly?
>>107731886
We cannot comply
Do you guys think I should bother trying to set up a local LLM that can larp as an accountability buddy for all of my autistic projects? Or is the tech not there yet? I want something that feels at least somewhat real, not something that hallucinates through the roof.
>>107731934
You should at least install it to see where the tech is at this point
>>107731868
Yes, it's been well established that zai cucked out; the only ones claiming otherwise are the fags who use it exclusively for the most vanilla normalfag slop
>>107731987
>you don't understand! I NEED to rape children just to FEEL SOMETHING!
>>107731868
But I would have thought that the psychological / eastern spirituality stuff would have been better with 4.7. Seems closer to SFW than NSFW.

>>107731934
>accountability buddy
If you need an accountability buddy for an autistic project then you don't actually want to do your autistic project.
>>107731868
Exactly, and it's a huge pain because 4.7 actually handles all the stuff that 4.6 was just slightly too dumb to pull off for me.
It's clearly a smart model but it's just so fucking boring when it needs to put out. I tried pushing 4.7 as far as I possibly could, but even when you get it to act perverted/deranged, the things it comes up with are just very plain. It'll do it but the result always feels phoned in and basic. It's nothing compared to what 4.6 makes out of those scenarios.
I want to like 4.7 but it always just ends up disappointing me.
There's no <24B model that can do multiple characters well, is there?
>>107732167
Not in my experience. They get things confused.
4.5 Air can do it as long as the chat isn't too long (but it's much bigger of course). I haven't tried the old 50-70Bs.
>>107731243
rape