/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107847320 & >>107838898

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107847320

--AI companion project updates and TTS technology debates:
>107850660 >107850672 >107850928 >107850947 >107853430 >107853443 >107853483 >107853554 >107853511 >107853575 >107853593 >107853616 >107853690 >107854122 >107854331 >107850983 >107851185 >107851216 >107851258 >107853585
--Ouro-2.6B with claimed 12B performance and Scaling Latent Reasoning via Looped Language Models:
>107850350 >107850413 >107850458 >107850486 >107850501 >107850519
--Optimizing client-side beam search with logprobs and batching challenges:
>107848617 >107850562 >107853006 >107853060 >107853196 >107853678
--Tools and methods for converting EPUB books to audiobooks:
>107848869 >107848873 >107848915 >107849013 >107849252
--LLM roleplaying tool UI comparisons: RisuAI vs Talemate tradeoffs:
>107847698 >107847879
--Model-GPU compatibility and quantization tradeoffs for consumer hardware:
>107855431 >107855475 >107855507 >107855527 >107855540 >107855562 >107855603 >107855650 >107855676 >107855692
--Conflicting mmap behaviors in model loading across Windows/Linux platforms:
>107849640 >107849759 >107849779 >107849831 >107849870 >107849925 >107849966 >107849976 >107849992
--Multi-model debate system feasibility in chat environments:
>107850478 >107851241 >107851246 >107851397 >107851424 >107851506
--Llama.cpp integration via three-vrm example code:
>107851325
--GPU options for LLMs: Ampere+ requirements vs budget constraints:
>107851370 >107851398 >107851413 >107854581 >107851423 >107851429 >107851444 >107851450 >107851463 >107851469 >107851817
--P40 limitations for modern LLM frameworks despite llama.cpp support:
>107851776 >107851795 >107851803 >107852059
--GLM-Image release:
>107856290
--Miku and Teto (free space):
>107847978 >107849177 >107849273 >107850396 >107851664 >107852635 >107852843 >107853561

►Recent Highlight Posts from the Previous Thread: >>107847322

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
S-so it's really here?
autoregressive imagegen for local? I'm not dreaming?
>>107856463
Worse than zit despite being newer and bigger, basically garbage
https://huggingface.co/zai-org/GLM-Image/discussions/1
>>107848869
>>107849013
>>107849252
ended up with a kokoro-tts docker image and a chatgpt script to automate the process. werks well, sounds good.
im running the model on runpod since i have some credit there. rtx 4000 and 5080 both seem to give around 3.5 sec/chunk. gonna try it locally on my 3060.
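For anyone wanting to reproduce the epub -> audiobook pipeline above, here's a minimal sketch of the text-extraction and chunking half. It only assumes that an epub is a zip archive of XHTML files (which it is per the spec), so the stdlib suffices; the per-chunk TTS call itself is deliberately left out, since the exact kokoro invocation depends on your setup and isn't shown in the post.

```python
# Sketch: pull plain text out of an epub and split it into TTS-sized
# chunks. The actual TTS call (kokoro or otherwise) is NOT included;
# feed each chunk to whatever engine you run.
import re
import zipfile
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text nodes of an XHTML document, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

def epub_text(path):
    """Concatenate the text of every XHTML document inside the epub.
    epub files are zip containers, so zipfile is all we need."""
    out = []
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                p = TextExtractor()
                p.feed(z.read(name).decode("utf-8", errors="ignore"))
                out.append(" ".join(p.parts))
    return "\n".join(out)

def chunk(text, max_chars=400):
    """Split on sentence boundaries, packing sentences into chunks
    small enough for one TTS call each."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    buf, chunks = "", []
    for s in sentences:
        if buf and len(buf) + len(s) + 1 > max_chars:
            chunks.append(buf)
            buf = s
        else:
            buf = (buf + " " + s).strip()
    if buf:
        chunks.append(buf)
    return chunks
```

With ~3.5 sec/chunk as reported above, total runtime is roughly `3.5 * len(chunks)` seconds, so tuning `max_chars` trades per-call latency against call count.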
>>107856493
lmao at the shills seething in the comments
novelai must be planning to adopt that model in the future with how they're panicking about getting exposed
>>107856717
You're the *** shill, how would they have plans in the past to adopt an unknown model? Or are you suggesting Z.ai bought them out?
Hey guys! Is this the place where I ask about NSFW Roleplay AI or do I need to move to another general? Looking for a place to like, learn the AI goober things really...
>>107856809
Buy an ad.
>>107856809
this place is as good as any other.
>ouro-2.6B
verdict?
>>107856424
Newfag here. How good are local models now? If I had a mountain of GPUs, how close to ChatGPT could I get?
>>107856717
From a practical standpoint it should be smaller or better, there's no point in making it bigger and worse. That is embarrassing. But if you think about it for a second, you'll understand why it's a good thing to do something new and still release it even if it turns out to be shit. That's how progress is made
>>107856886
>1.4MB
>copy the instructions from the readme
>it doesn't work
>use the docker container
>it doesn't work
Thanks for nothing.
>>107856809
this general is technical discussion and asshats
/aicg/ is casual roleplay discussion and mouthbreathers
take your pick! oh, and I forgot that a general still exists on /vg/. That one is where a lot of this started, but last I checked it was kind of a weird place to hang out. in this general, actual groundbreaking SOTA methods like CoT have been explored by an anon who was later cited in a paper, I think kalomaze was his name but some anon can correct me. that was the principle behind reasoning, which became a big part of 2025 LLM releases. and we have a resident llama.cpp dev, and he's a pretty cool guy. but if you're less interested in running models than you are in roleplaying with them, /aicg/ might be more your speed. i hope you have a fun and warm 4chan experience newfriend :)
>>107857025
Model?
>>107857025
>CoT
That was /aids/ or /aidg/, back when it was GPT-3. Kalomaze made min p.
>>107856983
Anyone tried these cheap Mi50 blower fan kits from Aliexpress? Tempted to replace my jank solution with something more aesthetic.
>>107857227
>big corporations bought up entire hardware supplies to build AI god, causing shortages
>chinks sell 3D-printed customization kits for literal e-waste so you can run your local AI waifus
We're already living in a cyberpunk dystopia
>>107857227
Are those printed? Is it really a good idea to be sticking thermoplastic directly onto your big ass gpu heatsink?
I gaze upon this thread from the pit of despair. Ram and gpu prices won’t drop for decades. All hope is lost.
>>107857483
All we need is a zit moment for llms. Z image turbo and LTX-2 gave me so much hope, I believe the future isn't grim anymore
>>107857475
Yeah man just do it. Don't be a pussy, give your GPU a nice new coat to keep it warm this winter.
>>107857494
Popping zits?
>>107857494
Can the z image stuff do good nsfw? How about video stuff?
>>107857483
Honestly a modern, well-trained 12b model could probably BTFO just about everything short of SOTA 300b+ models in creative and RP if it was trained on a quality, curated dataset with minimal synthetic slop. This is why Nemo is still so highly regarded today, imagine if we got a model with the relative smarts of gemma 12b without safetycuckery and less gemini slop.
Throwing more hardware at the problem is a bandaid solution for a competency issue, coupled with companies being too scared or gay to give people what they want.
>>107857528
>Throwing more hardware at the problem is a bandaid solution for a competency issue, coupled with companies being too scared or gay to give people what they want.
truth nuke
>>107857494
>Z image turbo and LTX-2 gave me so much hope
Z-image turbo yes, it's a 6b model, but LTX2 is a huge boi (19b), one of the biggest local video models, not the best example of efficiency imo
the problem is that during training it is difficult for the model to differentiate between intelligence and style in the dataset. if you feed the model every single high quality non synthetic token, it will still be dumb due to the sheer lack of non synthetic data in the world. ai is used everywhere at every level now and has been for at least a year and a half. you cannot train a model that is aware of the current state of the world without using synthetic data.