/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107847320 & >>107838898

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107847320

--AI companion project updates and TTS technology debates:
>107850660 >107850672 >107850928 >107850947 >107853430 >107853443 >107853483 >107853554 >107853511 >107853575 >107853593 >107853616 >107853690 >107854122 >107854331 >107850983 >107851185 >107851216 >107851258 >107853585
--Ouro-2.6B with claimed 12B performance and Scaling Latent Reasoning via Looped Language Models:
>107850350 >107850413 >107850458 >107850486 >107850501 >107850519
--Optimizing client-side beam search with logprobs and batching challenges:
>107848617 >107850562 >107853006 >107853060 >107853196 >107853678
--Tools and methods for converting EPUB books to audiobooks:
>107848869 >107848873 >107848915 >107849013 >107849252
--LLM roleplaying tool UI comparisons: RisuAI vs Talemate tradeoffs:
>107847698 >107847879
--Model-GPU compatibility and quantization tradeoffs for consumer hardware:
>107855431 >107855475 >107855507 >107855527 >107855540 >107855562 >107855603 >107855650 >107855676 >107855692
--Conflicting mmap behaviors in model loading across Windows/Linux platforms:
>107849640 >107849759 >107849779 >107849831 >107849870 >107849925 >107849966 >107849976 >107849992
--Multi-model debate system feasibility in chat environments:
>107850478 >107851241 >107851246 >107851397 >107851424 >107851506
--Llama.cpp integration via three-vrm example code:
>107851325
--GPU options for LLMs: Ampere+ requirements vs budget constraints:
>107851370 >107851398 >107851413 >107854581 >107851423 >107851429 >107851444 >107851450 >107851463 >107851469 >107851817
--P40 limitations for modern LLM frameworks despite llama.cpp support:
>107851776 >107851795 >107851803 >107852059
--GLM-Image release:
>107856290
--Miku and Teto (free space):
>107847978 >107849177 >107849273 >107850396 >107851664 >107852635 >107852843 >107853561

►Recent Highlight Posts from the Previous Thread: >>107847322

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
S-so it's really here?
autoregressive imagegen for local? I'm not dreaming?
>>107856463
Worse than zit despite being newer and bigger, basically garbage
https://huggingface.co/zai-org/GLM-Image/discussions/1
>>107848869
>>107849013
>>107849252
ended up with a kokoro-tts docker image and a chatgpt script to automate the process. werks well, sounds good.
im running the model on runpod since i have some credit there. rtx 4000 and 5080 both seem to give around 3.5 sec/chunk. gonna try it locally on my 3060.
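For anyone wanting to reproduce the epub -> audiobook pipeline above, here's a minimal sketch of the text-extraction and chunking half. It only assumes that an epub is a zip archive of XHTML files (which it is per the spec), so the stdlib suffices; the per-chunk TTS call itself is deliberately left out, since the exact kokoro invocation depends on your setup and isn't shown in the post.

```python
# Sketch: pull plain text out of an epub and split it into TTS-sized
# chunks. The actual TTS call (kokoro or otherwise) is NOT included;
# feed each chunk to whatever engine you run.
import re
import zipfile
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text nodes of an XHTML document, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

def epub_text(path):
    """Concatenate the text of every XHTML document inside the epub.
    epub files are zip containers, so zipfile is all we need."""
    out = []
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                p = TextExtractor()
                p.feed(z.read(name).decode("utf-8", errors="ignore"))
                out.append(" ".join(p.parts))
    return "\n".join(out)

def chunk(text, max_chars=400):
    """Split on sentence boundaries, packing sentences into chunks
    small enough for one TTS call each."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    buf, chunks = "", []
    for s in sentences:
        if buf and len(buf) + len(s) + 1 > max_chars:
            chunks.append(buf)
            buf = s
        else:
            buf = (buf + " " + s).strip()
    if buf:
        chunks.append(buf)
    return chunks
```

With ~3.5 sec/chunk as reported above, total runtime is roughly `3.5 * len(chunks)` seconds, so tuning `max_chars` trades per-call latency against call count.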
>>107856493
lmao at the shills seething in the comments
novelai must be planning to adopt that model in the future with how they're panicking about getting exposed
>>107856717
You're the *** shill, how would they have plans in the past to adopt an unknown model? Or are you suggesting Z.ai bought them out?
Hey guys! Is this the place where I ask about NSFW Roleplay AI or do I need to move to another general? Looking for a place to like, learn the AI goober things really...
>>107856809
Buy an ad.
>>107856809
this place is as good as any other.
>ouro-2.6B
verdict?
>>107856424
Newfag here. How good are local models now? If I had a mountain of GPUs, how close to ChatGPT could I get?
>>107856717
From a practical standpoint it should be smaller or better, there's no point in making it bigger and worse. That is embarrassing. But if you think about it for a second, you'll understand why it's a good thing to do something new and still release it even if it turns out to be shit. That's how progress is made
>>107856886
>1.4MB
>copy the instructions from the readme
>it doesn't work
>use the docker container
>it doesn't work
Thanks for nothing.
>>107856809
this general is technical discussion and asshats
/aicg/ is casual roleplay discussion and mouthbreathers
take your pick! oh, and I forgot that a general still exists on /vg/. That one is where a lot of this started, but last I checked it was kind of a weird place to hang out. in this general, actual groundbreaking SOTA methods like CoT have been explored by an anon who was later cited in a paper, I think kalomaze was his name but some anon can correct me. that was the principle behind reasoning, which became a big part of 2025 LLM releases. and we have a resident llama.cpp dev, and he's a pretty cool guy. but if you're less interested in running models than you are in roleplaying with them, /aicg/ might be more your speed. i hope you have a fun and warm 4chan experience newfriend :)
>>107857025
Model?
>>107857025
>CoT
That was /aids/ or /aidg/, back when it was GPT-3. Kalomaze made min p.
>>107856983
Anyone tried these cheap Mi50 blower fan kits from Aliexpress? Tempted to replace my jank solution with something more aesthetic.
>>107857227
>big corporations bought up entire hardware supplies to build AI god, causing shortages
>chinks sell 3D-printed customization kits for literal e-waste so you can run your local AI waifus
We're already living in a cyberpunk dystopia
>>107857227
Are those printed? Is it really a good idea to be sticking thermoplastic directly onto your big ass gpu heatsink?
I gaze upon this thread from the pit of despair. Ram and gpu prices won’t drop for decades. All hope is lost.
>>107857483
All we need is a zit moment for llms. Z image turbo and LTX-2 gave me so much hope, I believe the future isn't grim anymore
>>107857475
Yeah man just do it. Don't be a pussy, give your GPU a nice new coat to keep it warm this winter.
>>107857494
Popping zits?
>>107857494
Can the z image stuff do good nsfw? How about video stuff?
>>107857483
Honestly a modern, well-trained 12b model could probably BTFO just about everything short of SOTA 300b+ models in creative and RP if it was trained on a quality, curated dataset with minimal synthetic slop. This is why Nemo is still so highly regarded today, imagine if we got a model with the relative smarts of gemma 12b without safetycuckery and less gemini slop.
Throwing more hardware at the problem is a bandaid solution for a competency issue, coupled with companies being too scared or gay to give people what they want.
>>107857528
>Throwing more hardware at the problem is a bandaid solution for a competency issue, coupled with companies being too scared or gay to give people what they want.
truth nuke
>>107857494
>Z image turbo and LTX-2 gave me so much hope
Z-image turbo yes, it's a 6b model, but LTX2 is a huge boi (19b), one of the biggest local video models, not the best example of efficiency imo
the problem is that during training it is difficult for the model to differentiate between intelligence and style in the dataset. if you feed the model every single high quality non synthetic token, it will still be dumb due to the sheer lack of non synthetic data in the world. ai is used everywhere at every level now and has been for at least a year and a half. you cannot train a model that is aware of the current state of the world without using synthetic data.