/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107834480 & >>107826643

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107834480

--Quantization performance discrepancies in Gemma-3-27B GGUF models:
>107837128 >107837136 >107837144 >107837149 >107837148 >107837164 >107837180 >107837192 >107837306 >107837197
--Local AI models for image feedback and multimodal tasks:
>107835848 >107835882 >107835900 >107835915 >107835941 >107835980 >107835996 >107836032 >107836089 >107836175 >107836347
--Running LLMs on low VRAM hardware with quantization and CPU offloading:
>107837436 >107837473 >107837514 >107837539 >107837558 >107837573 >107837603 >107837609 >107837633 >107837639
--Self-taught AI learner's motivation vs. math complexity challenges:
>107835318 >107835331 >107835431 >107835463 >107835488 >107835627 >107835644 >107835375 >107835403 >107835383
--PowerShell cmd confusion with Gemma model response critique:
>107835679 >107835772 >107835785 >107835832
--Ethical and practical concerns about AI-generated PR descriptions on llama.cpp's GitHub:
>107837074 >107837130
--Struggles with model size limitations and RAM requirements for large vision models:
>107836240 >107836244 >107836537 >107836554 >107836556 >107836593 >107836268 >107836272 >107836286
--Extensive banned token list with model-specific customizations:
>107835736 >107835765
--Meta's controversial nuclear energy investment for AI criticized as misguided:
>107835873
--Critique of complex ST webui implementation:
>107834750
--Optimizing documentation vectorization through token-efficient formatting:
>107835121 >107835923
--File numbering organization debates in documentation directories:
>107834742 >107834901 >107835077
--CPU-optimized GPT-SoVITS via ONNX inference engine:
>107836452
--Teto (free space):
>107835833

►Recent Highlight Posts from the Previous Thread: >>107834483

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107838898
The box has clearly just been opened, why is her hair outside the box?
Is the Pangu age finally upon us?
Sup qts
>>107838951
Yes, gweilo, maximum white man benchmark score best model revolutionary intelligence new LLM AI for men, women, children, pets
Do you think they will ramp up production of additional RAM, or do you think cloud models will be the only way to go in the future?
>>107839186
No, everyone expects the bubble to pop
Why are all templates greyed out in ST when using chat completion? And why are chat and text completion two different things?
why do a lot of you favor GLM 4.6 and not 4.7? I was running 4.6 for a while but I've been liking 4.7, it feels different. just curious
>>107839204
With chat completion the backend applies the model's default chat template itself, so ST's templates don't do anything; that's why they're greyed out. With text completion you send a raw prompt, so the frontend has to render the template.
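If you want to see the difference concretely, here's a minimal sketch against a llama.cpp server's OpenAI-compatible endpoints. The localhost address and the ChatML-style template are assumptions for illustration, not anything ST or the backend mandates:

import requests

BASE = "http://localhost:8080/v1"  # assumed llama.cpp server address

# Text completion: the client renders the chat template itself before
# sending, so the frontend's instruct template settings matter.
# ChatML is used here purely as a placeholder format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)
r = requests.post(f"{BASE}/completions", json={"prompt": prompt, "max_tokens": 64})
print(r.json()["choices"][0]["text"])

# Chat completion: the client sends structured messages and the backend
# applies the model's own chat template, so frontend templates are
# ignored (hence greyed out in ST).
r = requests.post(
    f"{BASE}/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 64,
    },
)
print(r.json()["choices"][0]["message"]["content"])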
>>107839208
>why do a lot of you favor GLM 4.6 and not 4.7?
Only NAI shills do.
>>107838898
>open box
>I accidentally Miku
>>107839196
How bad will things get if the bubble simply doesn't pop?
>>107839363
When rich people start becoming bothered by it
>>107839186
>>107839363
If you're not poor it's actually extremely funny to watch from the sidelines. I hope it won't pop.
>>107839196
>he thinks RAM prices are going to return to normal after the bubble pops
OP's pic reminded me of when my cat died and I had to put him in a shoebox to take him in for cremation, and now I'm sad
>>107839383
retard
>>107839208
I like both for different reasons. 4.7 is smarter and tends to stick to the prompt more closely, often to an autistic degree, which sometimes leads to issues with cards that aren't tightly written. 4.6 feels more flexible and creative out of the box.
But I've had a lot of success with 4.7 here as well after adjusting my older cards to cover for that.