/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>107660171 & >>107652767►News>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107660171--Hardware upgrades for running quantized AI models and multi-GPU configuration:>107666345 >107666402 >107666408 >107666433 >107666445 >107666462 >107666463--Crafting effective system prompts for specific LLM behavior:>107664962 >107665087 >107665580 >107666887 >107666904 >107667023 >107667116 >107667119 >107667145 >107667173 >107667303 >107665026 >107665077 >107665090 >107665101 >107665167--Proposing MoE-LLM architecture with LRU expert cache for local deployment:>107666054 >107666073 >107666262 >107666288 >107666325--Challenges in implementing persistent prompt caching for model routers:>107661573 >107661968 >107662025 >107662156 >107662236 >107662311 >107662450 >107662575 >107662599 >107662681 >107662725 >107662782 >107663650 >107663642--Debating GLM 4.7's censorship:>107664203 >107664223 >107664270 >107664467 >107664687 >107664738 >107664766 >107665148 >107665188 >107665225 >107665325 >107665461 >107665807 >107665943 >107666016 >107666035 >107665706--Challenges with disabling reasoning in AI models and potential workarounds:>107660184 >107660198 >107662102 >107662167 >107663018 >107663033 >107663057 >107663102 >107663672 >107663721 >107666949 >107668010 >107667675--Intel Arc Pro B60 GPU pricing and performance debates:>107660199 >107660226 >107662460 >107662477 >107662497 >107662506 >107662572 >107662640 >107662650 >107662651 >107665998--Repurposing NAS with S3 storage as HuggingFace API-compatible model source:>107667186 >107667290--Model limitations in generating complex vector graphics:>107664405 >107664417 >107664438 >107664455 >107664461 >107664472 >107665953--Translating NSFW ASMR/JAV audio with specialized Whisper models:>107664657 >107664716 >107664792--Miku (free space):>107661427 >107663987 >107664266 >107667371 >107668010 >107668457 >107664599►Recent Highlight Posts from the Previous Thread: >>107660173Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
>>107668478v4 when
>>107668489Assuming DeepSeek operates on Pacific Standard Time, there's still a chance for V4 within the next 6 and a half hours.
There's still nothing better than mythomax for rp, is there?
>>107668489We don't even have 3.2 support in llama.cpp yet.
>>107668527new drummer just dropped: https://huggingface.co/QuixiAI/Ina-v11.1
>>107668527Miqu.Kimi.
>>107668532we don't even have deepseek_v32 support in mf'ing _transformers_ yet for god sake
>>107668532vllm doesn't have this issue :^)
>>107668547Let me know when someone makes a 2 bit quant for vllm.
Is 3.2 good for sex?
GLM Air or Gemma better for safe and not safe rp?
>>107668555Be the change you want to see
>>107668575Yes... Gemma good 100%
>>107668532v4 would be big news and they would have to make adding support a priority
>>107668527have you tried Mixtral-8x7B ?it's probably the best model i had when i had only 16gb vram
>>107668575air is miles better for both.
Kimi thinking IQ4-XS or Q4_0 or Q4_K_M?
>>107668737IQ4-XS. imatrix quants are generally better than normal quants at the same bit count. Don't download an unsloth quant.Those are always broken.
>>107668478I just got A/B tested on Aistudio (while using G3 Pro). Maybe it's Gemma 4? What's weird is that one respond used internet resources, while the other was the raw model (I did not activate web searches). The new model was much better: It was an actual written respond and not a list of lists of bulleted 6-word sentences.
>>107665998EP no, marlin kind of, they've done a lot of kernel support for shit like fp8 but you won't just load fp8 off huggingface, if you can quant the model you'll likely get anything to run, I got devstral2 123b working recently but I had to quant it to W4A16.if you have the system resources you can dynamically at load time turn fp16 into fp8 or mxfp4some things are still cuda identified/cuda only in VLLM like cpu offload gb and some of the dtypes that intel actually supports, I'm not confident they'll get there in the end but it's fun to tinker with.I got cucked by Qwen 3.2 thinking I believe but didn't try too hard to see if it was workable.It's in good shape, it's not rocm but it's no cuda and probably never will be.
For the easiest out of the box setup, can I just get a consumer ryzen, 128gb of ddr5 ram, and a dual 9070xt for a total of 32gb of VRAM?
We'll get there howevermany more thousand rolls it takesPerfection is possibleThe models shall delight in every interaction and hardware will be irrelevantRobowaifus will be truly viable for those choosing that pathI believe
>>107668940btw I don't want to be near a stinky ass irl it's merely an interesting test-casehonest
>>107668940
>>107668940>oily thicket
>>107668940Impressive.Not excuse me while I puke.
>>107668940anon that is disgusting worse the cockbortion ngl
>>107669006im sickened but curious
i am convinced nobody actually has any system prompt that makes GLM 4.5 air actually be not cucked
>>107668940I remind you that our Lord Jesus Christ was born on this day.
>>107668907probably not the best use of your money, but that would be a decent build. quad 16gb 9060xts or 16gb 5060tis might be better.
>>107669034>system promptThe system prompt is the character card.I just used a thinking prefill.
>>107668940nvidia made fabs... for this?
>>107668955It's pretty funny. Nice alternative to... you know what... Just say the word!
>>107668995stirred something inside you didn't it?
>>107669047It's the perfect task for an text generative AI. It's not like a human would ever be so despite as to write that for others.