/lmg/ - a general dedicated to the discussion and development of local language models.

Christmas Edition

Previous threads: >>107652767 & >>107643997

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107652767

--Reasoning step control tradeoffs and multi-GPU setup fixes in SillyTavern:
>107654025 >107654033 >107654054 >107654563 >107654882 >107655765 >107655833 >107655903 >107656043 >107656116 >107656253 >107656486 >107656988 >107657096 >107657168 >107657180 >107657297 >107657689 >107657823 >107657906 >107658051 >107658061 >107658104 >107658169 >107657498 >107657307 >107657350 >107657351 >107657477 >107657573 >107657627 >107657639 >107657404 >107657294 >107657176 >107657194
--Performance comparison between ik_llama and exllamav3 in VRAM-bound scenarios:
>107656297 >107656349 >107656555 >107656715 >107656838 >107657115
--Resolving GGUF conversion errors with outdated dependencies:
>107659075 >107659099 >107659110 >107659130 >107659134 >107659157 >107659165 >107659117 >107659129
--Cost and performance considerations for Mac-based AI clusters vs traditional GPU setups:
>107657777 >107657794 >107657813 >107657828 >107657854 >107657870 >107657937 >107657853 >107657876 >107657816
--MoE model parameter vs expert count performance analysis:
>107652819 >107652836 >107652840 >107654372
--ARC-AGI 2 achievement and its implications for future LLM advancements:
>107653556 >107653757 >107653789
--Benchmarking GLM-4.7 models with livebench and GGUF format:
>107656875 >107657040 >107657121 >107658101
--GLM 4.7 model performance and quantization calibration controversies:
>107656256 >107656302 >107656312 >107656327 >107656401 >107656577
--llamafile project update from Mozilla.ai:
>107658257
--Post-training resource demands for advanced AI models:
>107653833
--Critique of dense models and praise for alternatives like qwen3:
>107655084
--Logs: GLM-4.7:
>107658013 >107658080
--Miku (free space):
>107652814 >107652980 >107652999 >107653495 >107654563 >107656486 >107656586 >107657689 >107658850 >107659977

►Recent Highlight Posts from the Previous Thread: >>107652827

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
I still can't turn off thinking for GLM 4.7
>>107660184
Works for me
>>107660184
<|assistant|></think>
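If anyone wants to try that prefill trick programmatically, here's a minimal sketch: Python + requests against a local llama-server /completion endpoint. The [gMASK]<sop> / <|user|> / <|assistant|> tokens are assumed from GLM's chat template, so double-check them against the template embedded in your GGUF.
[code]
import requests

# Minimal sketch, assuming a local llama-server with GLM-4.7 at localhost:8080.
# Idea from the post above: prefill the assistant turn with an already-closed
# </think> tag so the model skips the reasoning block and answers directly.
prompt = (
    "[gMASK]<sop>"
    "<|user|>\nGive me one sentence about local models."
    "<|assistant|></think>\n"
)

resp = requests.post(
    "http://localhost:8080/completion",  # llama.cpp raw completion endpoint
    json={"prompt": prompt, "n_predict": 128, "temperature": 0.8},
    timeout=300,
)
print(resp.json()["content"])
[/code]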
>24gb
>700eur
>460gb/s bandwidth
really intel?
>24gb under 500 dollars
>it will be 500$
>where's my 600$
>HE STOLE MY 700EUROS
https://videocardz.com/newz/sparkle-says-its-arc-pro-b60-gpus-are-now-available
https://videocardz.com/newz/intel-arc-pro-b60-24gb-workstation-gpu-to-launch-in-europe-mid-to-late-november-starting-at-e769
who is this card for?
>770euros
>>107660197
Wtf? I didn't post that pic
>>107660198
still thinks
>>107660199
>https://youtu.be/0qS6HmiRNzE
>llama 70b
>5t/s
>that gpu utilization
its OVER
>>107660199
>23t/s with qwen3 30b a3b
>23t/s
>on empty context
>>107660199
Enterprise™
What do you mean local. Is everyone here a billionaire? How are you fuckers affording anything?
>>107660238
don't sound right. the software must be horrible.
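For what it's worth, a quick bandwidth-ceiling sanity check on those numbers (the ~40 GB and ~2 GB weight sizes below are assumed ~Q4-ish quant sizes, not measured figures):
[code]
# Rough bandwidth-ceiling sanity check on the numbers above.
bandwidth_gb_s = 460  # Arc Pro B60 memory bandwidth from the earlier post

def ceiling_tok_s(active_weights_gb: float) -> float:
    # Upper bound: each generated token has to stream the active weights once.
    return bandwidth_gb_s / active_weights_gb

print(f"llama 70b, ~40 GB weights:   <= {ceiling_tok_s(40):.0f} t/s")  # ~11 t/s ceiling, so 5 t/s is about half
print(f"qwen3 30b-a3b, ~2 GB active: <= {ceiling_tok_s(2):.0f} t/s")   # ~230 t/s ceiling, so 23 t/s looks software-bound
[/code]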
>>107660248
Most of us have gainful employment. Shocking, I know.
>>107660254
Considering most people don't earn more than 100k a year, I still don't see it.
>>107660248
No but I have a job
I downloaded locallm and a gpt oss 20b uncensored model and now I'm drawing a blank on what to try. What can I do with local models besides cooming and coding?
>>107660302
wife agent
>>107660248
most people ITT bought their ram maxxed hardware for deepseek/kimi/glm before the ram price surge
some anons run deepseek on hardware that cost like 1000-1500$
>>107660248
I'm not American, so I can splurge a little.
>>107660302
Vibe code a revolutionary app.
>>107660248
If you don't mind low speeds and sloppy slop you can run quantized models on most PC hardware
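Minimal sketch of that route with llama-cpp-python (pip install llama-cpp-python); the model filename below is a placeholder for whatever quant you actually download:
[code]
from llama_cpp import Llama

# Quantized GGUF on ordinary PC hardware, CPU-only by default.
llm = Llama(
    model_path="./mistral-small-24b-Q3_K_M.gguf",  # hypothetical quant filename
    n_ctx=8192,        # modest context to keep RAM use down
    n_gpu_layers=0,    # pure CPU; raise this if you have any VRAM to spare
    n_threads=8,
)

out = llm("Q: What is a GGUF file?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
[/code]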
>>107660302
>coding
>toss 20B
You can remove that part
so this is the power of llama1 7b...
GLM4.7 feels like the K2-0905 to 4.6's K2-0726 or the 4.5 Opus to 4.6's 4.1 Opus. Everything really is going down the shitter.
>>107660248
Trillionaire actually
>>107660307
Yup, my Rome DDR4 build from like 2 years ago runs DS reasonably and was like $1500 at the time (not counting the 3090s I already had).
>>107660307
That's the only benefit of being a /lmg/ resident. Buying hardware before their price surge.
>>107660248
>How are you fuckers affording anything?
No poors allowed
>>107660462
Is middle class at least tolerated?
>>107660468
As long as you don't get too far in debt
>>107660248
We don't. Most are coping with a small model
>>107660248
3.5T/s for 4.7 with just a high end gayming desktop.
>>107660248
it only costs 300k for a full h200 server
>>107660386
That's precisely how I feel about it. Can't they make more "calm" models?
>>107660197
How do they already have a datapoint for 2100?
>>107660491
>only
>>107660491
For that price you could rent that same h200 rack on vast.ai for two years straight lmao
>>107660722and after 2 years you would have nothing
>>107660729Your h200 servers would have been obsolete in 2 years anyways
>>107660745they wouldn't be obsolete even under the most rushed deprecation schedule possible, but /g/tards love pretending things are obsolete
>>107660763If they won't be obsolete why is Nvidia selling GPUs with buyback clause?
>>107660796
I don't know since I'm not nvidia's sales department
technical obsolescence has a very tenuous relation with sale conditions
>Thump-thump. Thump-thump
>switch to linux
>2x as fast
>processing takes 1/10th the time
wtf is wrong with windows???
>>107661082
jeets
Fuck me, with the power of linux I can actually run a 24B model at Q3 now on my poorfag rig. So far, cydonia is way smarter than anything I've used before but feels really sloppy. Any recs? Magidonia?
>>107660202
well now we know what you were previously planning to post on pol, chuddie
i definitely had this happen when i was using kuroba ex though i think, it remembers if you uploaded a pic previously which will only ever fuck you over
>>107661082
Same but i won't pretend i didn't wish i could run this and have text appear like i'm on a 30b a3b moe. Especially when this little fucker decides to spend 10000 tokens on thought
>>107661127
don't use drummerslop models.
>>>/v/729277223
>NovelAI's whole thing is being unfiltered.
>Now they offer GLM 4.6 with 32k context, which is pretty good considering that you get unlimited use.
>I think it is a good service and very user friendly.
Yeah, I think I'm sticking with NAI. Z.ai ruined 4.7 with their safety training.
>>107661285
4.7 seems as horny as ever. It's just a worse model because it's one of those modern releases that have zero sense for pacing and the ADHD "but wait, self-correction:" style of thinking that's made a horrible return in the past few months. 3.2-Speciale remains the best of the modern bunch because it at least writes well but 4.6 will have to do until that one guy working to implement it is done learning how to vibecode
>FirePaintedCydonia
It's slop time
>>107661127
>Q3
>24B
you can't be serious
>>107661127
try the base model (mistral small)
>>107660171
>>107660171
is trooncinante still the top-tier 12b model? i got so used to its isms i already know what it's going to generate before it does