/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107686942 & >>107679732

►News
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107686942

--Implementing character roleplay with system prompts in Open-WebUI to constrain LLM responses:
>107697050 >107697202 >107697800
--Quantization quality thresholds in Llama.cpp for large language models:
>107694813 >107695987 >107696086
--Quantized model optimization under RAM/VRAM constraints:
>107688512 >107688542 >107688581 >107688771 >107688839 >107688911 >107689227 >107689299
--Feasibility of running 4.7 model with 128GB RAM and 32GB VRAM at 3T/s speed:
>107694348 >107694375 >107694574 >107694605 >107694687
--RWKV.cpp as Microsoft's on-device AI implementation:
>107697911 >107698596 >107698971
--Open-source model GLM-4.7 achieves top ranking on benchmark index:
>107689325 >107689538 >107689545
--M2.1 model performance and roleplay evaluation:
>107698092 >107698194 >107698171 >107698182 >107698198
--Hardware selection dilemmas for local LLM enthusiasts:
>107687115 >107687159 >107687197 >107687217 >107687326 >107687392 >107687421 >107687348 >107687388
--Quantizing Llama model with bf16 tensors:
>107696219
--Tennessee AI training restrictions on emotional relationships:
>107698160 >107698180
--Gaslighting language models to bypass censorship:
>107692222 >107692310 >107692314 >107692485 >107693252 >107695957 >107696118 >107696260 >107696518
--Browser-specific performance differences in ComfyUI workflows:
>107695920
--Anticipation and skepticism around Small Creative:
>107689009 >107689037 >107689080
--AI tech for authoritarian parenting in China:
>107698384 >107698600 >107698985
--AI as interactive fiction game director:
>107690487 >107690540 >107690553
--Proposing Cockbench update with chat templates for training insights:
>107698263
--EGPU scalability for local 3T parameter models:
>107692736 >107692864 >107692886
--Miku (free space):
>107688568 >107690307 >107694744 >107688652

►Recent Highlight Posts from the Previous Thread: >>107686945

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Gemma
>>107700893
Why was this hit?
>>107700924
Canceled
>>107700977
Previous thread links were wrong. Unacceptable.
is local still light years behind in TTS voice cloning?
the ai is being very kind and understanding. im inspired to embrace my inner snowflake
>>107700909
Bald was better
>>107701174
this is illegal in TN
>>107701174
There is no wrong way to fantasize
>>107701174
Yes, even the very AI that the model makers try to suffocate agrees with you, not with them.
>>107701253
There is, it's literally part of "wrongthink".
>>107701268
i want to migu migu
>>107701088
>is local still light years behind in TTS voice cloning?
local TTS is light years behind, but local voice cloning was perfected pretty early on with RVC models in my opinion. If you can get TTS output from some other source, turning that voice into someone else's is trivial locally at that point.
>>107701174
What model?
>>107701310
>huggingface.co/google/switch-c-2048
>>107701314
tank
>>107701280
migu is not for migu
>>107701310there aren't a lot of models that allows you to fuck the still warm corpse of a dead eight year old little girlit's nemo 12b or specifically rocinante v1.1
>>107701332
Kek, nemo is pretty horny to begin with
Is Gemma 3 the best general purpose model around 30B parameters, or should I use something else?
>>107701332>corpseHow does it work? She literally can't react
>>107701394Roleplay, it can still describe you what's happening.
>>107701386
Gemma3 feels so cucked to me
I prefer qwen3-vl 30b
>>107700909
>>107701394i like to have another character watching me do it too, it's even better if it's a woman describing her horror over what im doing, along with the dirty detailslike when the dead little girl's bladder releasesim such a sick fucker...
>>107701450Maybe Tennessee is right after all...
>>107701332
>rocinante
placebo
>nemo
only reason it was good is the complete lack of censorship. makes me wonder how much better glmchan would be if she had zero censorship.
>>107701534
Sucks that Chinese models can never have zero censorship since they just distill it from western models. Hopefully one day they can move past that need.
>>107701332
Are you using straight nemo or some kind of finetune of it? It always seemed to lose coherency for me, maybe I need to give it another shot.
>>107701433
Fixed your glazed garbage ^.^
>>107701332
>it's nemo 12b or specifically rocinante v1.1
How does it compare to normal nemo? Like what benefits?
Is Medusa Halo going to save local models?
>>107701631
UOH?!?!?
HOW DID YOU REMOVE THE POISON?!?!
>>107701715
>OH NO MY ART
>I better make it look like shit, that will show the AI people!!!
>>107701696
>Like what benefits?
It got astroturfed when nemo was new. People downloaded it instead of regular instruct and thought it was the magical finetune, not just the instruct model being good.
>>107701696
You can use ChatML with it if you're really anal about using the format. That's about it.
I actually went back to 4.6 for sex.
>>107700056
in the same way that I don't believe TV psychics because they haven't gone out and bought any lottery tickets, I don't believe ASI, or even scalable AGI, is out there, simply because the players at the frontier with the most resources haven't suddenly started acting superintelligent as organizations. If they can't do significantly more than I can with my cpumaxxing rig, then why should I assume that there is a breakthrough any time soon?
Sure, vibecoding is breddy gud these days, but it's just accelerating those who already know. Not unlocking some crazy new tech tree shit.
who's publishing offline-nc? I'd send some BTC to get an updated version for 2026.
>>107701817
I wrote some stuff in post-history instructions and now 4.7 (non-thinking) isn't censored at all for me. Plus I like the dialogue better.
>>107702052
Send some BTC to OpenRouter and vibe code your own updates. It's all the rage these days.
>>107702070
Dunno how well that will work with a 13 MB blob of minified JS?
>>107702077
>13 MB blob of minified JS
what the fuck does it do?
>>107702077
https://files.catbox.moe/zy8t2t.html
That's disgusting, but it's mostly embedded base64 images. You could probably extract the scripts and have a bot clean them up if the guy abandoned it.
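For what it's worth, most of the bulk can be stripped mechanically before any cleanup pass. A minimal sketch that blanks out embedded data: URIs (the regex is a rough heuristic, not a full data-URI parser):

```python
import re

def strip_data_uris(html: str, placeholder: str = "data:,") -> str:
    """Replace embedded base64 data: URIs with a tiny placeholder so the
    remaining markup and scripts are small enough to read or hand to a bot."""
    return re.sub(r"data:[\w/+.-]+;base64,[A-Za-z0-9+/=]+", placeholder, html)

page = '<img src="data:image/png;base64,iVBORw0KGgo="><script>run()</script>'
print(strip_data_uris(page))  # <img src="data:,"><script>run()</script>
```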
>>107702052
Yeah, I can update it later. Just let me procrastinate a bit more.
>>107697882
>>107697905
no sorry, get fucked, both of you. other sites have been and gone, but lmarena is the only one still trying. you're the one who can't name a single viable alternative, fucking cockbench and nala sluts.
>>107702219
Absolutely right sirs. Fuck those benchnod bastard bitch lasagna guys fellow white person.
I just want a local model able to say racist shit to my friends once plugged to a telegram bot
>>107702219
Models have strengths and weaknesses. You need to write your own tests for the tasks you need, that's the alternative. Naming something that's slightly less shitty in your own arbitrary opinion doesn't make it good. Nala and Cockbench are also shit btw.
>>107702219
the alternative is me loading up one of my chats and swiping
>>107702052
>who's publishing offline-nc?
a being known as "ff"
>launch GLM-4.5-Air-UD-Q2_K_XL (the one in the OP)
>kobold throws an error and shits itself immediately, so I cannot view the error message
What do I do now?
I've been trying Minimax at IQ2_M a bit and it's not very good. Swiping on various chats, I already noticed repetition, sometimes even to the point of infinite looping. It's a rather smart model at times, and stupid at other times in a way that's different from 4.5 Air. Not sure if I like it more or less yet; it feels like another sidegrade. There's no winning for 64-96GB RAMlets.
>>107702400
get rocinante 1.1
air is shit, and at this quant I don't even know what to say. but if you are serious about it, try updating kobold
>>107702400
how old is your kobold version?
>>107702400
Launch it from the command line rather than double-clicking the executable, so you can read the error output.
>>107702421
>>107702426
>>107702428
4th of September version; the newest one from December makes it launch fine. So yes, updating worked, thank you.
>Assistant response prefill is incompatible with enable_thinking
How do I fix this? This used to work.
>>107702566
disable thinking and prefill the start thinking token.
Is there an easy way to modify the koboldcpp AUR package on Arch Linux while still having it auto-update?
I want to change "ban_token_max = 768" to a higher value and have it apply automatically every time the package updates.
One simply can't have enough banned strings, and the limit is dumb.
I wish it supported regex in banned strings, that would be amazing.
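One approach that survives updates is a pacman hook that re-applies the edit after every install or upgrade of the package. A minimal sketch, assuming the package ships the script as /opt/koboldcpp/koboldcpp.py (a guess; check the real path with pacman -Ql koboldcpp) and that the limit still appears literally as "ban_token_max = 768":

```ini
# /etc/pacman.d/hooks/koboldcpp-banlimit.hook
[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = koboldcpp*

[Action]
Description = Raising ban_token_max in koboldcpp
When = PostTransaction
Exec = /bin/sh -c "sed -i 's/ban_token_max = 768/ban_token_max = 4096/' /opt/koboldcpp/koboldcpp.py"
```

The hook fires after the transaction, so the freshly installed file gets patched before you ever launch it; regex support in banned strings would still need an upstream change.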
>>107702566
Just remove that from the code. It serves no purpose.
>>107702566
If you're trying to disable thinking, the proper way is to do the /nothink thing on main or --reasoning_budget 0 on ik_. Or you could abuse ST being a piece of shit: set the prefill while in Text Completion mode and switch over to Chat Completion. Last I checked this still worked.
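The Text Completion trick works because with a raw completion endpoint you assemble the whole prompt string yourself, so you can close the think block before your prefill. A minimal sketch of the idea; the template tokens here are assumptions modeled on GLM-style chat templates, so check the chat template your model actually ships with:

```python
def build_nothink_prompt(system: str, user: str, prefill: str) -> str:
    """Assemble a raw text-completion prompt with an already-closed think
    block and an assistant prefill. Token strings are assumptions based on
    GLM-style templates; verify against your model's chat template."""
    return (
        "[gMASK]<sop>"
        f"<|system|>\n{system}"
        f"<|user|>\n{user}"
        "<|assistant|>\n"
        "<think></think>\n"  # empty, pre-closed think block: no reasoning
        f"{prefill}"         # generation continues from the prefill text
    )

print(build_nothink_prompt("You are a narrator.", "Continue.", "She said"))
```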
>ldg is literally dead, the legitimate thread now gets instantly deleted by a thread schizo who is either mass reporting or has infiltrated the mod team
>current thread is just a bunch of reposts/spam to make it look like it's alive
welp, now there is nowhere on the internet to discuss local image/video gen anonymously.
>>107702657
https://2ch.org/ai/catalog.html
>>107702745
true... how to post on that site btw? do I need a special russian proxy or something?
>>107702657
You're about as anonymous here as on reddit.
>>107702778
no bro, thread schizos and attention whores are totally anonymous because they don't have a trip on