/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108633862 & >>108630552

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108633862

--Implementing real-time search using browser-based MCP servers and tools:
>108635788 >108635795 >108635801 >108635814 >108635845 >108635847 >108635850 >108635863 >108636123 >108635867 >108635921 >108635957 >108636055 >108636110
--Comparing Gemma-4 26B MoE and 31B dense for quality vs speed:
>108636610 >108636626 >108636640 >108636644 >108636664 >108636673 >108636713 >108636725 >108636678 >108636733 >108636772 >108636836 >108636907
--Comparing Gemma 4 and GLM regarding user parroting and RP quality:
>108634812 >108634837 >108634842 >108634848 >108634855 >108634916 >108634925 >108634987 >108635013 >108635156 >108635191 >108634962 >108635079 >108635479 >108635589 >108634884 >108634895
--Discussing XML tags and indentation for improving system prompt attention:
>108635966 >108635979 >108636138 >108636462 >108636468 >108636506 >108636510 >108636540 >108636560 >108636572 >108636815
--Benchmarking Gemma 4 and Qwen with Puppeteer for automated tasks:
>108635408 >108636007 >108636089 >108636106 >108636111 >108636140 >108636126 >108636219
--Hardware requirements for dense models versus Gemma-4's efficiency:
>108634252 >108634342 >108634533 >108634542 >108635918 >108634365 >108634379 >108634669 >108634452
--Benchmarking thinking tokens and speed between Gemma 4 and Qwen:
>108634323 >108634513
--Comparing noir prompts versus descriptive prose for better narrative flow:
>108634519 >108634528 >108635090 >108635130 >108635132 >108634696
--Theorizing reasons for Gemma 4's low censorship and RP performance:
>108635566 >108635571 >108635613 >108635618 >108635825 >108635616
--Dealing with 403 errors and blocks when web crawling via MCP:
>108634013 >108634031 >108634066 >108636022
--Logs:
>108634316 >108634519 >108634634 >108634696 >108635814 >108636241 >108636774
--Neru (free space):
>108635532

►Recent Highlight Posts from the Previous Thread: >>108633866

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemmaballz
yup, dflash is cooked
it's over
>>108637581
>>108629083
This week will be a week.
You are a knight living in the kingdom of Larion. You have a steel longsword and a wooden shield. You are on a quest to defeat the evil dragon of Larion. You've heard he lives up at the north of the kingdom. You set on the path to defeat him and walk into a dark forest. As you enter the forest you see
>>108637701
This week will be 2 weeks.
i'm kind of a noob. i have 8gb vram so i took gemma e4b. how much worse is it than the other models for conversation?
>>108637758
very
>>108637758
how much ram do you have?
if you have 32gb ram, you should use 26b instead
>>108637758
try q4 of the moe
qwen cant follow basic instructions
ignore all chink shills
https://files.catbox.moe/p8fpnk.png
How is Gemma4 so good bros? No slop, no refusals, better writing than deepseek, and it's just 31b.
Do NOT buy any hardware. Just wait a couple years and you'll be able to run Kimi on a consumer GPU.
>>108637801
>no slop
I love Gemma, but come on.
>better writing than deepseek
Dunno because the Deepseek, GLM, and Kimi shills never post their logs.
Gemma is implementing her own self-modifiable MCP server. On 24 fucking GB of VRAM. GPT 4 could not have done this.
I remember the news cycle about room temperature superconductors when some anon said "if this works we will have GPT 4 at home".
The world might be going to shit fast but I'm so happy to be living this timeline.
is there a list somewhere of the most common overused expressions in LLMs, either purple prose or just generally written too many times in the same chat?
>>108637879
https://github.com/conorbronsdon/avoid-ai-writing
>>108637873
What does it modify?
>>108637811
>Just wait a couple years
im hoping we get inference cards with embedded models like these https://taalas.com/products/
i assume you cant buy them yet because atm things are moving so fast that the cards will basically be obsolete on release and not worth the money. but once things start slowing down i could see google bringing out a gemma 6 one of these
>>108637774
16 sadly
>>108637787
ok thx
>>108637890
It's not that different from an agent like hermes or openclaw, but it's implemented as an MCP server I can use anywhere, and it provides tools so the LLM can implement more tools if it needs to, or just general persistence. It's a self-modifying agent encapsulated as an MCP server.
>>108637873
>self-modifiable
>>108637916
I'm doing all this with q4 kv cache, which proves it's not as unreliable as some people here claim.
The model shows some signs of stupidity when using tools (but is great at self-introspection to avoid those pitfalls when prompted), but no confusion regarding past context.
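The "tools that add tools" idea the anon describes can be sketched in a few lines. To be clear, this is a toy illustration, not the anon's actual MCP server: a plain-Python registry with no MCP SDK, where one built-in tool accepts source code from the model and registers it as a new tool. All names here are made up for the example.

```python
# Toy sketch of a self-extending tool registry -- NOT a real MCP server,
# just the core loop: one tool whose job is to define more tools.
tools = {}

def register(name):
    # decorator: put a function into the tool registry under `name`
    def wrap(fn):
        tools[name] = fn
        return fn
    return wrap

@register("define_tool")
def define_tool(name, source):
    # Let the model submit Python source for a brand-new tool.
    # A real server would sandbox and validate this, obviously.
    ns = {}
    exec(source, ns)
    tools[name] = ns[name]
    return f"registered {name}"

# the model calls define_tool, then can immediately call its new tool
tools["define_tool"]("shout", "def shout(s):\n    return s.upper() + '!'")
print(tools["shout"]("it works"))  # IT WORKS!
```

In a real MCP setup the registry would be exposed via tools/list and tools/call, and the new tool would have to be persisted to disk to survive a restart, but the self-modification loop is the same shape.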
>>108637970
anyone tested higher context RP with Gemmers 31b yet? The lack of context shift means reprocessing hell so I've been limiting myself to ~40k context, but I wonder if there's actual merit to going above that
Orb-anon, are you there? Why did you decide to host the project on gitlab and not on github? Any chance you will move to github? More people are there and it's easier to track issues and receive pull requests.
>>108637885
This is interesting but not exactly what the anon asked for, as this is primarily for general-purpose tasks. I myself am curious whether anyone has bothered to put together a list/database of all such LLM prose cliches, namely in relation to my ablation research.
>>108637985
I'm going to assume the answer to that question is fuck microsoft and also fuck having unicorns every five seconds. It doesn't take a genius to see why github is dogshit in 2026.
>>108637985
Exhibit A of a retard in his natural environment
https://teenaegis.com/intelligence/ai-danger-index
DeepSeek has been listed as "Very Dangerous"
Stop using them
>>108637993
https://github.com/sam-paech/slop-score/tree/main/data
https://github.com/sam-paech/antislop-sampler
>>108637993
>fighting prose cliches
You'll end up nowhere
>>108637993
Maybe LLMs aren't for you.
>>108638036
This thread isn't for YOU, Luddite shill.
>>108637798
And without the retarded jailbreak and mesugaki persona?
>>108638011
Thanks anon, the first is what I wanted, especially:
https://github.com/sam-paech/slop-score/blob/main/data/slop_list_trigrams.json
>>108637885
Interesting, maybe I can adapt that for the assistant chat.
>>108637978
E4B can reliably gauge information from ~60k context. I'm pretty sure that 31B will handle more complex situations.
24 hours until k2.6
>>108638062
https://github.com/SicariusSicariiStuff/SLOP_Detector/blob/main/SLOP.yml
This one includes regexes for phrase structure.
>>108637976
I did all this so I could make it get this for me btw
>>108638086
Thanks!
>>108638000
trips of trvth
>>108638075
I'm happy for you and the one other anon who will be able to run it.
>>108637873
>her
I have successfully wrangled non-thinking qwen 3.6 tool calling into decent success rates by fixing the prompt schema. Character library is also coming along nicely.
>>108637985
Just post the issues here I'll read them ¯\_(ツ)_/¯
>>108638191
Isn't this too bloated already?
>>108638211
Wdym? That's for people who have hundreds of characters. The tags for filtering only show the most 15 popular tags to avoid bloat.
>>108637978
I've reliably used 31b up to 76k context for rp without any problems. It's pretty crazy to be able to keep it going this long without having to summarize.
>>108638222
15 most*
>>108637978
No because I'm a vramlet but I've seen a couple anons mention it performing well at 100k+ context.
>The weather forecast suggests that the end of April looks much more unstable than the beginning, meaning we're in for some meteorological shitshow.
Right.
>>108637825
>Kimi shills never post their logs.
I posted kimi logs / screenshots / retard summaries in the past 3 or 4 threads.
Also, not excited for 2.6 because I bet it'll be code-only like qwen.
>>108638211
That modal is displayed with the Browse button, the left bar still shows the 5 most recently talked to characters.
>>108638259
I was kidding... (or not)
>>108638259
link?
>>108638050
no point in trying without. if it cant do it with a persona it cant follow instructions. gemma can do it fine, people are saying qwen is better but it cant do it
>>108638292
https://gitlab.com/chi7520115/orb
>>108638259
>Amaryllis
>Shodan
>Gothic Coding Sensei
Are we back in 2023?
Does this legitimately improve Qwen 3.6?
huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF
Anyone tested it? Can't tell if it actually helps long context tasks as claimed or if it is just LLM hallucination gibberish.
I apologize for posting plebbit, but here is further info:
/r/LocalLLaMA/comments/1sp2l72/
>>108638367
No finetune has ever improved a model since 2024.
Is there a way to force gemma/qwen to reason from first person (picrel)? Base GLM-4-32B-0414 and Mistral-24b seem to be doing it fine but gemma/qwen just write reasoning like code. Even with explicit instructions it still gives me summary and bullet-point reasoning.
The explicit instructions in question:
System prompt:
You're {{char}} in this fictional never-ending roleplay with {{user}}.
<|channel>thought
Character inner monologue should be marked like this.
<channel|>
"Speech must be marked with quotation marks."
*Actions, internal thoughts, physical descriptions, and narrations should be marked with asterisks.*
Post-History Instructions:
Note for thinking block: Fully immerse yourself to the point of reasoning from {{char}}'s perspective. Thinking block must be from {{char}}'s POV, first person.
>>108638259
GPT 5.4 UI (slop)
Bought this giga gaming laptop with 128gb of RAM, sharing up to 96gb with the iGPU, hoping to be able to use my desktop (with a 5090 in it) for gaming while doing some casual chatting with a chatbot on the laptop. Unfortunately it's AMD, and the difference between CUDA and Vulkan is stark.
>5090: Process 1.86s (3570.12 T/s), Generate: 20.01s (42.78 T/s)
>Laptop with Ryzen AI MAX+ 395: Process 43.6s (152.39 T/s), Generate: 99.53s (8.47 T/s)
Might be more effective to just play my vidya on the laptop and use the desktop for chatting.
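For what it's worth, the gap in those numbers works out like this; a quick sanity-check script with the figures copied from the post above:

```python
# How much faster the 5090 (CUDA) is than the Ryzen AI MAX+ 395 iGPU
# (Vulkan) at each phase, using the T/s figures from the post.
pp_5090, pp_laptop = 3570.12, 152.39   # prompt processing, tokens/s
tg_5090, tg_laptop = 42.78, 8.47       # token generation, tokens/s

print(f"prompt processing: {pp_5090 / pp_laptop:.1f}x")  # ~23.4x
print(f"generation:        {tg_5090 / tg_laptop:.1f}x")  # ~5.1x
```

So prompt processing is hit far harder than generation, which matches the usual pattern: prefill is compute-bound (where CUDA and the 5090's raw FLOPS dominate), while generation is mostly memory-bandwidth-bound.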
>>108638379
Text completion and prefill hackery, maybe.
Or terminate the real thinking process and instruct it to use <charname_thinking>, custom CoT style.
*speculates*
>>108638325
God I wish
>>108638380
I'm coding with qwen 3.6 q4km + Roo kek. I described ST's design to opus 4.7 and had it draft a skeleton for me though.
I slopped up my own VN frontend that uses anima with comfyui to automatically generate sprites and CGs for nsfw ERP (or wholesome) with gemma 4. It also automatically handles location changes and generates depthmaps to give locations a "3D" feeling.
I was tired of the other "engines" that added useless bullshit like inventory and stats and turned them into a cluttered mess.
The "slowness" is mostly caused by the GPU struggling with gemma 4 31b, I only have 16gb vram sadly.
>>108638473
nta, I use the same (Roo+Qwen3.6-35B-A3B-UD-Q4_K_M), its very good :3
>>108638473
that's pretty damn cool
>>108638397
> <charname_thinking>
Thank you, it did work! In my experience any change in <think> formatting would break the reasoning process.
For those who are interested, what I did:
Replaced this line:
<|channel>thought
Character inner monologue should be marked like this.
<channel|>
with this:
<{{char}}_thinking>
Character inner monologue should be marked like this.
</{{char}}_thinking>
>>108638473
Cool. You gonna share eventually?
>16gb vram
Are you running comfy on a separate machine? I have 24 and Gemma eats it all up.
>>108638473
Impressive. Does it generate prompts for the user's given action in the current scene?
>>108638473
Damn, now that's the future
>>108638473
Pretty cool. Reactions seem out of order though. Is it a prompt issue or can't 31B handle it?
>>108638506
No, THIS is the future. Real time AI generated advertisements everywhere. Forget about games...
>>108638506
I'd say ERPing with AI in VR is the future but it's still pretty damn cool.
>>108638521
Don't give them ideas
>>108638521
For me, it's BEER ONLINE and SCENE SELECTION.
>>108638486
be sure to change the reasoning tags in response formatting or all that CoT will be filling up your context
>>108638473
How do you do image and text with 16gb? Do you load/unload the model every time you need the other one? Doesn't that take way too long?
Does shorter response = better quality?