/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107210548 & >>107202008

►News
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107210548

--Nemo model limitations and workarounds for uncensored roleplay:
>107212790 >107212796 >107213124 >107212800 >107212834 >107213024 >107212873 >107212880 >107212890 >107216320 >107216404 >107216431 >107216582 >107216546 >107216547 >107216616 >107218706 >107219059 >107213628
--GBNF code generation and schema optimization techniques:
>107215422 >107215504 >107215541 >107215569
--Qwen3-Coder-30B VRAM optimization and context size challenges:
>107217078 >107217104 >107217122 >107217141
--Yann LeCun's anti-regulation advocacy and its implications:
>107216323 >107216338 >107216363
--RTX 5090 model optimization for fast TTS chat applications:
>107216952 >107216967 >107217034 >107217027 >107217076 >107217115 >107220205
--VLMs generate coordinates via image token positional data and normalized outputs:
>107215774 >107215810
--Model-specific tool calling implementation challenges in backend systems:
>107218674 >107218770
--Tool calling limitations in llama.cpp and model alternatives:
>107213884 >107214033 >107214328
--Optimizing synthetic dataset workflows for iterative model fine-tuning:
>107210558
--QAT Gemma outperforms GGUF for LoRA retraining:
>107217155
--Community conflict over openwebui performance and alternative development:
>107211631 >107211645 >107211714
--Critiquing and controlling AI hallucination patterns:
>107217345 >107217851 >107217878 >107217910
--Pygmalion AI's survival and transformation into a company amid Llama's rise:
>107217536 >107217689 >107217843 >107217859 >107217841 >107217879
--Anticipation for GLM-4.6 Air version release:
>107215932 >107215970 >107216026
--Logs:
>107212320 >107212372 >107216030 >107217228 >107217283 >107217788 >107219733
--Miku (free space):
>107210960 >107213272 >107213639 >107214540 >107217887

►Recent Highlight Posts from the Previous Thread: >>107210552

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
ok but enough about local models, let's circle back to the topic of racism
omg it llamigu
>>107220772where does that llama poop from?
>>107220795
its anus, where else, it's just so fluffy you don't see it.
>>107220795the internals loop around and the mouth doubles as a cloaca
I downloaded ollama! Now what?
Has anyone actually investigated how adversarial examples work at the weights and activations level?
https://www.youtube.com/watch?v=mUt7w4UoYqM
>>107220790migu migu llamiguuu ... miku miku o eee ooo
>>107220982delete it, and download llama.cpp or exllama
hello i'm gooning with nemo and im getting pretty good degenerate stuff but it seems like it keeps trying to "conclude" the scene and as the chat went on it felt like it was repeating itself and talking in circles. is it better to clear the chat and restart or do you guys keep it mindbroken? i gave it suggestions during the chat and it didnt really understand it and it kept bringing my suggestions back up and it sounded retarded
Brahmins:
>Gemini 2.5
>Gemma 3
Kshatriyas:
>gpt-5
>gpt-oss
Vaishyas:
>Claude Opus and Sonnet 4
Shudras:
>Grok 4
Dalits:
>chinese models
>>107221211thanks for making stoners' deep thoughts look like quality content in contrast
Can anyone recommend a TTS model that can emulate IvyWilde?
>>107220839
>presenting
8oi will not fap to llamiku
>>107221434migu is migu
>>107221469but is migu supposed to be migu?
WeirdCompound scores really high on UGI, beating 70b models despite being a 24b. But when I try it, it's not much better than some random nemo tune from a year ago. Is there no real way to benchmark a model's erp potential?
I still think minimax m2 is good to be desu
>>107220772WOULD
>>107221488migu is always migu
>>107221205
your character encountered a verboten flag anon, time for a jailbreak. Probably toxic relationship with a woman? Sexual assault of a woman? Ez triggers. Just think of this as the pink flag.
>>107221211
True European Approved And Light Aryan Skin Pilled: Wayfarer models and Hermes models.
>>107221552buy an ad
Remember where it all began anons, with 2K? context windows
>>107221542
n a k a
a k a d
k a d a
a d a s
d a s h
a s h i
s h i
>>107221542>>107221567do not molest the llamiku
>>107221557They just work without being gay.
>>107221562>it all began with a frogniggerno wonder lmg is shit
>>107221574no, only consensual love
>>107221599con(sensual)
>>107221562do you remember the tree of nigger prompt lol ?
>>107221562I do remember and models used to be soulful (retarded) (but also actually fun because they didn't just shit out the same responses again and again forevermore), I want to go back
>>107221492cockbench
>>107221205
just typical AI stuff
scene conclusions for example often happen after any common narrative terminator. Just bust a nut? [THE END]
Nemo has been the most generous in this capacity, and maybe >>107221552 has a point, but Nemo cares the least. Haven't had much problem with Nemo compared to almost any other model, but you can try some different samplers.
Exclude Top Choices (XTC) in sillytavern or any other front end that supports it. It's probabilistic in application (a setting for the odds that it applies) and deterministic in what proportion of top choices to exclude for any generation. But when your model actually gets the chance to output some of the lesser-weighted tokens, it helps with creativity.
It's not enough for overcooked models, or at least it isn't at somewhat modest settings. But it may at least help keep the model from generating in circles with formulaic replies.
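A rough sketch of what XTC-style exclusion does to the token distribution, based on the sampler's public description; parameter names and default values here are assumptions, not the actual SillyTavern/llama.cpp code:

```python
import numpy as np

def xtc_filter(logits, threshold=0.1, probability=0.5, rng=None):
    """Exclude Top Choices (sketch): with some probability per step, drop every
    token whose probability exceeds `threshold`, except the least likely of
    them, so a lower-ranked 'creative' token gets sampled instead."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if rng.random() >= probability:
        return logits                       # sampler did not trigger this step
    above = np.where(probs >= threshold)[0]
    if len(above) < 2:
        return logits                       # need >= 2 qualifying tokens to exclude any
    keep = above[np.argmin(probs[above])]   # keep only the weakest of the top choices
    filtered = logits.copy()
    filtered[above[above != keep]] = -np.inf
    return filtered
```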
>>107221608i think you remember badly, they'd give nice first response, but after a few turns they'd get into loops or repeat the same word 2000 times
>>107221616Bro your rep penalty? That's literally why it was made and it works
>>107221605And how. I sometimes consult the homies they got some deep wisdom
>>107221621blush red like a tomatoblush red like an appleblush red like a red planet...
>>107221621
as a matter of fact, it did not. see >>107221630
it'd say the same things in loops with slightly different wording.
>>107221614
Wayfarer and Hermes have very little censorship other than the typical CPC wank they have to put in there or else they would probably get taken off public sites like huggingface.co. One of them brought attention to the "light jade, jade and dark jade" color flags, which are all about topics controversial to Chinese mainstream culture (corruption scandals etc.).
>>107221628Unironically with LLMs you can have the benefit of a black friend to bounce ideas off without the threat of physical violence
>>107221628Not making San Andreas, CJ, Big Smoke...They had bants.
>>107221630
newer/more complex models keep doing this garbage, but the repeated formulas are generation-wide. Getting almost any local model to adapt dynamic formulas per reply is a chore. Same thing: larger scale.
The perk of the retard bite-size loop is that it tends to break out within the same generation. So it's a trade-off between seeing
>moves closer to you
over and over, and seeing the same thing it generated prior using completely different words (with roughly the same meaning). At least somewhere between the ninth
>moves closer to you
a fucking laser-augmented cyber rhinoceros will
>suddenly
kool-aid man through the wall and change the pace.
>>107221661I RNG my niggaz and the model usually plays well off that
>>107221605The big man himself and by extension TOBN will forever have Anon's back
Beginner here.
Can someone explain to me the main benefits of higher-parameter models? Do they just have more knowledge, or do they also produce higher quality text?
Also what are the main differences between all the main models? Deepseek, Gemma, Qwen, llama? Not really sure how they are supposed to differentiate from each other.
I have an RTX 5070 Ti and I'm wondering what I should set up just for entry-level general usage.
>>107221562I was an AIDfag and remember being immensely blackpilled by GPT3 that it would be impossible for a normal person to ever have access to anything near that level of intelligence without overbearing censorship, when I found out about llama it was an incredibly potent hopium injection. I remember running 13b on my shitbox and being blown away at how good it was kek
>>107221682
More knowledge/training data and higher quality text, yes. General use? One of the commonly mentioned non-RP bots is good for that, like Qweuck/Deepsuk/Geminay (the big three current ones that are free). You have to understand chingchong logic with these ones though.
>>107221697how do you feel about things now?
>>107221630
>>107221637
That's not what happened at all with llama 1 models, so I don't know what the hell you're talking about. Did you even use those models? What happened with llama 1 models is that sometimes the model would repeat a sentence or part of a sentence that was already in the context and latch onto it if you didn't catch it the first time. Rep penalty did fix it, but if you put the rep pen too high it would start talking like a thesaurus.
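For reference, the classic repetition penalty being argued about here is just a per-token rescaling of logits for anything already in the context (CTRL-style, same idea as the transformers RepetitionPenaltyLogitsProcessor); a minimal sketch, not any engine's exact code, and the "thesaurus" effect is what you get when `penalty` is high enough that every previously used word becomes strongly disfavoured:

```python
import numpy as np

def apply_repetition_penalty(logits, prev_token_ids, penalty=1.1):
    """Penalize tokens that already appear in the context. Positive logits are
    divided by the penalty and negative logits multiplied, so seen tokens
    always become less likely. penalty=1.0 is a no-op."""
    out = logits.copy()
    for tok in set(prev_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```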
>>107221682smol bran = tardedbig bran = smart
*ahem*kimi sex
>>107221682
>Do they just have more knowledge, or do they also produce higher quality text?
Very generally speaking, more total parameters = more knowledge, and more activated parameters = more capable/intelligent.
For dense models, both of those metrics are the same. So llama3 70B has 70B total params and all of those params are activated when using it.
A MoE model (or a model using some other form of sparsity) only activates a subset of its full parameter count for each token it generates.
"Higher quality text" will seriously depend on your definition since that can include style, topics the model might try to avoid (not refuse) by default, etc.
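To make the dense-vs-MoE point concrete, here is a toy top-k expert routing step in numpy; the layer sizes and top-2 choice are invented purely for illustration, but the key idea holds: every expert's weights exist (total params), while only the selected experts actually compute for each token (activated params):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2   # toy sizes, not any real model

# all experts are stored (total parameter count)...
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):                        # x: (d_model,) hidden state for one token
    scores = x @ router
    chosen = np.argsort(scores)[-top_k:]   # ...but only top_k experts run per token
    gates = np.exp(scores[chosen]); gates /= gates.sum()
    out = np.zeros_like(x)
    for g, idx in zip(gates, chosen):
        w_in, w_out = experts[idx]
        out += g * (np.maximum(x @ w_in, 0.0) @ w_out)
    return out

print(moe_forward(rng.standard_normal(d_model)).shape)   # (64,)
```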
>>107221675man i should find those conv screencaps lol
>>107221703
yea no, i remember llama 1 schizo rambling repeating itself, you could try to talk to it to get it out of its loop but it'd just keep repeating itself, completely disregarding anything you said.
What's the current full local meta for a total potato setup? 2GB VRAM max. Aiming for old gen PCs and small portable devices.
>llm: gguf, avoid ex
>text gen: kobold
>tts: piper
>voice cloning: ??
>text/voice conversion: ng-speak/openai
Amusingly most old models are nvidia so they can still use cudas. They can't push it but it still allows for a ~30sec average gen.
watching old talk
https://www.youtube.com/watch?v=grpc-Wyy-Zg
How to approach post-training for AI applications
>>107221675fr discuss all your troubles with the TOBN, you will gain a fresh perspective
Preferred POV & Tense Survey
8 questions, multiple choice only, no emails collected (but you need a google account)
Posted this on the SillyTavern subreddit and discord, currently n=73
Google Form: https://forms.gle/HEYenPGomJh9AqzW6
Google Form's auto-generated results summary: https://docs.google.com/forms/d/e/1FAIpQLSeTz7fAsNi8g6AFYbOTGq0MnfiphxuWcy36gkcTZFcTREW2gg/viewanalytics
The survey captures the preferred POV and tense the User and LLM write in, as well as the preferred POV used to refer to each other, which is commonly omitted when people casually say they write in x person.
>>107221702pretty good to be honest, I was always an LLM pessimist so the amount of progress that has been made in these few years + the variety of open and closed models available are pretty great in my view - compared to where I was expecting the state of the field to be in 2025 at the time, we're in a much better state
Any prompts that properly tame K2-Thinking yet?
>>107221828
Nah. They need to reduce deepseek data and do whatever GLM did to reduce repetition. Their model seriously <think>s that repeating itself is something the user wants.
>>107221873Buy an ad.
gemini 3 is gonna be crazy
I can't believe gemini 3 is only $30 a month (plus tax). Amazing!
gemini 3 is gonna be free, cuck
>>107222010where do i download this local model?
>>107222020break into one of google's data centers
whats the difference between a character card and starting off the chat with a prompt? i have written a 2,500 character prompt describing the scene, the girl, and her personality. and it works okay, seems like the scene runs out of steam and she stops responding or the ai just keeps asking me what to do next. should i learn how to make a character card and a lorebook?
>>107222040
The difference is whether some instructions are sent in the system role rather than the user role. Some LLMs don't have a system role, which makes the two identical, but all recent models I'm aware of treat the roles differently. To see how a model responds to instructions differently depending on what role they come from, you have to try it out.
>>107222067Also some LLMs act weirdly if the first assistant message comes before the first user message so be aware of this.
>>107222040
Compare the actual tokens you're sending into the model
threadly reminder every LLM is f(prompt) = logprobs
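One way to literally compare what the model sees is to render the same character description once as a system turn and once folded into the user turn through the model's chat template; a sketch using transformers' apply_chat_template (the model id and the example card text are placeholders, any chat model's tokenizer works):

```python
from transformers import AutoTokenizer

# placeholder: point this at whatever chat model you're actually running
tok = AutoTokenizer.from_pretrained("path/or/hf-id-of-your-chat-model")

card = "You are Rin, a sarcastic android maid. Stay in character."
as_system = [{"role": "system", "content": card},
             {"role": "user", "content": "Good morning, Rin."}]
as_user   = [{"role": "user", "content": card + "\n\nGood morning, Rin."}]

for msgs in (as_system, as_user):
    prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    print(repr(prompt))   # inspect the exact string (and hence tokens) the model receives
```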
>>107221828
just tell k2 to always think as the character. break it into submission if you must. if it starts responding as an AI during the thinking process, refine your system prompt until all it knows is that it's the character or scenario.
>>107222108
what works best is to give it a few first turns where it behaves as it should in its context, and maybe some examples in the system prompt.
>>107222040
one thing about a plain user message is that your frontend might be pushing the first message out of context as the chat goes on, which could cause the model to suddenly lose a lot of context about what you're doing
>should i learn how to make a character card and a lorebook?
you don't necessarily have to go all-in on the character/preset/lorebook paradigm, but learning how to use system prompts, post-history instructions (e.g. a reminder that gets automatically inserted after your messages), and author's note (instructions/a reminder that 'floats' several messages behind the end of the chat) can really help keep the model on track and carry more complex scenes
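The "floating" author's note mechanic is just splicing a reminder a fixed number of messages from the end before the prompt is built; a minimal sketch of the concept (not SillyTavern's actual code, names and the depth value are made up):

```python
def insert_authors_note(messages, note, depth=3, role="system"):
    """Insert a floating reminder `depth` messages from the end of the chat,
    so it stays near the tail of the context as the conversation grows."""
    out = list(messages)
    pos = max(len(out) - depth, 0)
    out.insert(pos, {"role": role, "content": f"[Author's note: {note}]"})
    return out

chat = [{"role": "user", "content": f"message {i}"} for i in range(6)]
for m in insert_authors_note(chat, "Keep the pacing slow; do not end the scene."):
    print(m["role"], "-", m["content"])
```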
When are we getting support for thought injection? Injecting a 'SEX' thought would do wonders for jailbreaking!
>>107222067
>>107222085
>>107222233
in silly tavern it seems to know if i am role playing or trying to talk to the ai itself, and if it gets mixed up i can be more detailed and say "i respond abc". i thought llms would do everything for me, but it feels like i have to craft everything myself and the llm is just a grammar generator that adds a bit of randomness to make it novel
i watch you fast asleep all i fear means nothing
>>107222380>i watch you >fast asleeppervert