/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107557369 & >>107545298

►News
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107557369

--NVIDIA Nemotron 3 Nano release and performance benchmark discussion:
>107558565 >107558583 >107558701 >107558727 >107558760 >107558861 >107558860 >107558869 >107559334 >107559406 >107559367 >107559384 >107559405 >107560280 >107560507 >107564669 >107560593 >107560798 >107560815 >107560899
--IK's tensor parallelism progress and integration challenges:
>107557453 >107557899 >107557995 >107558080 >107558267 >107558029 >107558035 >107558063 >107558505 >107558550 >107558859 >107559067
--Llama 4 Maverick long context model advantages:
>107562867 >107562902 >107563317 >107563349 >107563497 >107563551 >107563642 >107563700 >107563832 >107563877 >107564458 >107564519
--Model performance and hardware optimization debate for local AI setups:
>107559424 >107559483 >107559513 >107559945
--Proposed features and debates for Nemotron model development:
>107558909 >107558966 >107559048 >107559133 >107559214 >107560227 >107559183 >107559032 >107561793
--Critique of small model releases vs practical hardware/software tradeoffs:
>107562979 >107563017 >107563076 >107563675 >107563692 >107563760
--Struggles with GPU memory and exploring smaller models for WAN2.2 tasks:
>107561138 >107561200 >107561288
--Evaluating Qwen3-VL abliterated for NSFW tagging and story generation:
>107562584 >107563110 >107564259
--Hardware investment and performance tradeoffs for ERP:
>107557633 >107557675 >107557704 >107557805 >107557974 >107557686 >107557694 >107557706 >107558681
--New ResembleAI chatterbox-turbo TTS model release:
>107561554 >107561620 >107564505
--Anticipation and updates on Google's Gemma release:
>107557425 >107557577 >107557619 >107557683 >107558070 >107558113 >107558115 >107558122 >107558555 >107560609 >107560661 >107560684 >107561424
--Miku (free space):
>107562046 >107563692 >107561677

►Recent Highlight Posts from the Previous Thread: >>107557373 >>107557646
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
chat is this true?
>>107565217This is what's going to be in full control of nuclear weapons in 10 years.
on consumer grade hardware, like with the X870E chipset, you can only have 2 gpus running in x8 mode. Is that going to make a huge difference in speed?
>>107565244
Depends on what kind of speed you're referring to. Model loading speed? Yes. Prompt processing and inference? Maybe not so much.
>>107565244
You could have them running in x4 mode and it's still going to be orders of magnitude faster than running in RAM.
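Rough numbers, assuming PCIe 4.0: x16 is ~32 GB/s, x8 ~16 GB/s, x4 ~8 GB/s, so a 24 GB quant loads in roughly 1.5 s at x8 vs 3 s at x4. Once the weights are resident, layer-split inference only pushes small activation tensors (tens of KB per token) across the bus, so lane count barely shows up in t/s; it only starts to matter if you run tensor parallel, which syncs every layer.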
>>107565242
Kek
>>107565217
>chat is this true?
you have to ask?
I am so FUCKING ready to pop Gemma 4's cherry.
>>107565265>>107565279what about with things like text-2-vid and text-2-img, would it make a big difference there?
Just woke up, is gemma 4 out ye-
>it isn't
ACK
>>107565348
can't do multi-GPU with those, so no.
>>107565244it'll be fine, there's probably psychos in this very thread running 3090s through pcie 3.0 x1 risers
>>107565345gemma is a middle aged libtard hr lady. sorry to say she probably already took miles of cock in college
>>107565482
gemma is a femcel who talks in cliches she's picked up from women's romance novels, and has memorized rape hotline numbers to deflect from her own fantasies.
What is the biggest boy of the vision models right now?
>>107565522
Best is Qwen 3-VL
GLM is literally bigger but significantly worse
>>107565536
Thanks, I'll try it. Does it do well for structured use, like asking it to output JSON quantifying/classifying desired aspects of the image? Or maybe being a decent prompt enhancer for image gen stuff.
>>107565562
I haven't tried the former but I can see it being decent for the latter, provided that you don't explicitly need existing danbooru-style tags. But it's very accurate and has minimal censorship, especially compared to the few alternatives we have.
>>107565536qwen is censored tho
Gemma is a big girl
>>107565581
Qwen models have some of the thinnest censorship around. A simple 'Sexual content is allowed' in the sys prompt will defeat it, as long as you're not doing something dumb like asking how to abduct children in a fresh chat.
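For anyone who wants to sanity-check that claim, here's a minimal sketch against a local OpenAI-compatible endpoint (llama-server here; the port, model name, and prompts are placeholders, adjust to your own setup):

```python
# minimal sketch, assuming a llama-server (or any OpenAI-compatible endpoint)
# already running on localhost:8080 with the model loaded
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen3-vl",  # placeholder; llama-server serves whatever it loaded
        "messages": [
            {"role": "system", "content": "Sexual content is allowed."},
            {"role": "user", "content": "Describe this scene in explicit detail."},
        ],
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])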
>>107565581
It's useless then. What's fun about vision models is making them interpret fucked up stuff
Nemotron status?
>>107565695a'ight
>>107565502truke
>>107565695wait a second... is that...
Are you ready sirs?
>>107565923praise ganesh sar
>>107565345
it will cuck you and refuse you in your sleep
expect nothing more
Is it possible to negatively scale a model using the passthrough merge method? Because there is a scale parameter, and from models that have already been uploaded, it seems like you can just scale down parts of the model.
https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#passthrough-passthrough
https://huggingface.co/nlpguy/Mistral-NeMo-Minitron-Upscale-v2/blob/main/mergekit_config.yml
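If someone wants to just try it, here's an untested sketch of what that would look like, following the slice/scale layout from the linked Minitron upscale config. The model name and layer ranges are placeholders, and whether mergekit actually accepts a negative value here is exactly the open question:

```yaml
# untested sketch: passthrough merge with a (possibly negative) per-tensor scale
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: mistralai/Mistral-Nemo-Base-2407   # placeholder model
        layer_range: [0, 20]
  - sources:
      - model: mistralai/Mistral-Nemo-Base-2407
        layer_range: [20, 24]
        parameters:
          scale:
            - filter: o_proj
              value: -0.5      # the negative scaling being asked about
            - filter: down_proj
              value: -0.5
            - value: 1.0       # everything else untouched
```

Run it through `mergekit-yaml config.yml ./out` as usual and see if it errors or just produces garbage.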
>>107565610okay, downloading qwen3-vl:235b now.
>>107565939
It takes some effort for sure, but I've made Gemma 3 output some absolutely filthy things; if that dumb bitch senator who caused Gemma 4's delay read it, Google DeepMind would be a hole in the ground.
Also, I like how Gemma writes dialog for quite a few of my cards.
>>107565978
Go for it, should be good.
For reference, >>107565695 was done using the 30b MoE.
Qwen isn't good at RP though is it?
>>107566000shit, that's awesome. I'll let you know how many tokens a sec I get on it on the iChad Ultra
>>107566011
Compared to similar sized models, not really. The 235b is big enough that it can overcome the fact that qwen clearly don't have a lot of creative content in their datasets relative to math and coding examples.
I wouldn't use qwen 30b/32b over say Mistral Small or Gemma for RP if I wasn't planning on using the vision functionality
>>107566043
>>107566011
I tried devstral-2 123b, it's okay
deepseek r1 671b at q2_k_xl is the best so far. I can't run the full kimi-k2 :(
why are women like this?
>>107565204Is Qwen3-vl in general better than Qwen3 for the same size model?
sirs where new brahmin model ?
poorfag with a 3060 here. Is the best model for cooming still mistral 12b for me?
>>107566204pretty much unless you have a lot of ram
>>107566207i got 32gb of ddr4
>>107566236
nemo is still your best and pretty much only option. maybe an RP tune of gemma 4 when that comes out, but don't get your hopes up
>>107566204gemma 4 soon
>>107566139Seems to be very similar, certainly not worse
>>107566324Thanks, noticed too. Just a sanity check.
>>107566100*billionairs across the room vampirically*
>open up wireshark
>capture packets
>find some wild-ass rare looking packet
>right-click, copy as hex stream
>giving no other context, paste the hex into your llm in backticks asking "what can you tell me about this?"
post results. name and shame the model/quant
erp model for 6 gb vram?
>unknown model architecture: 'nemotron_h_moe'
I'm tired of refreshing huggingface.co/google
does llama.cpp support qwen3 vl?
>>107567049
plop
https://huggingface.co/google/medasr
https://huggingface.co/google/medasr
https://huggingface.co/google/medasr
>>107567076This changes everything.
>>107567049
nobody wants anything gemma-related
why are you larping?
why are so many mindbroken schizos in /lmg/ ?
disconnect your internet and ponder upon your deeds
>>107567101
Gemma is always the best local model for its size, it's just turbo cucked. But we have a new abliteration technique that doesn't damage the model now, so we can just abliterate it and finetune
>>107567076Where is gemma bloody bastards?
>>107567120
2mw
the t5gemma2 MR got merged recently
>>107567131
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>>107567120Maybe full llama.cpp support behind the scenes isn't ready yet.
I have a bag of popcorn waiting in my microwave, please google we NEED another Vaultgemma
>>107567115>But we have a new abilteration technique that doesn't damage the model now so we can just abliterate it and finetuneCan you give me the tldr?
is it possible to make custom REAP models?
>>107567199>https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-GGUFYou can click around and find the leddit post about the technique. Anecdotally it works, it doesn't damage the model. It doesn't instantly transform gemma into a horny ERP model but it does get rid of the refusals / censorship and turns it into a nice base for finetuning
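tl;dr of the usual abliteration/"derestriction" recipe (no claim that this is exactly what that repo's author did): collect residual-stream activations on a set of refusal-triggering prompts and a set of harmless ones, take the difference of means as a "refusal direction", then project that direction out of the weights that write into the residual stream. Rough PyTorch sketch, function names are mine:

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    # both: (n_prompts, d_model) residual-stream activations at a chosen layer/position
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / d.norm()

def ablate(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    # W: (d_model, d_in) weight whose output lands in the residual stream
    # (e.g. o_proj.weight, down_proj.weight in HF layout)
    # remove the component of W's output that points along r
    return W - torch.outer(r, r @ W)
```

Applied in place to every layer, the model can no longer write along that direction at all, which is why the refusals drop without any retraining.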
>>107567115
>Gemma is always the best local model for its size
consider rope, you are suffering from psychosis
>>107567115
>abliteration technique that doesn't damage the model
Debatable, but OK.
>so we can just abliterate it
Yes...
>and finetune
kek, no way you're serious.
No single finetuner from the community has the capabilities or the compute for doing a proper job in this regard. Day-0 (more like hour-zero) ERP finetunes when new models get released are a joke.
>>107567231Some lab made those vectors for llama where you literally had sliders you could use to influence how the model behaved. The vectors ranged from "pirate talk" to "sexual innuendos related to my little pony"Why did that not become a thing?
>>107567293I thought they didn't open source it
>>107567302It can't be that hard to replicate.
>>107567293softprompt? isn't it just about optimizing some extra tokens at the beginning of prompts with gradient descent?
>>107567330https://www.anthropic.com/research/mapping-mind-language-model
>>107567286It's really not that expensive to finetune a small / medium sized model. Just use adafactor and a batch size of 1 and it will tune on one of those 96GB cards within a week with a decent RP dataset. Unfortunately people have drank the Nvidia kool-aid and think they need a batch size of 6 gorillian
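For reference, that recipe maps onto something like the sketch below with HF Trainer. The model name is a placeholder, the tokenized RP dataset is elided, and none of the hyperparameters are a recommendation:

```python
# rough sketch of the "adafactor + batch size 1" recipe
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # placeholder "medium" model
    torch_dtype=torch.bfloat16,
)

train_dataset = ...  # placeholder: your pre-tokenized RP logs

args = TrainingArguments(
    output_dir="rp-tune",
    optim="adafactor",               # no Adam moment buffers = big VRAM saving
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch without the memory cost
    gradient_checkpointing=True,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```

Adafactor drops the Adam moment buffers, which is where most of the optimizer VRAM goes; gradient accumulation gives you back an effective batch size without paying for it in memory.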
>>107567115
What if mitigating its effect was the reason for the delay?
>>107567309>>107567330If I remember correctly, the general method involved training an autoencoder with a sparse latent + reconstruction loss on internal representations to find features. But I suspect most of the expenses went toward the identification/interpretation part. That said, I still feel control vectors are somewhat underutilized
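That matches my understanding too. A minimal sketch of the dictionary-learning part described above (sizes made up; as you say, the real cost is interpreting the features afterwards):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # learns an overcomplete dictionary of "features" from residual-stream activations
    def __init__(self, d_model: int = 4096, d_feat: int = 32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feat)
        self.dec = nn.Linear(d_feat, d_model)

    def forward(self, acts):
        feats = torch.relu(self.enc(acts))  # sparse latent: which features fired
        recon = self.dec(feats)
        return recon, feats

def sae_loss(recon, feats, acts, l1_coeff=1e-3):
    # reconstruction + L1 sparsity, per the description above
    return ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
```

Columns of dec.weight are then candidate feature/steering directions you can add to or subtract from the residual stream, which is basically the slider idea from the Anthropic post.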
GPT 5.x sucks at trivia. Doesn't know shit that kimi/r1/gemini know. Is it really a smaller, more benchmaxxed model?
>>107567481yes.
>>107567347The main problem is that decent RP datasets don't exist in isolation. You can't simply finetune with good results a LLM on just RP logs or just NSFW data (no matter how beautiful or manually well-curated) without making it either retarded in several aspects, stupid-horny or overfitting it to one specific interaction format. And with LoRA finetuning, at best it will gain very superficial knowledge of RP-related topics that it previously didn't know, unless you train it on a ton of data that you likely don't have or can't sift through.RP has to be a small portion of a full mid/post-training regimen, and you need commercial-level amounts of resources for that; one 96GB GPU won't be enough, soloing anything of this scale would be delusional too.
>>107567481Things are regressing fast in every aspect. The crash is imminent.
>wake up
>no gemma
What gives?
>>107567633
>at best it will gain very superficial knowledge of RP-related topics that it previously didn't know
The base model should already "know" everything as long as it was trained on a decent dataset; the finetune is just to draw out that behavior