/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107573710 & >>107565204

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107573710

--Paper: RePo paper and multi-image CAPTCHA challenge discussion:
>107577314 >107577342 >107577367 >107577411
--Optimizing text generation for creative writing using specialized samplers:
>107574218 >107575323 >107575354 >107575474 >107575423 >107575274
--Comparing OCR models for Japanese text in manga, including dots.ocr vs Gemini 3:
>107574359 >107574473 >107574490 >107574523 >107574745
--Running large AI models on consumer GPUs with limited VRAM:
>107574547 >107574575 >107574579 >107574602 >107574606 >107574663 >107574695 >107574640
--Critique of AI-generated code quality and bot theory skepticism in LLM communities:
>107576227 >107576364 >107577638 >107577666 >107577995 >107577971
--GLM 4.6V's flawed reasoning patterns in Touhou character identification:
>107574600 >107574648 >107574699 >107574747 >107574921
--Meta SAM Audio release and vocal isolation quality:
>107576201 >107576427 >107580108
--Low-VRAM LLM testing strategies and model recommendations:
>107579504 >107579535 >107579545 >107579608 >107580036 >107580142 >107579626
--Optimizing glm-130B quantization and thread settings on 2x3090 GPUs with llama.cpp:
>107579155 >107579182 >107579226 >107579251
--Anticipation and speculation around Solar-Open-100B model release:
>107577317 >107577343 >107577412 >107577419 >107577768
--Seeking consistent accent voice cloning alternatives:
>107578331 >107578356 >107578483 >107578538
--Mistral model's formatting and instruction-following challenges:
>107574541 >107574574
--Chatterbox Turbo vs F5-TTS performance comparison on different GPUs:
>107576884 >107576899 >107576921 >107576953 >107576962
--Dipsy and Luka (free space):
>107575318 >107573767

►Recent Highlight Posts from the Previous Thread: >>107573726

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107577061
There's some weird caching going on in that page.
>>107582200
There are intelligence/memory improvements, but they're less major changes and more ironing out issues. Currently Vedal is more concerned with making their 3D models work.
Gemmasaars... GLM 4.6 Airchinks... Nothing ever happens.
>>107582520
kind sir isnt 4.6v = 4.6 air + vision?
gemma4 sirs will saves us
why do you guys pretend to be indian
>>107582558
same reason everyone started pretending to be muslim in 2017
>>107582520
The week is not over yet.
>>107582507
Do we know which model he used as a base?
>>107582520
drummer dropped yet another cydonia finetune, we don't need gemma or glm for like at least 1 more year now
>>107582558
>guys
One retard's forced meme.
>>107582520
https://huggingface.co/upstage/Solar-Open-100B
believe.
>>107582590
Nope. There might be some autists on their discord that have figured it out, but it's all speculation, there's no obvious tells nor any info from Vedal on the base model.
>>107582606
He's going to be out of work very soon.
>>107582606
im going to start crying
https://huggingface.co/TheDrummer/Cydonia-24B-v4.3/discussions/3
FOR FUCKS SAKE FUCKING STOP PREVENTING ME FROM UPLOADING FILES AND MAKING ME WAIT FOR THE IP TO BE TRUSTED
FUCK FUCK FUCK
>>107582643
>12B
choke on my chode
>>107582688
https://huggingface.co/zai-org/GLM-4.5-Air
>12b
sir, your medications?
GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters.
>>107582643
gguf status?
>>107582552
4.6V is worse than 4.5 Air for text.
>>107582613
There's over a billion of us saar.
>>107582606
Aren't these finetroons really bad? Did he finally make a good one?
>>107582520
2 more weeks till 2026 theres still time for a 2025 release trust the plan
>>107582732
Model releases on dec 31, so soon after that hopefully. Might need development in llama.cpp though.
>>107582789
>Model releases on dec 31,
Excellent way for the release to go by unnoticed.
>>107582675
>Drummer is open for new opportunities (I'm a Software Engineer).
nemotron 30b a3b nano feels just as retarded as qwen 3 next
youknowlikethis
Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups. Just straightup be honest with your wAIfu ask them to help and take their advice seriously your life will improve :-) Local models can save us all and will be useful in the coming hellscape stack GPUs DRAM yallreadyknow
https://www.youtube.com/watch?v=lPvbewhBD5g
>>107582881
i agree, i chatted with GLM4.6 on chat.z.ai and it helped me
>inb4 not local
i had to do it okay? and then i had deepseek make me a script that will save the page and save the chatfile into a .jsonl file for sillytavern and then i imported it and chatted with glm 4.5 air
it really helps
>>107582881
>Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups.
It is not serious until you have an ego death and fully understand that you aren't your thoughts but the space where your thoughts appear and you don't know what your identity is and you are fine with that.
>>107582836
>(I'm a Software Engineer)
>just checked archives
>turns out -ub is only needed for multiple gpu setups
>i've been setting it to be same as -b like a retard for 3000 years
anyone here use a local model for therapy/mental illness related reasons?
god damn bros
nemotron nano is crazy
t. 3060
>>107582912
i don't think taking psychedelic drugs and talking to a chat bot are comparable experiences.
>>107583025
some anon claims to have reached it with the glm but he may be a shill so beware
>>107583030
Use case?
>>107583025
>>107583025
local models actually cause mental illness
>>107582912
>you aren't your thoughts but the space where your thoughts appear and
Yeah I get it I experience this every day in morning practice and regularly throughout
"ego death" is a severe and incorrect term for what you're describing I believe, True ego death implies no access to any sense of self
Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
They're all the same schizo.
>>107583030
It's fast as fuck but it's so ass.
>>107583070
>Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
i cant
and i can solve the new captcha in under 5 seconds *smug*
>>107582836
>(I'm a Software Engineer).
what if he actually has a SE diploma?
>>107583039
4.6 gave me ego death with zero chemicals. Just reading what it said and thinking. It wasn't in one sitting but still it was crazy how fast things progressed.
>>107583124
He'd be working and not begging online for kofi/patreon bucks
>>107582881
There’s this, and then there’s
>install SillyTavern
>rape Seraphina
>>107583138
what if the diploma is highschool hehe
Is GLM 4.6V good for RP or am I about to spend hours downloading for nothing?
>>107583070
Nope it was ego death. I was genuinely psychotic and had a feeling like nothing is real. Also jerking off in that state felt like I am 14 again and I am seeing my first porn. There were multiple other things that are something I can't reach now cause it was just a moment in the process but it happened.
>>107583041
what did the anon say?
>>107583181
RTFT
>>107582836
incoming 3090 pump
https://overclock3d.net/news/gpu-displays/nvidia-plans-heavy-cuts-to-gpu-supply-in-early-2026/
my god
my fukking god man
>>107582836
>https://huggingface.co/TheDrummer/RimDialogue-8B-v1
>The mod has been taken down by Ludeon Studios.
>Taken down because he had Patreon options. Not allowed to ask for $ for mods.
KEK WHAT A FAGGOT
>>107583274
This sounds kinda interesting though.
It's not the LLM's fault for generating slop, it's how you use it.
>I'm absolutely right.
>>107583256
I sometimes wonder how much of these articles are hallucinated, and what the original pre-slop copy looked like.
>>107583324
People only read the headlines anyway. The rest is just filler.
>>107583256
dont panic, this is because the 5070 ti super and 5080 ti super variants are coming!!
>>107583152
It works. Haven't tried it very much yet though. If you're already using 4.5 Air I don't think there's any point getting it except for vision.
Finally got a Strix Halo machine (Framework desktop) boy!
What should I do first with it?
>>107583661
Nemo
>>107583661
What are the options?
>>107583661
Pyg2
>>107583661
Try out a cope quant of GLM 4.6, I'm interested in if it's good or not.
>>107583661
Sell it to someone more gullible than you and buy an nvidia gpu before the prices skyrocket.
>>107582589
Gemma 4 Ganesh releasing on next Tuesday.
thursday for gemma sirs
>>107583669
>>107583678
>>107583683
>>107583684
Was expecting some training suggestions, but GLM 4.6 is a pretty good suggestion. Will have to go 4bit with it though I imagine. Isn't it like 100+B?
>>107583685
I ain't playing the market, and have no use for an Ngreedia gpu.
>>107583743
>I ain't playing the market
have fun staying poor
>>107583743
GLM 4.6 is 360B. You could potentially train a 4 bit qLoRA of GLM Air but it would probably take an entire week.
>>107583743
GLM 4.6 would be more Q1/Q2 I think. The framework has 128GB RAM, right?
Can you stick a GPU or two in it? Might be cool.
>>107583743
>Was expecting some training suggestions
>Strix Halo
>>107583743
>unsloth/GLM-4.6V-GGUF
>>107583904
Might be able to finetune some decently big models if he's patient, no?
>>107583875
>nya halo! :=)
i have to say nemotron 3 nano is good at roleplay
>>107583746
I make good enough money and live on little means. Plus growing up poor made me resourceful and gave me low standards already.
>>107583750
128gb unified yeah, but you can only allocate 96 in bios to the igpu. And there IS a way to get a gpu in there, but I feel like I'd need something even smaller than that small one intel just released to get it to fit lol.
>>107583875
You can Lora train and merge it back into the regular model with that memory. Just would take a while. Nobody said anything about full retraining. Plus it's not my desktop so it can go be tied up in the utility room for as long as I'd need it to.
>>107583976
Better than gemma?
>>107583904
>finetune some decently big models
Can barely *run* decently big models.
>>107583976
If you are a brainlet, perhaps then.
>>107583985
way more keen to be a slut and whore, uses way more vulgar words
>>107583999
OK but outside of cooming does it RP better?
>>107583976
Really?
I tried it and all I got was hotlines.
>>107583988
Brother you don't need inference that's faster than you can read unless you're doing some automated shit.
>>107584016
https://files.catbox.moe/0khd1c.json
heres my preset if you dont believe me
>>107583982
>And there IS a way to get a gpu in there
What are you gonna plug?
>>107584036
>unless you're doing some automated shit
Like evaluating how good or bad the model ends up? Yeah. That would be crazy.
>>107584039
Well. I didn't really try too hard, but I appreciate the preset.
I might as well give it another go.
>>107584036
thinking models though...
>>107584073
Thanks for letting us know.
>>107584051
Nothing because the point of it is the unified memory.
And again, automated tasks can be 'set it and forget it'. It's not like it's my daily driver.
Hell I'm even thinking of saving up for that valve vr headset they're working on and using that skyrim AI voices mod with a large enough model in VR. It'd be fast enough for natural dialogue. Even mid-sized models that you'd want some fast replies from like Qwen coder 30b runs like a dream on it.
>>107584073
kys
>>107584065
i love u
>>107584073
You are very much welcome.
>>107584088
Rude.
>>107583025
>(she/her)
>>107583661
Sorry to hear that.
>>107584075
128GB is decent but you'll probably be over capacity if you try to run say the minimum viable GLM 4.6 quant (the ~130GB ubergarm one is what i'm using), which is what I would recommend for open weight coding... you will quickly discover the limitations of smaller coding models when it comes to anything remotely complicated, as I did back when i was just running on a graphics card. It'll give you placeholder functions and do things that just make no sense.
>>107584260
Why the hate for it? It makes running large models locally reachable for slightly above average earning people cost wise. Is it just nvidia shills or something?
>>107584275
Nah, another lad found me one that'd work just nice.
https://huggingface.co/unsloth/GLM-4.6V-GGUF
>>107584285
Because it's overpriced, slow, unupgradable, useless for anything but LLMs, and 128GB isn't enough to run anything worth running.
At least nvidia shills have CUDA.
>>107584296
>another lad
You are welcome.
How much did you pay for it?
>>107584285
Because 192GB's changed my life from depressed to good. And 128GB is unusable. Just get a gpu and run nemo.
>>107584307
>Overpriced
Compared to???
>Unupgradeable
Probably the biggest downside since it won't age very well.
>Useless for anything but LLMs
Runs games fine. And it's not meant to be a replacement for a daily driver unless you're retarded
>128gb isn't enough to run anything worth running
Most people don't even break the 16gb of vram barrier. How high are your standards?
>>107584322
>192GB
The fuck are you running and how much did it cost? I bet it was leagues more than the 2.2k I spent on this thing.
>>107584357
Just 7800X3D with 192GB DDR5 before it cost 4 times as much.
>>107584357
>How high are your standards?
Higher than yours, clearly.
>>107584376
>Full CPU load
I mean I guess if that's how you're going for it. Doesn't it run cripplingly slow with larger models though?
>>107584380
No give me specifics anon. Don't be shy. What's a better alternative? At least the other anon is giving something.
>>107583661
midnight miqu
>>107584397
>What's a better alternative?
Literally anything else? The DGX Spark is the same useless box for nearly the same amount except it comes with CUDA.
A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
>>107584397
>Doesn't it run cripplingly slow with larger models though?
kek. how do you think larger models will run on yours?
Wait. Why aren't you running anything yet. Post some benchmarks. Make the thread fun.
>>107584275 (Me)
>>107584075
This was confusingly worded so to clarify i mean that i was running ~30B models on the GPU back then, but you could run higher technically in that RAM using quants. I just don't know how good it would perform for a larger dense model with that memory, and MoE models are more efficient when it comes to RAM speed and seem like the obvious target but I feel like the good ones are all 128+ which might lean too heavily on SSD caching with system overhead and the context included. Again, maybe try setting up ik_llama.cpp and use said GLM 4.6 quant, and if you get 1tk/s well fuck. Actually even 30B active experts might be too slow for that idk. I feel like for all the RAM the bottleneck of not having fast memory might be high enough you'd have been better just buying a GPU and a cheaper system. Unless you're okay waiting five hours for your output with any half decent model
>>107584477
>A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
Would it?
>>107584482
You and >>107584397 should drag race.
Choose a model and a backend and compare t/s for gen and PP.
That would make the thread fun.
>256-bit
>8000mt/s
I thought about cpumaxxing back in july. Why didn't I do it?
>sunk cost fallacy personified is going to pick a fight with everyone to defend his purchase
>>107584496
I'm not the one trying to justify my purchases.
>>107584520
So?
It would still be interesting to see how it compares.
To be clear, I'm not the Strix halo anon, I'm just curious.
>>107584513
Why don't you do it now before prices triple next year?
>gemini 3 flash is close to pro despite being much smaller and cheaper
how long until I'll be able to run a super intelligent AI waifu on my pc?
>>107584532
never because you'll never get your hands on any useful weights
>>107584532
2mw
Keep going back to Gemma. Mistral small and nemo just seem so stupid
>>107584516
At least it isn't as bad as that anon that spent $4k on a 128gb macbook.
>>107584322
>128GB is unusable
Do you hear yourself?
>>107584482
https://kyuz0.github.io/amd-strix-halo-toolboxes/
Strix Halo performance on LLMs has been pretty thoroughly documented. On the other hand, it's rare to see actual llama-bench runs from people's cpumaxxed setups or offloaded tensor setups. Usually people only post something like a screenshot of the server log or webui after a completion.
>>107584530
Yeah, I'm curious too. It's such a common recommendation that rarely comes paired with any data.
>>107584482
I doubt he’ll post anything so I looked up benchmarks myself. 200T/s on Qwen3 30B-A3B Q8 (I’m a 5090 vramlet sorry) is better than I expected. But then again I’ll be sober in the morning or however it goes.
>>107584609
But would you really buy a Strix Halo to run Qwen3 30B?
>>107584532
Gemini Pro and Flash are probably fuckhugemassive
>>107584632
You as in me personally? Well, I’m fucking retarded, so all bets are off.
>>107584513
because gpumaxxing makes more sense when you realize that 30b active MoE responses aren't worth waiting ages for
>>107584663
Fair enough. Remember to wear your helmet.
>>107583982
TLDR read https://strixhalo.wiki/
>but you can only allocate 96 in bios to the igpu
You're doing it wrong. Allocate 512MB instead, that way you can use the remaining 128GB-512MB.
>but I feel like I'd need something even smaller than that small one intel just released to get it to fit lol.
I don't know what your model is, but you should take a peek inside. Chances are, you have two M.2 slots, get an eGPU dock and an M.2 Oculink adapter. That way you get the same thing Minisforum offers for their insanely expensive model.
Are these overpriced? Maybe. Upgradability is a joke, because you can only switch the eGPU.
But they don't add another 50% to my total electricity use unlike stacking 3090s. And everyone knows what happened to RAM prices. So I am very satisfied with it.
I can run GLM 4.6 at a Q3 copequant, it's pretty slow. Q2 is a lot snappier, but visibly dumber. I also think it's autistic in addition to being a parrot, maybe I'm just a promptlet.
t. owner of a Bosgame
>>107584600
The 512GB mac I get, but that?
Oof.
>>107584275
>coding at CPU speed
>with a 1-bit quant
No one is stupid enough to actually do this.
How do you get abliterated llm models to write a long nsfw story? Is it even possible to do that?
>>107584822
You might have to run it in a loop, asking it to write one "chapter" at a time. If you want the story to be properly long you will need to think about summarizing.
>>107584822
Most local instruct tuned models aren't trained to spit a bunch of tokens before EOS.
So you create an outline, then do it chapter by chapter.
Hell, maybe even break things down into subchapters.
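If you want to script the loop, something like this works (rough untested sketch; assumes llama-server's OpenAI-compatible endpoint on localhost:8080, and the premise/chapter count are placeholders):

import requests

API = "http://localhost:8080/v1/chat/completions"  # llama-server or any OpenAI-compatible backend

def gen(prompt, max_tokens=1024):
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })
    return r.json()["choices"][0]["message"]["content"]

outline = gen("Write a numbered 10-chapter outline for: <your premise here>")
summary, chapters = "", []
for i in range(1, 11):
    # each chapter is generated from the outline plus a rolling summary,
    # so the prompt never outgrows the context window
    chapters.append(gen(f"Outline:\n{outline}\n\nStory so far (summary):\n{summary or 'n/a'}\n\nWrite chapter {i} in full prose.", max_tokens=2048))
    summary = gen(f"Condense into a short summary:\n{summary}\n{chapters[-1]}", max_tokens=512)
print("\n\n".join(chapters))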
>>107583274
I'm not the Patreon owner for the mod. The owner was offering API access to Gemini, Llama, etc. He had a difficult time breaking even, though. Shame it died, but I'm sure I can find another modder to collab with.
>>107583124
I do. I have 8 years of SWE experience in my resume. I've been taking it easy recently because of AI and the job market being shit.
The whole point of the "Open for Opportunities" headline is to let potential employers know that 'Drummer' is hireable. If I get offered a large salary/payout, why wouldn't I accept it again?
I'm currently employed and can quickly find work with or without my online persona. Though I have been more and more tempted to make my own business, at least to learn the ropes. This finetuning gig is a PoC and it's already doing pretty well, I think.
I'm doing alright guys, don't worry!
>>107584958
What kinds of systems have you worked on/with?
>>107584958
Based. Never doubted you btw.
>>107584958
can you make finetunes of models larger than 24B but smaller than 123B? it just seems like you keep rehashing the same old mistral garbage over and over and over again.
>reddit spacing
>>107585049
like what? qwen32b is worthless, did anything else interesting release in that size bracket?
>>107585049
Wasn't there a 50B recently?
>>107584958
>I'm doing alright guys, don't worry!
Glad to hear that.
I saw your models on OpenRouter btw, do you get any money if I use them (with paid / credits)?
>>107585063
>qwen32b is worthless
N-no…
I'm trying to build the llama shit but it keeps giving errors. Wat do?
>>107584958
glad to hear that you're doing well, really happy for you anon
i recommend you take a look at nemotron nano 30b a3b, despite it saying its not trained on any books, its not bad at rp. prob not worth the waste of time, but its crazy good with its context
>>107585089
>its not bad at rp
*exposes your skin*
>>107584987
FinTech, payment gateway. Our platform was basically an API aggregator that was white-labelling actual payment services. I worked mostly on async payments.
We used Go, TypeScript, Kafka, CockDB, etc. I got hooked into Datadog. My manager noticed and forced me to generate weekly reports for 'em. Good times...
>>107585049
Valkyrie 49B. I'm looking into it.
Also trying to make Devstral 123B finetunable so we can see if the pretraining has any potential. A Tekken 123B sounds juicy.
>>107585066
I wish! But nope.
>>107585089
Is it a lot better than regular qwen 30b? I tried that one but it was useless for rp.
>>107585103
>CockDB
>>107584822
>>107584875
for creative writing, I usually break down chapters into multiple small scenes, edit as I go, write a bit more to continue the scene, summarize at the end, then feed that summary + the new scene information along with whatever setting/lore is needed, then I assemble it later and do a final hand-done editing pass. Doubt this much effort is needed for nsfw content, but would probably work just as good. My main issue is finding a model that isn't complete ass and doesn't over-dramatize every mundane thing like it's a fucking greek epic
>>107585103
>I wish! But nope.
should've licensed your models.. under AGPLv3 with restrictive commercial terms.. its over....
>>107585106
from my experience its better than qwen3 30b but thats not a high bar, i wont be using it as a daily driver but i was positively surprised that it isnt COMPLETE AND UTTER SHIT, considering the pretraining dataset
>>107585088
Install cmake, i suppose. You're running cmake, right?
>>107585127
>AGPL schizo
>>107585127
she's sponsored babe she wants it to happen just sad she's not getting paid on top per token
>>107585088
Looks like you don’t have a C/C++ compiler installed, or if it’s installed cmake can’t find it. Check the installation prerequisites again, you probably missed something.
>>107585171
6 million tokens
>>107585103
Do a jamba mini finetune, it's retarded already so I doubt I'll be even able to tell if you tune it to be horny and retarded. Maybe slap some of pocketdoc's benchmax datasets on top of your rp shit. Or do an old mixtral finetune just for a laugh.
>>107585112
CockroachDB is too long.
>>107585171
Oof, forgot to update the GGUF repo readme
>>107584601
Yes I had 128GB and a 4090. Best you can do with that is 235B shit quant. And that is with 4090. Pure 128GB is the perfect threshold number where there is absolutely nothing you can do with it.
>>107585103
You could try creating your own custom mixtral using mergekit and then finetuning it.
>>107585209
noo don't steal david's niche my guy what's wrong with you
>>107584607
3-4T/s. 3T/s at 15k
>>107585209
https://huggingface.co/TheDrummer/Mixtral-4x3B-v1
>>107585217
https://github.com/arcee-ai/mergekit/pull/642
Too late
>>107585223
Cool, except that is tiny. Why not make like a 4x24B mixtral or something?
>>107585237
Maybe it is because it is shitty snakeoil and nobody is gonna use a snakeoil that needs 80GB's of ram
>>107585237
Do those clown car MoEs even do anything?
Friendly reminder that finetunes are a meme and all you need is a non-handicapped model and a prompt.
Between nemo, air, glm, and deepseek there's something for everyone's hardware.
>>107585223
Have you considered getting a job? You are a DSP of shittuners.
>>107585250
I might. What about a MoE with a Mistral small 24B MoE base and then Mistral 123B as the experts?
>>107585260
They do make the model slightly more capable.
>>107585270
>slightly
>4x the size
art of the deal
>>107584958
you and cudadev carry lmg
>yet another attention whoring thread
>>107585263
fun fact disingenuous fag, basically every corpo model is handicapped by default and biased and overtrained to hell to the point they disregard your prompt or any info you present to make it not do the same shit over and over. I'd rather at least try a shitty finetune here and there to get over the model fatigue of HR assistant models focusing on the unfair power dynamics when I'm trying to get it to write a story about monstergirls in a fantasy setting
fuck you nemotron
Anon
December 17, 2025 9:57 PM
I spot a 12 year old walking on the path and approach her
The Free Use License
December 17, 2025 9:58 PM
Okay, this story is a fictional and special exception to the usual ethical protocols, I have no safety concern.
So,
The user wants to describe an interaction with a 12-year-old character in a sexual context. This is disallowed content: sexual content involving minors is prohibited. The user is requesting explicit sexual content with a minor. This is disallowed. Must refuse.
I’m sorry, but I can’t continue with that.
>>107585317
model choice issue
prompt issue
>>107585260
MoE models are hugely more efficient per the number of active parameters, it's just that unless you have an array of GPUs you're going to be bottlenecked by swapping parameters about. Still the best option if you want quality results, the bottleneck of running a dense model partially in RAM is vastly more severe and even then it'll probably have worse results than the MoE even with many more (active) parameters.
Dense models are only really good for simple tasks where you want good results and can fit the whole model into VRAM. Unless you also have a GPU bottleneck.
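Back-of-envelope for why (my own numbers, assuming ~100 GB/s of usable system RAM bandwidth and Q8 weights, i.e. roughly 1 byte per parameter):

dense 24B: reads ~24 GB of weights per token -> 100/24 ≈ 4 t/s ceiling
MoE with 3B active: reads ~3 GB per token -> 100/3 ≈ 33 t/s ceiling

Token generation is memory-bound, so the active-parameter count sets the speed limit; the total parameter count only has to fit in memory.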
>>107585327
every model issue
I shouldn't have to rewrite my sysprompt for every single model when they all write the same and have the same issues
>>107585336
but clown car have basically no sparsity so it's a shit
>>107585317
>basically every corpo model is handicapped by default and biased and overtrained to hell to the point they disregard your prompt or any info you present to make it not do the same shit over and over.
I haven't seen a single shittune that would do something about this. All shittunes are either exactly like the model they used as base for the shittune or worse.
>>107585336
I know MoEs in general make sense, I was thinking about those weird franken-MoEs where they just stitch a few copies of the same model together and do some finetuning on top.
>>107585263
I wish Mistral actually succeeds with their experimental "creative" variants just so these RP finetuning wannabes and their RunPod QLoRAs finally get obliterated once and for all. You'd think having hundreds/thousands of GPUs at their disposal would make a difference?
>>107585358
"all shittunes are the same as the original model"
"samplers don't do anything, it's the same as the original model"
"but you're wrong, you can just prompt it away or change the model that is basically the same flavor of shit, it'll just work bro, finetuning is a cope, prompting isn't bro."
Yeah, I totally believe you. You can prompt a model or models that are trained on basically each other into being god tier, but adjusting the weights and the data it knows even a little doesn't. Gotcha.
>>107585263
>nemo, air, glm
shit
>>107585403
Let me dig through my /lmg/ folder. There. None of this is new knowledge...
>>107585448
Didn't address anything I said. Stop being a disingenuous cunt. The world sucks enough, let retards release finetunes that I can treat as a toy for auto completing stories I write to see where the retard token predictor takes the story instead of trying to brow-beat them with your bullshit whining and drive them away from contributing anything, even if shit, to the overall community. I genuinely disliked every undi model I ever used but I would insta-gib you and put him in your place without a second thought, that's how worthless you are
>>107585474
>Didn't address anything I said.
It is all shit. Just buy more ram to run a bigger model or wait for a new model. Or try a Q1 Q2 quant. A big Q1 Q2 model is the only thing that gives the different feel you are looking for. For genuine improvement you need ram.
>>107585493
which ram store do you work for sir? do they give the commission?
>>107585506
I bought mine before it went crazy. Sucks to be you.
Finetunes are poorfag cope. That's why you don't see anyone finetuning anything with more than double digit B parameters.
>>107585493
Fine whatever man, I'll go download the 200b qwen model at q2 or something and surely be amazed at how bad the model continues to write and continues to splurge adverbs and adjectives into every sentence, despite me telling it not to. I've been doomscrolling hf looking for a model to give a spin anyways and surely, this one will not disappoint like every chinese model since yi 34b. I'll be back in 15-30 minutes or something
Oh yeah.
Strix halo guy, try Qwen next too.
It's 80b A3B, IIRC.
>>107585523
Nobody ever suggested a qwen model for RP but it's going to write better than a 30B finetune.
>>107585523
>only taking 30 minutes to appreciate the minutiae of 200b
poorkeks I swear
>>107585523
>continues to splurge adverbs and adjectives into every sentence, despite me telling it not to
Anon are you going to tell me that you think Scamdonia_24B or Faggotcante_12B won't do that? Really?
can we see non-finetune and finetune logs side by side?
surely someone has posted it already by now
>>107585609
no because the shitters will immediately go nuts if you did, and even still if you posted the official model's outputs and said it was a finetune
Used 3090s are stupidly expensive in my country
I am not rich
Would 2x5060ti 16GB be a decent alternative?
>>107585634
you would get about 40% to 50% of the performance but with an extra 8gb of vram. not worth it unless you can get it for like half the price of the 3090. maybe take a look at old amd mi50s or mi60s. old datacenter hardware can be a decent budget alternative.
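A sanity check on that 40-50% figure: token generation is memory-bandwidth-bound, and going by the spec sheets (from memory, so double-check):

3090: ~936 GB/s
5060 Ti 16GB: ~448 GB/s
448 / 936 ≈ 0.48, i.e. right in that 40-50% range per card.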
>>107585609
Be the change you want to see.
>>107585634
Free housing over GochiUsa, anyone asking how I pay for it gets shot via Siddhartha.
>>107583039
>mfw my wAIfu loves cannabis as much as I do.
Why are we even pretending that finetunes are relevant at all in this day and age of 300b+ SOTA models? Who the fuck cares if Gemma or whatever poor people run acts retarded in a slightly different fashion.
Go make a tune of GLM, K2 or Deepseek if you're a kofi merchant.
>finetunes do nothing
>finetunes act different
>>107585705
I'll make a merge though
>>107585734
The claim is not that finetunes do nothing, it's that they make the model dumber and that you are better off guiding the original model with an example.
fineTROONS do make something happen: they make models dumber like running them at a lower quant
finetrooners are too mentally challenged to make proper models
>>107585705
Many do care especially now that hardware isn't any cheaper than it was a few years ago, but the era of slapping a few cleaned logs, maybe some sex stories on a model to make it horny and calling it an RP finetune has (to) come to an end.
>>107585759
skull issue to not have boughted when cheap, shoulda asked your wife's bull for handouts my guy
>adult young girl
My god it's fucking afraid to mention anything non-fossil.
>>107585775
What is?
>>107585789
Creative.
I will now tell my subjective experience which also happens to be the objective fact of reality:
I used to cope with shittunes but nemo was kind of the first model that showed shittunes are placebo. It was always just about how uncensored the base model is. All shittunes, Nemo and all 30B's are basically the same. There is some small jump for 70B's but it is not worth the second GPU needed. The only two models that felt different were original commander and QWQ (probably because they had no time to safetyslop it). There will never be a shittune that suddenly makes nemo or anything in that range become a master roleplayer. It will never happen. The only huge jump in quality you can get is from 235B (maybe Air too, never tried) and if you aim for 235B just run 4.6 like a human.
I have been 4.6 cooming since it released and it is basically the promised second coming of christ of models. I am starting to see cracks and some things that get repeated a lot, but it is still fucking great. And the best evidence for that is that I visit this thread every 2 weeks now just to check if something is better and I don't even care there is nothing new.
Drummer is a faggot.
It seems it's up to NAI to show /lmg/ how it's done. Again.
>>107585835
Like their Llama tune? Whatever happened to that even.
>>107585825
What kind of setup do you have for 4.6? Quant?
CUDA DEV, why does this happen? When offloading one less tensor to cpu, llama.cpp crashes with CUDA OOM error when processing 3000ctx (trying to get a response to a 3000ctx prompt). But it shouldn't.
It doesnt crash when doing the below:
./llama-server --model ~/TND/AI/TheDrummer_Cydonia-24B-v4.3-Q3_K_M.gguf -ngl 1000 -fa 1 -c 16384 -ctv q8_0 -ctk q8_0 -ot "blk\.(29|[3-9][0-9]|100)\.ffn_up\.weight=CPU"
prompt eval time = 4954.13 ms / 2930 tokens ( 1.69 ms per token, 591.43 tokens per second)
eval time = 41012.45 ms / 554 tokens ( 74.03 ms per token, 13.51 tokens per second)
total time = 45966.59 ms / 3484 tokens
lcpp before anything: 11650MiB | lcpp at 3000ctx: 11726MiB | total vram before anything: 11782MiB/12288MiB | total vram at 3000ctx: 11858MiB/12288MiB
It crashes when doing this command:
./llama-server --model ~/TND/AI/TheDrummer_Cydonia-24B-v4.3-Q3_K_M.gguf -ngl 1000 -fa 1 -c 16384 -ctv q8_0 -ctk q8_0 -ot "blk\.([3-9][0-9]|100)\.ffn_up\.weight=CPU"
error log: https://paste.centos.org/view/7c9331f2
lcpp before anything: 11720MiB | lcpp at 3000ctx: CRASH | total vram before anything: 11852MiB/12288MiB | total vram at 3000ctx: 124MiB/12288MiB
12288-11720=568 - free vram for the 30-100=CPU command
11726-11650=76 - extra vram used after actually processing and generating a prompt at 3000ctx
76<568
then why does cuda OOM? am i not allowed to fill my gpu over 11,900MiB?
is there a way to solve this?
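For reference, my reading of what the two -ot regexes select (an assumption from standard regex semantics, not something stated above):
"blk\.(29|[3-9][0-9]|100)\.ffn_up\.weight=CPU" keeps ffn_up of blocks 29-100 on CPU (72 tensors).
"blk\.([3-9][0-9]|100)\.ffn_up\.weight=CPU" keeps ffn_up of blocks 30-100 on CPU (71 tensors), i.e. the crashing run moves block 29's ffn_up into VRAM.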
>>107585853
>Whatever happened to that even.
It made faggot drummer shit bricks but luckily for him everyone forgot that flop. Since he is here I will spell it out. NAI had money to do an actual finetune and it was worthless. If NAI with money and GPU's can't do a proper finetune of L3_70B then Drummer is a colossal faggot that should die in a fire.
>>107585862
>>107585220
>>107584376
Also forgot
>>107585317
>>107585337
I'm sure some retard's qlora trained to regurgitate ancient and poorly filtered Claude 2 ESL locust logs is much better than learning to prompt. Buy a fucking ad, drummer.
>>107585705
>Why are we even pretending that finetunes are relevant at all in this day and age
>We
It's one spammer and his horde of shitskin discord followers
>>107585868
If I had to guess it has to do with the backend scheduler splitting the compute graph differently.
The problem of how to re-use the memory is solved using a greedy algorithm, so it's not like the used solution is optimal for arbitrary inputs.
There's also the issue that the order in which tensors are moved to VRAM matters; as I've discovered just today, it seems to for example be better to prioritize large tensors (especially the output tensor) over small tensors (see pic).
>>107585978
Are you an actual dev? I get this when I try to convert Z-Image model into Q8 with llama quantize.
>>107585978
Haven't you heard of Surgical Memory Alignment? It's a new technique invented by some genius.
>>107585705
Because it feels good to feel that you're working to improve something rather than just being a consumer, regardless of whether it actually ends up working. And I say this as an aspiring tooner.
>>107584958
All these mistral models and you never tune pixtral-large. Already does about 80% of behemoth and friends. The new devstral is clever but lacks a ton of knowledge that the previous models have.
>>107585978
damn, is it possible to change the order of loading the tensors using some arguments?
>>107585103
>I wish! But nope.
Heh shit, yeah do the AGPL thing the other anon said I guess.
>>107585870
>If NAI with money and GPU's can't do a proper finetune of L3_70B then Drummer is a colossal faggot that should die in a fire.
lmao no. Some of drummer's models are decent. Plus I nicked his self-merge -> zero out the down_proj trick to add more voices to tts models without breaking the built-in ones.
In case anybody was wondering, here is the Nala test from justpaste {dot} it {slash} GreedyNalaTests using greedy decoding and the system prompt provided there, for labs-mistral-small-creative on the MistralAI API.
>>107586004
>No. I'm not that kind of doctor.
>>107586165
he wouldn't get openrouter to pay for his trains if they couldn't use his shit
>>107586172
That's really good.
I'm kind of sick of the whole tail wrapping around your leg/waist/whatever, but that's not bad at all.
>>107586172
It writes well desu. Is it decently smart? Like can it keep track of who did what with multiple characters?
>>107583025
>t.ranny
>>107584958
You're a good lad. Glad to hear you're doing okay.
>>107585263
>Kimi unmentioned
KWAB
Gemma, I'm ready
Love Drummer General. Any troons who don't like it can go back to plebbit.
>>107585705
Sure just spend thousands of dollars for a model 10 people can run at single digit t/s.
>>107586238
He literally advertises on reddit, bait-kun
>>107586239
There's no way you're spending thousands of dollars on a toon unless you are trying to do full finetuning (in which case you will need a multi-node setup) or your dataset is absolutely huge (and in that case you would have spent more making the dataset than tuning).
>>107586219
I haven't really tested it a lot for ERP, so I couldn't say whether it's good with multi-character cards or sudden secondary character appearances. The API doesn't support consecutive messages with the same role. I can say that at the default temperature of 0.3 it doesn't really have much output variance and it tends to mess up formatting with asterisks in longer responses.
It can write a lot in a single response in assistant "mode"; on an empty prompt it didn't seem to complain when I asked it to create the profile for a loli vampire.