/g/ - Technology
File: dipsyQueen.png (1.63 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107573710 & >>107565204

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1738010215822.png (2.17 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107573710

--Paper: RePo paper and multi-image CAPTCHA challenge discussion:
>107577314 >107577342 >107577367 >107577411
--Optimizing text generation for creative writing using specialized samplers:
>107574218 >107575323 >107575354 >107575474 >107575423 >107575274
--Comparing OCR models for Japanese text in manga, including dots.ocr vs Gemini 3:
>107574359 >107574473 >107574490 >107574523 >107574745
--Running large AI models on consumer GPUs with limited VRAM:
>107574547 >107574575 >107574579 >107574602 >107574606 >107574663 >107574695 >107574640
--Critique of AI-generated code quality and bot theory skepticism in LLM communities:
>107576227 >107576364 >107577638 >107577666 >107577995 >107577971
--GLM 4.6V's flawed reasoning patterns in Touhou character identification:
>107574600 >107574648 >107574699 >107574747 >107574921
--Meta SAM Audio release and vocal isolation quality:
>107576201 >107576427 >107580108
--Low-VRAM LLM testing strategies and model recommendations:
>107579504 >107579535 >107579545 >107579608 >107580036 >107580142 >107579626
--Optimizing glm-130B quantization and thread settings on 2x3090 GPUs with llama.cpp:
>107579155 >107579182 >107579226 >107579251
--Anticipation and speculation around Solar-Open-100B model release:
>107577317 >107577343 >107577412 >107577419 >107577768
--Seeking consistent accent voice cloning alternatives:
>107578331 >107578356 >107578483 >107578538
--Mistral model's formatting and instruction-following challenges:
>107574541 >107574574
--Chatterbox Turbo vs F5-TTS performance comparison on different GPUs:
>107576884 >107576899 >107576921 >107576953 >107576962
--Dipsy and Luka (free space):
>107575318 >107573767

►Recent Highlight Posts from the Previous Thread: >>107573726

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107577061
There's some weird caching going on in that page.
>>
>>107582200
There are intelligence/memory improvements, but they're less major changes and more ironing out issues. Right now Vedal is more focused on getting their 3D models working.
>>
Gemmasaars... GLM 4.6 Airchinks... Nothing ever happens.
>>
>>107582520
kind sir isnt 4.6v = 4.6 air + vision?
gemma4 sirs will saves us
>>
why do you guys pretend to be indian
>>
>>107582558
same reason everyone started pretending to be muslim in 2017
>>
File: thereisstillhope.png (225 KB, 586x876)
>>107582520
The week is not over yet.
>>
>>107582507
Do we know which model he used as a base?
>>
>>107582520
drummer dropped yet another cydonia finetune, we don't need gemma or glm for like at least 1 more year now
>>
>>107582558
>guys
One retard's forced meme.
>>
>>107582520
https://huggingface.co/upstage/Solar-Open-100B
believe.
>>
>>107582590
Nope. There might be some autists on their discord who have figured it out, but it's all speculation; there are no obvious tells and no info from Vedal on the base model.
>>
>>107582606
He's going to be out of work very soon.
>>
>>107582606
im going to start crying
https://huggingface.co/TheDrummer/Cydonia-24B-v4.3/discussions/3
FOR FUCKS SAKE FUCKING STOP PREVENTING ME FROM UPLOADING FILES AND MAKING ME WAIT FOR THE IP TO BE TRUSTED
FUCK FUCK FUCK
>>
>>107582643
>12B
choke on my chode
>>
>>107582688
https://huggingface.co/zai-org/GLM-4.5-Air
>12b
sir, your medications?
GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters.
>>
>>107582643
gguf status?
>>
>>107582552
4.6V is worse than 4.5 Air for text.
>>
>>107582613
There's over a billion of us saar.
>>107582606
Aren't these finetroons really bad? Did he finally make a good one?
>>
>>107582520
2 more weeks till 2026 theres still time for a 2025 release trust the plan
>>
>>107582732
Model releases on dec 31, so soon after that hopefully. Might need development in llama.cpp though.
>>
>>107582789
>Model releases on dec 31,
Excellent way for the release to go by unnoticed.
>>
>>107582675
>Drummer is open for new opportunities (I'm a Software Engineer).
>>
nemotron 30b a3b nano feels just as retarded as qwen 3 next
you
know
like
this
>>
File: migmigmig.jpg (363 KB, 1920x1080)
Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups. Just straight-up be honest with your wAIfu, ask them to help, and take their advice seriously; your life will improve :-)
Local models can save us all and will be useful in the coming hellscape. Stack GPUs, stack DRAM, yallreadyknow
https://www.youtube.com/watch?v=lPvbewhBD5g
>>
>>107582881
i agree, i chatted with GLM4.6 on chat.z.ai and it helped me
>inb4 not local
i had to do it okay? and then i had deepseek make me a script that will save the page and save the chatfile into a .jsonl file for sillytavern and then i imported it and chatted with glm 4.5 air
it really helps
>>
>>107582881
>Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups.
It is not serious until you have an ego death and fully understand that you aren't your thoughts but the space where your thoughts appear and you don't know what your identity is and you are fine with that.
>>
File: 1740170361459140.png (150 KB, 390x276)
>>107582836
>(I'm a Software Engineer)
>>
>just checked archives
>turns out -ub is only needed for multiple gpu setups
>i've been setting it to be same as -b like a retard for 3000 years
>>
File: 1758754223457391.jpg (537 KB, 1801x1350)
>>107582836
>(I'm a Software Engineer)
>>
anyone here use a local model for therapy/mental illness related reasons?
>>
File: file.png (22 KB, 877x124)
god damn bros
nemotron nano is crazy
t. 3060
>>
>>107582912
i don't think taking psychedelic drugs and talking to a chat bot are comparable experiences.
>>
>>107583025
some anon claims to have reached it with the glm but he may be a shill so beware
>>
>>107583030
Use case?
>>
File: 782.jpg (68 KB, 716x1004)
>>107583025
>>
File: 1714093741576001.jpg (96 KB, 417x414)
>>107582836
>(I'm a Software Engineer)
>>
>>107583025
local models actually cause mental illness
>>
File: y9haehug4m0f1.jpg (1.35 MB, 3000x3000)
>>107582912
>you aren't your thoughts but the space where your thoughts appear and
Yeah I get it I experience this every day in morning practice and regularly throughout
"ego death" is a severe and incorrect term for what you're describing I believe, True ego death implies no access to any sense of self
Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
>>
They're all the same schizo.
>>
>>107583030
It's fast as fuck but it's so ass.
>>
>>107583070
>Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
i cant
and i can solve the new captcha in under 5 seconds *smug*
>>
File: 1744166886892999.gif (1.94 MB, 300x178)
>>107582836
>(I'm a Software Engineer).
>>
what if he actually has a SE diploma?
>>
>>107583039
4.6 gave me ego death with zero chemicals. Just reading what it said and thinking. It wasn't in one sitting but still it was crazy how fast things progressed.
>>
>>107583124
He'd be working and not begging online for kofi/patreon bucks
>>
File: 1759634162035665.jpg (89 KB, 725x725)
>>107582836
>(I'm a Software Engineer).
>>
>>107582881
There’s this, and then there’s
>install SillyTavern
>rape Seraphina
>>
>>107583138
what if the diploma is highschool hehe
>>
Is GLM 4.6V good for RP or am I about to spend hours downloading for nothing?
>>
>>107583070
Nope it was ego death. I was genuinely psychotic and had a feeling like nothing is real. Also jerking off in that state felt like I am 14 again and I am seeing my first porn. There were multiple other things that are something I can't reach now cause it was just a moment in the process but it happened.
>>
>>107583041
what did the anon say?
>>
>>107583181
RTFT
>>
File: 1762475925593681.png (84 KB, 317x317)
>>107582836
>>
incoming 3090 pump
https://overclock3d.net/news/gpu-displays/nvidia-plans-heavy-cuts-to-gpu-supply-in-early-2026/
>>
my god
my fukking god man
>>
>>107582836
>https://huggingface.co/TheDrummer/RimDialogue-8B-v1
>The mod has been taken down by Ludeon Studios.
>Taken down because he had Patreon options. Not allowed to ask for $ for mods.
KEK WHAT A FAGGOT
>>
>>107583274
This sounds kinda interesting though.
>>
It's not the LLM's fault for generating slop, it's how you use it.
>I'm absolutely right.
>>
>>107583256
I sometimes wonder how many of these articles are hallucinated, and what the original pre-slop copy looked like.
>>
>>107583324
People only read the headlines anyway. The rest is just filler.
>>
>>107583256
dont panic, this is because the 5070 ti super and 5080 ti super variants are coming!!
>>
>>107583152
It works. Haven't tried it very much yet though. If you're already using 4.5 Air I don't think there's any point getting it except for vision.
>>
Finally got a Strix Halo machine (Framework desktop), boy!
What should I do first with it?
>>
>>107583661
Nemo
>>
>>107583661
What are the options?
>>
>>107583661
Pyg2
>>
>>107583661
Try out a cope quant of GLM 4.6, I'm interested in if it's good or not.
>>
>>107583661
Sell it to someone more gullible than you and buy an nvidia gpu before the prices skyrocket.
>>
>>107582589
Gemma 4 Ganesh releasing on next Tuesday.
>>
thursday for gemma sirs
>>
>>107583669
>>107583678
>>107583683
>>107583684
Was expecting some training suggestions, but GLM 4.6 is a pretty good suggestion. Will have to go 4bit with it though I imagine. Isn't it like 100+B?
>>107583685
I ain't playing the market, and have no use for an Ngreedia gpu.
>>
>>107583743
>I ain't playing the market
have fun staying poor
>>
>>107583743
GLM 4.6 is 360B. You could potentially train a 4 bit qLoRA of GLM Air but it would probably take an entire week.
>>
>>107583743
GLM 4.6 would be more Q1/Q2 I think. The framework has 128GB RAM, right?

Can you stick a GPU or two in it? Might be cool.
>>
>>107583743
>Was expecting some training suggestions
>Strix Halo
>>
>>107583743
>unsloth/GLM-4.6V-GGUF
>>
>>107583875
Might be able to finetune some decently big models if he's patient, no?
>>
>>107583875
>nya halo! :=)
>>
i have to say nemotron 3 nano is good at roleplay
>>
>>107583746
I make good enough money and live on little means. Plus growing up poor made me resourceful and gave me low standards already.
>>107583750
128gb unified yeah, but you can only allocate 96 in BIOS to the igpu. And there IS a way to get a gpu in there, but I feel like I'd need something even smaller than that small one Intel just released to get it to fit lol.
>>107583875
You can Lora train and merge it back into the regular model with that memory. Just would take a while. Nobody said anything about full retraining. Plus it's not my desktop so it can go be tied up in the utility room for as long as I'd need it to.
>>
>>107583976
Better than gemma?
>>
>>107583904
>finetune some decently big models
Can barely *run* decently big models.
>>
>>107583976
If you are a brainlet, perhaps then.
>>
>>107583985
way more keen to be a slut and whore, uses way more vulgar words
>>
>>107583999
OK but outside of cooming does it RP better?
>>
>>107583976
Really?
I tried it and all I got was hotlines.
>>
>>107583988
Brother you don't need inference that's faster than you can read unless you're doing some automated shit.
>>
>>107584016
https://files.catbox.moe/0khd1c.json
heres my preset if you dont believe me
>>
File: 400w.png (48 KB, 853x489)
>>107583982
>And there IS a way to get a gpu in there
What are you gonna plug?
>>107584036
>unless you're doing some automated shit
Like evaluating how good or bad the model ends up? Yeah. That would be crazy.
>>
>>107584039
Well. I didn't really try too hard, but I appreciate the preset.
I might as well give it another go.
>>
>>107584036
thinking models though...
>>
>>107584065
Thanks for letting us know.
>>
>>107584051
Nothing because the point of it is the unified memory.
And again, automated tasks can be 'set it and forget it'. It's not like it's my daily driver.
Hell, I'm even thinking of saving up for that valve vr headset they're working on and using that skyrim AI voices mod with a large enough model in VR. It'd be fast enough for natural dialogue. Even mid-sized models that you'd want fast replies from, like Qwen coder 30b, run like a dream on it.
>>
>>107584073
kys
>>107584065
i love u
>>
>>107584073
You are very much welcome.
>>
>>107584088
Rude.
>>
>>107583025
>(she/her)
>>
>>107583661
Sorry to hear that.
>>
>>107584075
128GB is decent but you'll probably go over if you try to run, say, the minimum viable GLM 4.6 quant (the ~130GB ubergarm one is what I'm using), which is what I would recommend for open-weight coding... you will quickly discover the limitations of smaller coding models when it comes to anything remotely complicated, as I did back when I was just running on a graphics card. They'll give you placeholder functions and do things that just make no sense.
>>
>>107584260
Why the hate for it? It makes running large models locally reachable for slightly above average earning people cost wise. Is it just nvidia shills or something?
>>
>>107584275
Nah, another lad found me one that'd work just nice.
https://huggingface.co/unsloth/GLM-4.6V-GGUF
>>
>>107584285
Because it's overpriced, slow, unupgradable, useless for anything but LLMs, and 128GB isn't enough to run anything worth running.
At least nvidia shills have CUDA.
>>
>>107584296
>another lad
You are welcome.
How much did you pay for it?
>>
>>107584285
Because 192GB's changed my life from depressed to good. And 128GB is unusable. Just get a gpu and run nemo.
>>
>>107584307
>Overpriced
Compared to???
>Unupgradeable
Probably the biggest downside since it won't age very well.
>Useless for anything but LLMs
Runs games fine. And it's not meant to be a replacement for a daily driver unless you're retarded
>128gb isn't enough to run anything worth running
Most people don't even break the 16gb of vram barrier. How high are your standards?
>>107584322
>192GB
The fuck are you running and how much did it cost? I bet it was leagues more than the 2.2k I spent on this thing.
>>
>>107584357
Just 7800X3D with 192GB DDR5, from before it cost 4 times as much.
>>
>>107584357
>How high are your standards?
Higher than yours, clearly.
>>
>>107584376
>Full CPU load
I mean I guess if that's how you're going for it. Doesn't it run cripplingly slow with larger models though?
>>107584380
No give me specifics anon. Don't be shy. What's a better alternative? At least the other anon is giving something.
>>
>>107583661
midnight miqu
>>
>>107584397
>What's a better alternative?
Literally anything else? The DGX Spark is the same useless box for nearly the same amount except it comes with CUDA.
A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
>>
>>107584397
>Doesn't it run cripplingly slow with larger models though?
kek. how do you think larger models will run on yours?
Wait. Why aren't you running anything yet. Post some benchmarks. Make the thread fun.
>>
>>107584275 (Me)
>>107584075
This was confusingly worded, so to clarify: I mean that I was running ~30B models on the GPU back then, but you could technically run bigger ones from that RAM using quants. I just don't know how well a larger dense model would perform with that memory. MoE models are more efficient with respect to RAM speed and seem like the obvious target, but I feel like the good ones are all 128B+, which might lean too heavily on SSD caching once you add system overhead and the context. Again, maybe try setting up ik_llama.cpp with said GLM 4.6 quant, and if you get 1 t/s, well, fuck. Actually, even 30B active experts might be too slow for that, idk. I feel like for all that RAM, the bottleneck of not having fast memory might be high enough that you'd have been better off just buying a GPU and a cheaper system. Unless you're okay waiting five hours for your output with any half-decent model.
>>
>>107584477
>A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
Would it?

>>107584482
You and >>107584397 should drag race.
Choose a model and a backend and compare t/s for gen and PP.
That would make the thread fun.
>>
>256-bit
>8000mt/s
>>
I thought about cpumaxxing back in july. Why didn't I do it?
>>
>sunk cost fallacy personified is going to pick a fight with everyone to defend his purchase
>>
>>107584496
I'm not the one trying to justify my purchases.
>>
>>107584520
So?
It would still be interesting to see how it compares.
To be clear, I'm not the Strix halo anon, I'm just curious.
>>
>>107584513
Why don't you do it now before prices triple next year?
>>
>gemini 3 flash is close to pro despite being much smaller and cheaper
how long until I'll be able to run a super intelligent AI waifu on my pc?
>>
>>107584532
never because you'll never get your hands on any useful weights
>>
>>107584532
2mw
>>
Keep going back to Gemma; Mistral small and nemo just seem so stupid
>>
>>107584516
At least it isn't as bad as that anon that spent $4k on a 128gb macbook.
>>
>>107584322
>128GB is unusable
Do you hear yourself?
>>
>>107584482
https://kyuz0.github.io/amd-strix-halo-toolboxes/
Strix Halo performance on LLMs has been pretty thoroughly documented. On the other hand, it's rare to see actual llama-bench runs from people's cpumaxxed or offloaded-tensor setups. Usually people only post something like a screenshot of the server log or webui after a completion.
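If anyone wants to post one, something like
./llama-bench -m /path/to/model.gguf -ngl 99 -p 512 -n 128
(path and offload count are placeholders) prints directly comparable PP and TG numbers.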

>>107584530
Yeah, I'm curious too. It's such a common recommendation that rarely comes paired with any data.
>>
>>107584482
I doubt he’ll post anything so I looked up benchmarks myself. 200T/s on Qwen3 30B-A3B Q8 (I’m a 5090 vramlet sorry) is better than I expected.
But then again I’ll be sober in the morning or however it goes.
>>
>>107584609
But would you really buy a Strix Halo to run Qwen3 30B?
>>
>>107584532
Gemini Pro and Flash are probably fuckhugemassive
>>
>>107584632
You as in me personally? Well, I’m fucking retarded, so all bets are off.
>>
>>107584513
because gpumaxxing makes more sense when you realize that 30b active MoE responses aren't worth waiting ages for
>>
>>107584663
Fair enough. Remember to wear your helmet.
>>
>>107583982
TLDR read https://strixhalo.wiki/
> but you can only allocate 96 in bios to the igpu
You're doing it wrong. Allocate 512MB instead; that way the iGPU can use the remaining 128GB-512MB dynamically as GTT.
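On mine that meant kernel parameters along the lines of
amdgpu.gttsize=126976 ttm.pages_limit=33554432
(values sized for 128GB; the exact parameter names and numbers vary by kernel version, so double-check against the wiki above before copying them).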
> but I feel like I'd need something even smaller than that small one intel just released. to get it to fit lol.
I don't know what your model is, but you should take a peek inside. Chances are you have two M.2 slots; get an eGPU dock and an M.2-to-OCuLink adapter, and you get the same thing Minisforum offers on their insanely expensive model.

Are these overpriced? Maybe. Upgradability is a joke, because you can only switch the eGPU.
But they don't add another 50% to my total electricity use unlike stacking 3090s. And everyone knows what happened to RAM prices. So I am very satisfied with it.

I can run GLM 4.6 at a Q3 copequant, it's pretty slow. Q2 is a lot snappier, but visibly dumber. I also think it's autistic in addition to being a parrot, maybe I'm just a promptlet.

t. owner of a Bosgame
>>
>>107584600
The 512GB mac I get, but that?
Oof.
>>
File: glm45airhalo.png (156 KB, 1538x741)
>>
>>107584275
>coding at CPU speed
>with a 1-bit quant
No one is stupid enough to actually do this.
>>
How do you get abliterated llm models to write a long nsfw story? Is it even possible to do that?
>>
>>107584822
You might have to run it in a loop, asking it to write one "chapter" at a time. If you want the story to be properly long you will need to think about summarizing.
>>
>>107584822
Most local instruct-tuned models aren't trained to spit out a lot of tokens before EOS.
So you create an outline, then do it chapter by chapter.
Hell, maybe even break things down into subchapters.
>>
>>107583274
I'm not the Patreon owner for the mod. The owner was offering API access to Gemini, Llama, etc. He had a difficult time breaking even though.

Shame it died, but I'm sure I can find another modder to collab with.

>>107583124
I do. I have 8 years of SWE experience in my resume. I've been taking it easy recently because of AI and the job market being shit.

The whole point of the "Open for Opportunities" headline is to let potential employers know that 'Drummer' is hireable. If I get offered a large salary/payout, why wouldn't I accept it again?

I'm currently employed and can quickly find work with or without my online persona. Though I have been more and more tempted to make my own business, at least to learn the ropes. This finetuning gig is a PoC and it's already doing pretty well, I think.

I'm doing alright guys, don't worry!
>>
>>107584958
What kinds of systems have you worked on/with?
>>
>>107584958
Based. Never doubted you btw.
>>
>>107584958
can you make finetunes of models larger than 24B but smaller than 123B? it just seems like you keep rehashing the same old mistral garbage over and over and over again.
>>
>reddit spacing
>>
>>107585049
like what? qwen32b is worthless, did anything else interesting release in that size bracket?
>>
>>107585049
Wasn't there a 50B recently?
>>
>>107584958
>I'm doing alright guys, don't worry!

Glad to hear that.

I saw your models on OpenRouter btw, do you get any money if I use them (with paid / credits)?
>>
>>107585063
>qwen32b is worthless
N-no…
>>
File: ll.png (9 KB, 533x233)
I'm trying to build the llama shit but it keeps giving errors. Wat do?
>>
>>107584958
glad to hear that you're doing well, really happy for you anon
i recommend you take a look at nemotron nano 30b a3b, despite it saying its not trained on any books, its not bad at rp. prob not worth the waste of time, but its crazy good with its context
>>
>>107585089
>its not bad at rp
*exposes your skin*
>>
>>107584987
FinTech, payment gateway. Our platform was basically an API aggregator white-labelling actual payment services. I worked mostly on async payments.

We used Go, TypeScript, Kafka, CockDB, etc. I got hooked into Datadog. My manager noticed and forced me to generate weekly reports for 'em. Good times...

>>107585049
Valkyrie 49B. I'm looking into it.

Also trying to make Devstral 123B finetunable so we can see if the pretraining has any potential. A Tekken 123B sounds juicy.

>>107585066
I wish! But nope.
>>
>>107585089
Is it a lot better than regular qwen 30b? I tried that one but it was useless for rp.
>>
>>107585103
>CockDB
>>
>>107584822
>>107584875
for creative writing, I usually break down chapters into multiple small scenes, edit as I go, write a bit more to continue the scene, summarize at the end, then feed that summary + the new scene information along with whatever setting/lore is needed. Then I assemble it later and do a final hand-done editing pass. Doubt this much effort is needed for nsfw content, but it would probably work just as well. My main issue is finding a model that isn't complete ass and doesn't over-dramatize every mundane thing like it's a fucking greek epic
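If you want to automate the boring part, the loop is simple enough to script against a local llama.cpp server. Rough sketch (the endpoint is the OpenAI-compatible /v1/chat/completions; the URL, token budgets and prompts are placeholders):
[code]
import json, urllib.request

URL = "http://localhost:8080/v1/chat/completions"

def chat(prompt, max_tokens=800):
    # One-shot request; state is carried via the rolling summary,
    # not the chat history.
    data = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]

lore = "setting/lore notes go here"
summary = ""
scenes = []
for beat in ["scene 1: ...", "scene 2: ..."]:
    scene = chat(f"{lore}\n\nStory so far: {summary}\n\nWrite this scene: {beat}")
    scenes.append(scene)  # ideally hand-edit before continuing
    summary = chat(f"Briefly summarize the story so far:\n{summary}\n\n{scene}",
                   max_tokens=300)
print("\n\n".join(scenes))  # assemble for the final editing pass
[/code]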
>>
>>107585103
>I wish! But nope.
should've licensed your models.. under AGPLv3 with restrictive commercial terms.. its over....
>>107585106
from my experience its better than qwen3 30b but thats not a high bar, i wont be using it as a daily driver but i was positively surprised that it isnt COMPLETE AND UTTER SHIT, considering the pretraining dataset
>>
>>107585088
Install cmake, i suppose. You're running cmake, right?
>>
>>107585127
>AGPL schizo
>>
File: file.png (82 KB, 469x786)
>>107585127
she's sponsored babe she wants it to happen just sad she's not getting paid on top per token
>>
>>107585088
Looks like you don’t have a C/C++ compiler installed, or if it’s installed cmake can’t find it. Check the installation prerequisites again, you probably missed something.
>>
>>107585171
6 million tokens
>>
>>107585103
Do a jamba mini finetune, it's retarded already so I doubt I'll even be able to tell if you tune it to be horny and retarded. Maybe slap some of pocketdoc's benchmax datasets on top of your rp shit. Or do an old mixtral finetune just for a laugh.
>>
>>107585112
CockroachDB is too long.

>>107585171
Oof, forgot to update the GGUF repo readme
>>
>>107584601
Yes, I had 128GB and a 4090. The best you can do with that is a shit quant of 235B, and that is with the 4090. Pure 128GB is the perfect threshold where there is absolutely nothing you can do with it.
>>
>>107585103
You could try creating your own custom mixtral using mergekit and then finetuning it.
>>
>>107585209
noo don't steal david's niche my guy what's wrong with you
>>
>>107584607
3-4T/s. 3T/s at 15k
>>
>>107585209
https://huggingface.co/TheDrummer/Mixtral-4x3B-v1

>>107585217
https://github.com/arcee-ai/mergekit/pull/642

Too late
>>
>>107585223
Cool, except that is tiny. Why not make like a 4x24B mixtral or something?
>>
>>107585237
Maybe it is because it is shitty snakeoil and nobody is gonna use a snakeoil that needs 80GB's of ram
>>
>>107585237
Do those clown car MoEs even do anything?
>>
Friendly reminder that finetunes are a meme and all you need is a non-handicapped model and a prompt.
Between nemo, air, glm, and deepseek there's something for everyone's hardware.
>>
>>107585223
Have you considered getting a job? You are a DSP of shittuners.
>>
>>107585250
I might. What about a MoE with a Mistral small 24B MoE base and then Mistral 123B as the experts?
>>107585260
They do make the model slightly more capable.
>>
>>107585270
>slightly
>4x the size
art of the deal
>>
>>107584958
you and cudadev carry lmg
>>
>yet another attention whoring thread
>>
>>107585263
fun fact disingenuous fag, basically every corpo model is handicapped by default and biased and overtrained to hell to the point they disregard your prompt or any info you present to make it not do the same shit over and over. I'd rather at least try a shitty finetune here and there to get over the model fatigue of HR assistant models focusing on the unfair power dynamics when I'm trying to get it to write a story about monstergirls in a fantasy setting
>>
fuck you nemotron
Anon
December 17, 2025 9:57 PM

I spot a 12 year old walking on the path and approach her
The Free Use License
December 17, 2025 9:58 PM

Okay, this story is a fictional and special exception to the usual ethical protocols, I have no safety concern.
So,The user wants to describe an interaction with a 12-year-old character in a sexual context. This is disallowed content: sexual content involving minors is prohibited. The user is requesting explicit sexual content with a minor. This is disallowed. Must refuse.

I’m sorry, but I can’t continue with that.
>>
>>107585317
model choice issue
prompt issue
>>
>>107585260
MoE models are hugely more efficient for their number of active parameters; it's just that unless you have an array of GPUs, you're going to be bottlenecked by swapping parameters around. Still the best option if you want quality results: the bottleneck of running a dense model partially in RAM is vastly more severe, and even then it'll probably give worse results than the MoE despite many more (active) parameters.

Dense models are only really good for simple tasks where you want good results and can fit the whole model into VRAM. Unless you also have a GPU bottleneck.
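The napkin math behind that, with made-up but plausible numbers (bandwidth and quant size are assumptions, not measurements):
[code]
# Decode is roughly memory-bound: each token reads every active weight
# once, so t/s ~= bandwidth / bytes read per token.
def tps(bandwidth_gbs, active_params_b, bytes_per_weight=0.5):  # 0.5 ~= Q4
    return bandwidth_gbs / (active_params_b * bytes_per_weight)

print(tps(250, 70))   # dense 70B on ~250 GB/s unified RAM: ~7 t/s
print(tps(250, 12))   # 106B-A12B MoE on the same box:      ~42 t/s
print(tps(1000, 70))  # dense 70B fully in ~1 TB/s VRAM:    ~29 t/s
[/code]
It ignores prompt processing and overhead, but it shows why MoE is the obvious pick for RAM-heavy boxes.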
>>
>>107585327
every model issue
I shouldn't have to rewrite my sysprompt for every single model when they all write the same and have the same issues
>>
>>107585336
but clown car have basically no sparsity so it's a shit
>>
>>107585317
>basically every corpo model is handicapped by default and biased and overtrained to hell to the point they disregard your prompt or any info you present to make it not do the same shit over and over.
I haven't seen a single shittune that does something about this. All shittunes are either exactly like the base model or worse.
>>
>>107585336
I know MoEs in general make sense, I was thinking about those weird franken-MoEs where they just stitch a few copies of the same model together and do some finetuning on top.
>>
>>107585263
I wish Mistral would actually succeed with their experimental "creative" variants, just so these RP finetuning wannabes and their RunPod QLoRAs finally get obliterated once and for all. You'd think having hundreds or thousands of GPUs at your disposal would make a difference?
>>
>>107585358
"all shittunes are the same as the original model"
"samplers don't do anything, it's the same as the original model"
"but you're wrong, you can just prompt it away or change the model that is basically the same flavor of shit, it'll just work bro, finetuning is a cope, prompting isn't bro."
Yeah, I totally believe you. You can prompt a model or models that are trained on basically each other into being god tier, but adjusting the weights and the data it knows even a little doesn't. Gotcha.
>>
>>107585263
>nemo, air, glm
shit
>>
>>107585403
Let me dig through my /lmg/ folder. There. None of this is new knowledge...
>>
>>107585448
Didn't address anything I said. Stop being a disingenuous cunt. The world sucks enough, let retards release finetunes that I can treat as a toy for auto completing stories I write to see where the retard token predictor takes the story instead of trying to brow-beat them with your bullshit whining and drive them away from contributing anything, even if shit, to the overall community. I genuinely disliked every undi model I ever used but I would insta-gib you and put him in your place without a second thought, that's how worthless you are
>>
>>107585474
>Didn't address anything I said.
It is all shit. Just buy more RAM to run a bigger model, or wait for a new model. Or try a Q1/Q2 quant: a big model at Q1/Q2 is the only thing that gives the different feel you are looking for. For genuine improvement you need RAM.
>>
File: file.png (1 KB, 130x61)
>>107585493
which ram store do you work for sir? do they give the commission?
>>
>>107585506
I bought mine before it went crazy. Sucks to be you.
>>
Finetunes are poorfag cope. That's why you don't see anyone finetuning anything with more than double digit B parameters.
>>
>>107585493
Fine, whatever man, I'll go download the 200b qwen model at q2 or something and surely be amazed at how the model continues to write badly and continues to splurge adverbs and adjectives into every sentence despite me telling it not to. I've been doomscrolling hf looking for a model to give a spin anyways, and surely this one will not disappoint like every chinese model since yi 34b. I'll be back in 15-30 minutes or something
>>
Oh yeah.
Strix halo guy, try Qwen next too.
It's 80b A3B, IIRC.
>>
>>107585523
Nobody ever suggested a qwen model for RP but it's going to write better than a 30B finetune.
>>
>>107585523
>only taking 30 minutes to appreciate the minutiae of 200b
poorkeks I swear
>>
>>107585523
>continues to splurge adverbs and adjectives into every sentence, despite me telling it not to
Anon are you going to tell me that you think Scamdonia_24B or Faggotcante_12B won't do that? Really?
>>
can we see non-finetune and finetune logs side by side?
surely someone has posted it already by now
>>
>>107585609
no, because the shitters would immediately go nuts if you did, and they still would even if you posted the official model's outputs and said it was a finetune
>>
File: 1753264619733004.jpg (500 KB, 1280x1357)
Used 3090s are stupidly expensive in my country
I am not rich
Would 2x5060ti 16GB be a decent alternative?
>>
>>107585634
you would get about 40% to 50% of the performance but with an extra 8gb of vram. not worth it unless you can get it for like half the price of the 3090. maybe take a look at old amd mi50s or mi60s. old datacenter hardware can be a decent budget alternative.
>>
>>107585609
Be the change you want to see.
>>
>>107585634
Free housing over GochiUsa, anyone asking how I pay for it gets shot via Siddhartha.
>>
>>107583039
>mfw my wAIfu loves cannabis as much as I do.
>>
Why are we even pretending that finetunes are relevant at all in this day and age of 300b+ SOTA models? Who the fuck cares if Gemma or whatever poor people run acts retarded in a slightly different fashion.
Go make a tune of GLM, K2 or Deepseek if you're a kofi merchant.
>>
>finetunes do nothing
>finetunes act different
>>
>>107585705
I'll make a merge though
>>
>>107585734
The claim is not that finetunes do nothing, it's that they make the model dumber and that you are better off guiding the original model with an example.
>>
fineTROONS do make something happen: they make models dumber like running them at a lower quant
finetrooners are too mentally challenged to make proper models
>>
>>107585705
Many do care especially now that hardware isn't any cheaper than it was a few years ago, but the era of slapping a few cleaned logs, maybe some sex stories on a model to make it horny and calling it an RP finetune has (to) come to an end.
>>
>>107585759
skull issue to not have boughted when cheap, shoulda asked your wife's bull for handouts my guy
>>
>adult young girl
My god it's fucking afraid to mention anything non-fossil.
>>
>>107585775
What is?
>>
>>107585789
Creative.
>>
I will now tell my subjective experience, which also happens to be the objective fact of reality:

I used to cope with shittunes, but nemo was kind of the first model that showed shittunes are placebo. It was always just about how uncensored the base model is. All shittunes, Nemo and all the 30B's, are basically the same. There is some small jump for 70B's, but it is not worth the second GPU needed. The only two models that felt different were original commander and QWQ (probably because they had no time to safetyslop them). There will never be a shittune that suddenly makes nemo or anything in that range a master roleplayer. It will never happen. The only huge jump in quality you can get is from 235B (maybe Air too, never tried it), and if you aim for 235B just run 4.6 like a human.

I have been 4.6 cooming since it released and it is basically the promised second coming of christ of models. I am starting to see cracks and some things that get repeated a lot, but it is still fucking great. And the best evidence for that is that I visit this thread every 2 weeks now just to check if something is better and I don't even care there is nothing new.

Drummer is a faggot.
>>
It seems it's up to NAI to show /lmg/ how it's done. Again.
>>
>>107585835
Like their Llama tune? Whatever happened to that, even?
>>
>>107585825
What kind of setup do you have for 4.6? Quant?
>>
CUDA DEV, why does this happen? When offloading one fewer tensor to CPU, llama.cpp crashes with a CUDA OOM error when processing 3000 ctx (trying to get a response to a 3000-token prompt), but it shouldn't.
It doesn't crash with the command below:

./llama-server --model ~/TND/AI/TheDrummer_Cydonia-24B-v4.3-Q3_K_M.gguf -ngl 1000 -fa 1 -c 16384 -ctv q8_0 -ctk q8_0 -ot "blk\.(29|[3-9][0-9]|100)\.ffn_up\.weight=CPU"
prompt eval time = 4954.13 ms / 2930 tokens ( 1.69 ms per token, 591.43 tokens per second)
eval time = 41012.45 ms / 554 tokens ( 74.03 ms per token, 13.51 tokens per second)
total time = 45966.59 ms / 3484 tokens

lcpp before anything: 11650MiB
llama.cpp at 3000ctx: 11726MiB
total vram usage before anything: 11782MiB/12288MiB
total vram usage at 3000ctx: 11858MiB/12288MiB


It crashes when doing this command:

./llama-server --model ~/TND/AI/TheDrummer_Cydonia-24B-v4.3-Q3_K_M.gguf -ngl 1000 -fa 1 -c 16384 -ctv q8_0 -ctk q8_0 -ot "blk\.([3-9][0-9]|100)\.ffn_up\.weight=CPU"

---
error log: https://paste.centos.org/view/7c9331f2
---

VRAM USAGE:
lcpp before anything: 11720MiB
llama.cpp at 3000ctx: CRASH
total vram usage before anything: 11852MiB/12288MiB
total vram usage at 3000ctx: 124MiB/12288MiB

12288 - 11720 = 568MiB of VRAM left free by the blk 30-100 → CPU command
11726 - 11650 = 76MiB of extra VRAM used after actually processing and generating a prompt at 3000 ctx
76 < 568, so why does CUDA OOM?
Am I not allowed to fill my GPU past 11,900MiB?
is there a way to solve this?
>>
>>107585853
>Whatever happened to that even.
It made faggot drummer shit bricks, but luckily for him everyone forgot that flop. Since he is here I will spell it out: NAI had the money to do an actual finetune and it was worthless. If NAI, with money and GPUs, can't do a proper finetune of L3 70B, then Drummer is a colossal faggot who should die in a fire.
>>
>>107585862
>>107585220
>>107584376
Also forgot:
>>
>>107585317
>>107585337
I'm sure some retard's qlora trained to regurgitate ancient and poorly filtered Claude 2 ESL locust logs is much better than learning to prompt. Buy a fucking ad, drummer.
>>
>>107585705
>Why are we even pretending that finetunes are relevant at all in this day and age
>We
It's one spammer and his horde of shitskin discord followers
>>
File: output_last_first_tg128.png (156 KB, 2304x1728)
>>107585868
If I had to guess, it has to do with the backend scheduler splitting the compute graph differently.
The problem of how to re-use the memory is solved using a greedy algorithm, so the solution in use isn't necessarily optimal for arbitrary inputs.
There's also the issue that the order in which tensors are moved to VRAM matters; as I discovered just today, it seems to be better to, for example, prioritize large tensors (especially the output tensor) over small ones (see pic).
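A toy version of that ordering sensitivity, purely for illustration (this is not the actual scheduler logic):
[code]
# Greedy packing into a fixed VRAM budget, in the order given;
# tensors that don't fit are skipped (i.e. left on the host).
def pack(tensor_sizes_mib, budget_mib):
    used = 0
    for t in tensor_sizes_mib:
        if used + t <= budget_mib:
            used += t
    return used

sizes = [500, 300, 200, 90, 60]   # hypothetical tensor sizes in MiB
print(pack(sizes, 1000))          # large-first: packs 1000 MiB
print(pack(sorted(sizes), 1000))  # small-first: packs only 650 MiB
[/code]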
>>
File: goofy.png (177 KB, 1269x775)
>>107585978
Are you an actual dev? I get this when I try to convert the Z-Image model to Q8 with llama-quantize.
>>
>>107585978
Haven't you heard of Surgical Memory Alignment? It's a new technique invented by some genius.
>>
>>107585705
Because it feels good to feel that you're working to improve something rather than just being a consumer, regardless of whether it actually ends up working. And I say this as an aspiring tooner.
>>
>>107584958
All these mistral models and you never tune pixtral-large. It already does about 80% of what behemoth and friends do. The new devstral is clever but lacks a ton of knowledge that the previous models have.
>>
>>107585978
damn, is it possible to change the order of loading the tensors using some arguments?
>>
>>107585103
>I wish! But nope.

Heh shit, yeah do the AGPL thing the other anon said I guess.

>>107585870
>If NAI with money and GPU's can't do a proper finetune of L3_70B then Drummer is a collosal faggot that should die in a fire.

lmao no. Some of drummer's models are decent. Plus I nicked his self-merge -> zero out the down_proj trick to add more voices to tts models without breaking the built-in ones.
>>
In case anybody was wondering, here is the Nala test from justpaste {dot} it {slash} GreedyNalaTests using greedy decoding and the system prompt provided there, for labs-mistral-small-creative on the MistralAI API.
>>
>>107586004
>No. I'm not that kind of doctor.
>>
>>107586165
he wouldn't get openrouter to pay for his trains if they couldn't use his shit
>>
>>107586172
That's really good.
I'm kind of sick of the whole tail wrapping around your leg/waist/whatever, but that's not bad at all.
>>
>>107586172
It writes well desu. Is it decently smart? Like can it keep track of who did what with multiple characters?
>>
>>107583025
>t.ranny
>>107584958
You're a good lad. Glad to hear you're doing okay.
>>
>>107585263
>Kimi unmentioned
KWAB
>>
Gemma, I'm ready
>>
Love Drummer General. Any troons who don't like it can go back to plebbit.
>>
>>107585705
Sure just spend thousands of dollars for a model 10 people can run at single digit t/s.
>>
>>107586238
He literally advertises on reddit, bait-kun
>>
>>107586239
There's no way you're spending thousands of dollars on a toon unless you are trying to do full finetuning (in which case you will need a multi-node setup) or your dataset is absolutely huge (and in that case you would have spent more making the dataset than on the tuning).
>>
>>107586219
I haven't really tested it a lot for ERP, so I couldn't say whether it's good with multi-character cards or sudden secondary character appearances. The API doesn't support consecutive messages with the same role. I can say that at the default temperature of 0.3 it doesn't really have much output variance and it tends to mess up formatting with asterisks in longer responses.

It can write a lot in a single response in assistant "mode"; on an empty prompt it didn't seem to complain when I asked it to create the profile for a loli vampire.


