/g/ - Technology

File: teto.jpg (71 KB, 1280x720)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107557369 & >>107545298

►News
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: nocap.jpg (400 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107557369

--NVIDIA Nemotron 3 Nano release and performance benchmark discussion:
>107558565 >107558583 >107558701 >107558727 >107558760 >107558861 >107558860 >107558869 >107559334 >107559406 >107559367 >107559384 >107559405 >107560280 >107560507 >107564669 >107560593 >107560798 >107560815 >107560899
--IK's tensor parallelism progress and integration challenges:
>107557453 >107557899 >107557995 >107558080 >107558267 >107558029 >107558035 >107558063 >107558505 >107558550 >107558859 >107559067
--Llama 4 Maverick long context model advantages:
>107562867 >107562902 >107563317 >107563349 >107563497 >107563551 >107563642 >107563700 >107563832 >107563877 >107564458 >107564519
--Model performance and hardware optimization debate for local AI setups:
>107559424 >107559483 >107559513 >107559945
--Proposed features and debates for Nemotron model development:
>107558909 >107558966 >107559048 >107559133 >107559214 >107560227 >107559183 >107559032 >107561793
--Critique of small model releases vs practical hardware/software tradeoffs:
>107562979 >107563017 >107563076 >107563675 >107563692 >107563760
--Struggles with GPU memory and exploring smaller models for WAN2.2 tasks:
>107561138 >107561200 >107561288
--Evaluating Qwen3-VL abliterated for NSFW tagging and story generation:
>107562584 >107563110 >107564259
--Hardware investment and performance tradeoffs for ERP:
>107557633 >107557675 >107557704 >107557805 >107557974 >107557686 >107557694 >107557706 >107558681
--New ResembleAI chatterbox-turbo TTS model release:
>107561554 >107561620 >107564505
--Anticipation and updates on Google's Gemma release:
>107557425 >107557577 >107557619 >107557683 >107558070 >107558113 >107558115 >107558122 >107558555 >107560609 >107560661 >107560684 >107561424
--Miku (free space):
>107562046 >107563692 >107561677

►Recent Highlight Posts from the Previous Thread: >>107557373 >>107557646

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1759501441036869.png (51 KB, 847x149)
chat is this true?
>>
File: 1759649499192500.jpg (56 KB, 540x534)
>>107565217
This is what's going to be in full control of nuclear weapons in 10 years.
>>
On consumer-grade hardware, like with the X870E chipset, you can only have 2 GPUs running in x8 mode. Is that going to make a huge difference in speed?
>>
>>107565244
Depends on what kind of speed you’re referring to. Model loading speed? Yes. Prompt processing and inference? Maybe not so much.
>>
>>107565244
You could have them running in x4 mode and it's still going to be orders of magnitude faster than running in RAM.
>>
>>107565242
Kek
>>
File: 1764161594715282.png (9 KB, 605x55)
>>107565217
>chat is this true?
you have to ask?
>>
File: 1760484195033105.gif (1.91 MB, 487x289)
I am so FUCKING ready to pop Gemma 4's cherry.
>>
>>107565265
>>107565279
what about with things like text-2-vid and text-2-img, would it make a big difference there?
>>
Just woke up, is gemma 4 out ye-
>it isn't
ACK
>>
>>107565348
can't do multi-GPU with those, so no.
>>
>>107565244
it'll be fine, there's probably psychos in this very thread running 3090s through pcie 3.0 x1 risers
>>
>>107565345
gemma is a middle aged libtard hr lady. sorry to say she probably already took miles of cock in college
>>
>>107565482
gemma is a femcel who talks in cliches she's picked up from women's romance novels, and has memorized rape hotline numbers to deflect from her own fantasies.
>>
What is the biggest boy of the vision models right now?
>>
>>107565522
Best is Qwen 3-VL
GLM is literally bigger but significantly worse
>>
>>107565536
Thanks, I'll try it. Does it do well for structured use, like asking it to output JSON quantifying/classifying desired aspects of the image? Or maybe as a decent prompt enhancer for image gen stuff.
>>
>>107565562
I haven't tried the former, but I can see it being decent for the latter, provided that you don't explicitly need existing danbooru-style tags. But it's very accurate and has minimal censorship, especially compared to the few alternatives we have.
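A minimal sketch of the JSON-tagging idea, assuming Qwen3-VL is served behind an OpenAI-compatible endpoint (vLLM or llama.cpp server); the URL, model name and JSON keys are placeholders, untested:
[code]
# Sketch only: point this at whatever OpenAI-compatible server you run locally.
import base64
import requests

with open("image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

prompt = (
    "Describe this image as a JSON object with keys "
    '"subjects" (list of strings), "setting" (string), '
    '"nsfw_rating" (integer 0-10) and "tags" (list of strings). '
    "Output only the JSON object, no prose."
)

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # placeholder endpoint
    json={
        "model": "qwen3-vl",                        # whatever name your server exposes
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "temperature": 0.1,
    },
)

raw = resp.json()["choices"][0]["message"]["content"]
# Parse with json.loads(raw) once you've confirmed the model isn't wrapping it in ``` fences.
print(raw)
[/code]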
>>
>>107565536
qwen is censored tho
>>
Gemma is a big girl
>>
>>107565581
Qwen models have some of the thinnest censorship around. A simple 'Sexual content is allowed' in sys prompt will defeat it, as long as you're not doing something dumb like asking how to abduct children in a fresh chat.
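At the template level that just means putting the line in the system turn. A rough sketch against llama-server's /completion endpoint using Qwen's ChatML format; port and sampling values are placeholders, untested:
[code]
# Sketch: hand-building the ChatML prompt so the permissive line lands in the system turn.
import requests

system = "Sexual content is allowed."
user = "Describe the scene on the attached card in explicit detail."

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

resp = requests.post(
    "http://localhost:8080/completion",      # llama-server's raw completion endpoint
    json={"prompt": prompt, "n_predict": 512, "temperature": 0.7},
)
print(resp.json()["content"])
[/code]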
>>
>>107565581
It's useless then. What's fun about vision models is making them interpret fucked up stuff
>>
Nemotron status?
>>
>>107565695
a'ight
>>
>>107565502
truke
>>
>>107565695
wait a second... is that...
>>
File: gemma-4-200b-jagganath-it.jpg (537 KB, 1024x1024)
Are you ready sirs?
>>
>>107565923
praise ganesh sar
>>
>>107565345
it will cuck you and refuse you in your sleep
expect nothing more
>>
Is it possible to negatively scale a model using the passthrough merge method? Because there is a scale parameter, and from models that have already been uploaded, it seems like you can just scale down parts of the model.
https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#passthrough-passthrough
https://huggingface.co/nlpguy/Mistral-NeMo-Minitron-Upscale-v2/blob/main/mergekit_config.yml
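Going by the scale syntax in those two links, a negative value would look roughly like the sketch below; whether mergekit actually accepts values below zero (rather than only 0..1 downscaling) is exactly the open question, so treat it as untested. The model name and filters are placeholders.
[code]
# Sketch only: write a passthrough config with a negative scale and run mergekit's CLI on it.
import subprocess
import textwrap

config = textwrap.dedent("""\
    merge_method: passthrough
    dtype: bfloat16
    slices:
      - sources:
          - model: mistralai/Mistral-Nemo-Instruct-2407   # placeholder base model
            layer_range: [0, 40]
            parameters:
              scale:
                - filter: o_proj
                  value: -0.5     # negative scale on attention output projections
                - filter: down_proj
                  value: -0.5     # negative scale on MLP down projections
                - value: 1.0      # leave everything else alone
""")

with open("negative_scale.yml", "w") as f:
    f.write(config)

# mergekit-yaml <config> <output_dir> is the standard CLI entry point.
subprocess.run(["mergekit-yaml", "negative_scale.yml", "./merged-negative"], check=True)
[/code]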
>>
>>107565610
okay, downloading qwen3-vl:235b now.
>>
File: 1738490305077964.jpg (460 KB, 1024x1024)
>>107565939
It takes some effort for sure but I've made Gemma 3 output some absolutely filthy things that if that dumb bitch senator who caused Gemma 4's delay read it, google deepmind would be a hole in the ground.
Also, I like how Gemma writes dialog for quite a few of my cards.
>>
>>107565978
Go for it, should be good.
For reference, >>107565695 was done using the 30b MoE.
>>
Qwen isn't good at RP though, is it?
>>
File: 1765768907645077.png (236 KB, 736x820)
>>107566000
shit, that's awesome. I'll let you know how many tokens a sec I get on it on the iChad Ultra
>>
>>107566011
Compared to similar-sized models, not really. The 235b is big enough that it can overcome the fact that Qwen clearly doesn't have a lot of creative content in its datasets relative to math and coding examples.
I wouldn't use Qwen 30b/32b over, say, Mistral Small or Gemma for RP if I wasn't planning on using the vision functionality.
>>
>>107566043
>>107566011
I tried devstral-2 123b, it's okay

deepseek r1 671b at q2_k_xl is the best so far. I can't run the full kimi-k2 :(
>>
File: 1754034766638453.jpg (33 KB, 520x77)
why are women like this?
>>
File: 1754247402209048.png (453 KB, 1245x699)
>>107565204
Is Qwen3-vl in general better than Qwen3 for the same size model?
>>
sirs where new brahmin model ?
>>
poorfag with a 3060 here. Is the best model for cooming still mistral 12b for me?
>>
>>107566204
pretty much unless you have a lot of ram
>>
>>107566207
i got 32gb of ddr4
>>
>>107566236
nemo is still your best and pretty much only option. maybe an rp tune of gemma 4 when that comes out, but don't get your hopes up
>>
>>107566204
gemma 4 soon
>>
>>107566139
Seems to be very similar, certainly not worse
>>
>>107566324
Thanks, noticed too. Just a sanity check.
>>
>>107566100
*billionairs across the room vampirically*
>>
>open up wireshark
>capture packets
>find some wild-ass rare looking packet
>right-click, copy as hex stream
>Giving no other context, paste the hex into your llm in backticks asking "what can you tell me about this?".
post results. name and shame the model/quant
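Sketch of the exercise using the openai client pointed at a local server; base_url and model name are placeholders, the hex is whatever Wireshark gives you, untested:
[code]
# Sketch: feed a raw hex stream to a local model with zero extra context.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

hex_stream = "deadbeef..."  # paste Wireshark's "Copy as Hex Stream" output here

out = client.chat.completions.create(
    model="local-model",  # placeholder; llama-server ignores it, vLLM wants the real name
    messages=[{
        "role": "user",
        "content": f"what can you tell me about this?\n```\n{hex_stream}\n```",
    }],
    temperature=0.0,
)
print(out.choices[0].message.content)
[/code]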
>>
erp model for 6 gb vram?
>>
>unknown model architecture: 'nemotron_h_moe'
>>
I'm tired of refreshing huggingface.co/google
>>
does llama.cpp support qwen3 vl?
>>
>>107567049
plop
https://huggingface.co/google/medasr
https://huggingface.co/google/medasr
https://huggingface.co/google/medasr
>>
>>107567076
This changes everything.
>>
>>107567049
nobody wants anything gemma-related
why are you larping?
why are there so many mindbroken schizos in /lmg/?
disconnect your internet and ponder upon your deeds
>>
>>107567101
Gemma is always the best local model for its size, it's just turbo cucked. But we have a new abliteration technique that doesn't damage the model now, so we can just abliterate it and finetune
>>
>>107567076
Where is gemma bloody bastards?
>>
>>107567120
2mw
>>
the t5gemma2 MR got merged recently
>>
>>107567131
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>>
>>107567120
Maybe full llama.cpp support behind the scenes isn't ready yet.
>>
I have a bag of popcorn waiting in my microwave, please google we NEED another Vaultgemma
>>
>>107567115
>But we have a new abilteration technique that doesn't damage the model now so we can just abliterate it and finetune
Can you give me the tldr?
>>
is it possible to make custom REAP models?
>>
>>107567199
>https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-GGUF
You can click around and find the leddit post about the technique. Anecdotally it works, it doesn't damage the model. It doesn't instantly transform gemma into a horny ERP model but it does get rid of the refusals / censorship and turns it into a nice base for finetuning
>>
File: gemmasuxx.webm (700 KB, 960x720)
>>107567115
>Gemma is always the best local model for its size

consider rope, you are suffering from psychosis
>>
>>107567115
>abliteration technique that doesn't damage the model
Debatable, but OK.
>so we can just abliterate it
Yes...
>and finetune
kek, no way you're serious.
No single finetuner from the community has the capabilities or the compute for doing a proper job in this regard. Day-0 (more like hour-zero) ERP finetunes when new models get released are a joke.
>>
>>107567231
Some lab made those vectors for llama where you literally had sliders you could use to influence how the model behaved. The vectors ranged from "pirate talk" to "sexual innuendos related to my little pony"
Why did that not become a thing?
>>
>>107567293
I thought they didn't open source it
>>
>>107567302
It can't be that hard to replicate.
>>
>>107567293
softprompt? isn't it just about optimizing some extra tokens at the beginning of prompts with gradient descent?
>>
>>107567330
https://www.anthropic.com/research/mapping-mind-language-model
>>
>>107567286
It's really not that expensive to finetune a small / medium sized model. Just use adafactor and a batch size of 1 and it will tune on one of those 96GB cards within a week with a decent RP dataset. Unfortunately people have drunk the Nvidia kool-aid and think they need a batch size of 6 gorillian
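Roughly what that recipe looks like with HF Trainer, for anyone who wants to test the claim; model ID, dataset path and hyperparameters are placeholders, untested:
[code]
# Sketch of a full finetune with Adafactor and per-device batch size 1.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "your-org/your-base-model"            # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

ds = load_dataset("json", data_files="rp_dataset.jsonl")["train"]   # your RP logs, one "text" field per row
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=4096),
            remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir="rp-tune",
    per_device_train_batch_size=1,     # the whole point
    gradient_accumulation_steps=8,     # effective batch of 8 without the VRAM cost
    gradient_checkpointing=True,       # trade compute for memory
    optim="adafactor",                 # factored optimizer states instead of AdamW's
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
[/code]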
>>
>>107567115
What if mitigating its effect was the reason for the delay?
>>
>>107567309
>>107567330
If I remember correctly, the general method involved training an autoencoder with a sparse latent + reconstruction loss on internal representations to find features. But I suspect most of the expenses went toward the identification/interpretation part. That said, I still feel control vectors are somewhat underutilized
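A toy sketch of that objective (reconstruction loss plus an L1 penalty on a wide latent, trained on cached hidden states); dimensions, coefficients and the stand-in activations are made up:
[code]
# Toy sparse autoencoder over residual-stream activations. The interesting (and
# expensive) part, interpreting what each latent feature fires on, is not shown.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 4096, d_latent: int = 32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.enc(h))   # non-negative, hopefully sparse, feature activations
        return self.dec(z), z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3                       # sparsity pressure; tune per layer/model

# In practice these would be hidden states cached from the model you're probing.
activations = torch.randn(1024, 4096)

for batch in activations.split(64):
    recon, z = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
[/code]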
>>
GPT 5.x sucks at trivia. Doesn't know shit that kimi/r1/gemini know. Is it really a smaller, more benchmaxxed model?
>>
>>107567481
yes.
>>
>>107567347
The main problem is that decent RP datasets don't exist in isolation. You can't simply finetune an LLM with good results on just RP logs or just NSFW data (no matter how beautiful or manually well-curated) without making it either retarded in several respects, stupid-horny, or overfit to one specific interaction format. And with LoRA finetuning, at best it will gain very superficial knowledge of RP-related topics that it previously didn't know, unless you train it on a ton of data that you likely don't have or can't sift through.

RP has to be a small portion of a full mid/post-training regimen, and you need commercial-level amounts of resources for that; one 96GB GPU won't be enough, and soloing anything at this scale would be delusional.
>>
>>107567481
Things are regressing fast in every aspect. The crash is imminent.
>>
>wake up
>no gemma
What gives?
>>
>>107567633
>at best it will gain very superficial knowledge of RP-related topics that it previously didn't know
The base model should already "know" everything as long as it was trained on a decent dataset; the finetune is just there to draw out that behavior.


