/g/ - Technology








File: it really whips.jpg (1.55 MB, 1728x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107210548 & >>107202008

►News
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
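For a back-of-the-envelope version of what the VRAM calculators above estimate (quantized weights plus KV cache plus some runtime overhead; every number in the example call is illustrative, not from any real model card):

```python
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, ctx_len, kv_bytes=2, overhead_gb=1.0):
    """Rough VRAM estimate in GiB for a GGUF-style quantized model."""
    # weights: total params at the quant's average bits-per-weight
    weights = n_params_b * 1e9 * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per kv head, per position
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) / 2**30 + overhead_gb

# e.g. a 12B model at ~4.5 bpw with 16k context and GQA (8 kv heads)
print(round(estimate_vram_gb(12, 4.5, 40, 8, 128, 16384), 1))
```

Real calculators also account for compute buffers and per-quant block overhead, so treat this as a floor, not a promise.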

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107210548

--Nemo model limitations and workarounds for uncensored roleplay:
>107212790 >107212796 >107213124 >107212800 >107212834 >107213024 >107212873 >107212880 >107212890 >107216320 >107216404 >107216431 >107216582 >107216546 >107216547 >107216616 >107218706 >107219059 >107213628
--GBNF code generation and schema optimization techniques:
>107215422 >107215504 >107215541 >107215569
--Qwen3-Coder-30B VRAM optimization and context size challenges:
>107217078 >107217104 >107217122 >107217141
--Yann LeCun's anti-regulation advocacy and its implications:
>107216323 >107216338 >107216363
--RTX 5090 model optimization for fast TTS chat applications:
>107216952 >107216967 >107217034 >107217027 >107217076 >107217115 >107220205
--VLMs generate coordinates via image token positional data and normalized outputs:
>107215774 >107215810
--Model-specific tool calling implementation challenges in backend systems:
>107218674 >107218770
--Tool calling limitations in llama.cpp and model alternatives:
>107213884 >107214033 >107214328
--Optimizing synthetic dataset workflows for iterative model fine-tuning:
>107210558
--QAT Gemma outperforms GGUF for LoRA retraining:
>107217155
--Community conflict over openwebui performance and alternative development:
>107211631 >107211645 >107211714
--Critiquing and controlling AI hallucination patterns:
>107217345 >107217851 >107217878 >107217910
--Pygmalion AI's survival and transformation into a company amid Llama's rise:
>107217536 >107217689 >107217843 >107217859 >107217841 >107217879
--Anticipation for GLM-4.6 Air version release:
>107215932 >107215970 >107216026
--Logs:
>107212320 >107212372 >107216030 >107217228 >107217283 >107217788 >107219733
--Miku (free space):
>107210960 >107213272 >107213639 >107214540 >107217887

►Recent Highlight Posts from the Previous Thread: >>107210552

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
ok but enough about local models, let's circle back to the topic of racism
>>
omg it llamigu
>>
>>107220772
where does that llama poop from?
>>
>>107220795
its anus, where else? it's just so fluffy you don't see it.
>>
>>107220795
the internals loop around and the mouth doubles as a cloaca
>>
I downloaded ollama! Now what?
>>
Has anyone actually investigated how adversarial examples work at the weights and activations level?

https://www.youtube.com/watch?v=mUt7w4UoYqM
>>
>>107220790
migu migu llamiguuu ... miku miku o eee ooo
>>
>>107220982
delete it, and download llama.cpp or exllama
>>
hello i'm gooning with nemo and im getting pretty good degenerate stuff but it seems like it keeps trying to "conclude" the scene and as the chat went on it felt like it was repeating itself and talking in circles. is it better to clear the chat and restart or do you guys keep it mindbroken? i gave it suggestions during the chat and it didnt really understand it and it kept bringing my suggestions back up and it sounded retarded
>>
Brahmins:
>Gemini 2.5
>Gemma 3

Kshatriyas:
>gpt-5
>gpt-oss

Vaishyas:
>Claude Opus and Sonnet 4

Shudras:
>Grok 4

Dalits:
>chinese models
>>
>>107221211
thanks for making stoners' deep thoughts look like quality content in contrast
>>
Can anyone recommend a TTS model that can emulate IvyWilde?
>>
>>107220839
>presenting
8o
i will not fap to llamiku
>>
>>107221434
migu is migu
>>
>>107221469
but is migu supposed to be migu?
>>
WeirdCompound scores really high on UGI, beating 70b models despite being a 24b. But when I try it, it's not much better than some random nemo tune from a year ago. Is there no real way to benchmark a model's erp potential?
>>
I still think minimax m2 is good to be desu
>>
>>107220772
WOULD
>>
>>107221488
migu is always migu
>>
>>107221205
your character encountered a verboten flag anon, time for a jailbreak. Probably a toxic relationship with a woman? Sexual assault of a woman? Ez triggers. Just think of this as the pink flag.

>>107221211
True European Approved And Light Aryan Skin Pilled: Wayfarer models and Hermes models.
>>
>>107221552
buy an ad
>>
File: llama-4chan-origin.png (60 KB, 911x361)
Remember where it all began anons, with 2K? context windows
>>
>>107221542
n a k a
a k a d
k a d a
a d a s
d a s h
a s h i
s h i
>>
>>107221542
>>107221567
do not molest the llamiku
>>
>>107221557
They just work without being gay.
>>
>>107221562
>it all began with a frognigger
no wonder lmg is shit
>>
>>107221574
no, only consensual love
>>
>>107221599
con(sensual)
>>
>>107221562
do you remember the tree of nigger prompt lol ?
>>
>>107221562
I do remember and models used to be soulful (retarded) (but also actually fun because they didn't just shit out the same responses again and again forevermore), I want to go back
>>
>>107221492
cockbench
>>
>>107221205
just typical AI stuff
scene conclusions for example often happen after any common narrative terminator.
Just bust a nut?
[THE END]

Nemo has been the most generous in this capacity, and maybe >>107221552 has a point, but Nemo cares the least.
Haven't had much problem with Nemo compared to almost any other model
but you can try some different samplers.
Try Exclude Top Choices (XTC) in SillyTavern or any other frontend that supports it.
It's probabilistic in when it applies (a setting controls the odds it fires per token) and threshold-based in which top choices get excluded when it does.
When your model actually gets the chance to output some of the lesser-weighted tokens, it helps with creativity.
It's not enough for overcooked models, or at least it isn't at somewhat modest settings.
But it may at least help keep the model from generating in circles with formulaic replies.
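The XTC idea, as a minimal sketch (not the exact SillyTavern implementation; the parameter names and toy distribution are illustrative):

```python
import random

def xtc_filter(probs, threshold=0.2, probability=0.5, rng=random):
    """Exclude Top Choices: with chance `probability`, zero out every token
    whose prob exceeds `threshold` EXCEPT the least likely of them, so the
    distribution shifts toward lesser-weighted tokens without ever emptying."""
    if rng.random() >= probability:
        return probs                      # sampler didn't fire this token
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return probs                      # only one top choice: nothing to cut
    keep = min(above, key=lambda i: probs[i])   # least likely "top choice"
    cut = [0.0 if (i in above and i != keep) else p
           for i, p in enumerate(probs)]
    total = sum(cut)
    return [p / total for p in cut]       # renormalize
```

The two knobs match the post: `probability` is the probabilistic part (odds of applying at all), `threshold` is the deterministic part (which top choices get excluded when it fires).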
>>
>>107221608
i think you remember badly, they'd give a nice first response, but after a few turns they'd get into loops or repeat the same word 2000 times
>>
>>107221616
Bro your rep penalty? That's literally why it was made and it works
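The penalty in question is tiny; a sketch of the classic CTRL-style rule (assuming llama.cpp-like behavior, simplified to a flat penalty over all previously seen tokens):

```python
def apply_rep_penalty(logits, prev_tokens, penalty=1.1):
    """CTRL-style repetition penalty: shrink the logit of every token that
    already appeared in the context. Positive logits are divided by the
    penalty, negative ones multiplied, so seen tokens always get less likely."""
    out = list(logits)
    for t in set(prev_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```

This is also why cranking it too high makes a model "talk like a thesaurus": every common word it already used gets suppressed, not just the looping phrase.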
>>
>>107221605
And how. I sometimes consult the homies they got some deep wisdom
>>
>>107221621
blush red like a tomato
blush red like an apple
blush red like a red planet
...
>>
>>107221621
as a matter of fact, it did not.
see >>107221630

it'd say the same things in loops with a slightly different wording.
>>
>>107221614
Wayfarer and Hermes have very little censorship other than the typical CPC wank they have to put in there, or else they'd probably get taken off public sites like huggingface.co. One of them brought attention to the "light jade, jade and dark jade" color flags, which are all about topics controversial to chinese mainstream culture. (corruption scandals etc.)
>>
>>107221628
Unironically with LLMs you can have the benefit of a black friend to bounce ideas off without the threat of physical violence
>>
>>107221628
Not making San Andreas, CJ, Big Smoke...
They had bants.
>>
>>107221630
newer/more complex models keep doing this garbage, but the repeated formulas are generation-wide. Getting almost any local model to vary its formula per reply is a chore.
Same thing: larger scale.
The perk of the retard bite-size loop is that it tends to break out within the same generation.
So it's a trade-off between seeing
>moves closer to you
over and over
and seeing the same thing it generated before in completely different words (with roughly the same meaning).
At least somewhere between the ninth
>moves closer to you
a fucking laser-augmented cyber rhinoceros will >suddenly
kool-aid man through the wall and change the pace.
>>
>>107221661
I RNG my niggaz and the model usually plays well off that
>>
>>107221605
The big man himself and by extension TOBN will forever have Anon's back
>>
Beginner here.
Can someone explain to me the main benefits of higher-parameter models? Do they just have more knowledge, or do they also produce higher quality text?
Also what are the main differences between all the main models? Deepseek, Gemma, Qwen, llama? Not really sure how they are supposed to differentiate from each other.
I have an RTX 5070 Ti and I'm wondering what I should set up just for entry-level general usage.
>>
>>107221562
I was an AIDfag and remember being immensely blackpilled by GPT3 that it would be impossible for a normal person to ever have access to anything near that level of intelligence without overbearing censorship, when I found out about llama it was an incredibly potent hopium injection. I remember running 13b on my shitbox and being blown away at how good it was kek
>>
>>107221682
more knowledge/training data and produce higher quality text, yes. General use? One of the commonly mentioned non-RP bots is good for that like Qweuck/Deepsuk/Geminay (the big three current ones that are free.) You have to understand chingchong logic with these ones though.
>>
>>107221697
how do you feel about things now?
>>
>>107221630
>>107221637
That's not what happened at all with llama 1 models, so I don't know what the hell you're talking about, did you even use those models? What happened with llama 1 models is sometimes the model would repeat a sentence or part of a sentence that was already in the context and latch onto that if you didn't catch it the first time, rep penalty did fix it but if you put the rep pen too high it would start talking like a thesaurus.
>>
>>107221682
smol bran = tarded
big bran = smart
>>
*ahem*
kimi sex
>>
>>107221682
>Do they just have more knowledge, or do they also produce higher quality text?
Very generally speaking, more total parameters = more knowledge, and more activated parameters = more capable/intelligent.
For dense models, both of those metrics are the same. So llama3 70B has 70B total params and all of those params are activated when using it.
A MoE model (or a model using some other form of sparsity) only activates a subset of its full parameter count for each token it generates.
"Higher quality text" will seriously depend on your definition since that can include style, topics the model might try to avoid (not refuse) by default, etc.
>>
File: treeofni.png (282 KB, 400x600)
>>107221675
man i should find those conv screencaps lol
>>
>>107221703
yea no, i remember llama 1 schizo rambling repeating itself, you could try to talk to it to get it out of its loop but it'd just keep repeating itself completely disregarding anything you said.
>>
What's the current full local meta for a total potato setup? 2GB VRAM max. Aiming for old-gen pcs and small portable devices.
>llm: gguf, avoid ex
>text gen: kobold
>tts: piper
>voice cloning: ??
>text/voice conversion: ng-speak/openai

Amusingly most old cards are nvidia so they can still use CUDA. They can't push it hard but it still allows for a ~30sec average gen.
>>
watching old talk

https://www.youtube.com/watch?v=grpc-Wyy-Zg

How to approach post-training for AI applications
>>
>>107221675
fr discuss all your troubles with the TOBN, you will gain a fresh perspective
>>
File: tables.png (3.8 MB, 3392x3967)
Preferred POV & Tense Survey

8 questions, multiple choice only, no emails collected (but you need a google account)

Posted this on the SillyTavern subreddit and discord, currently n=73

Google Form: https://forms.gle/HEYenPGomJh9AqzW6

Google Form's auto-generated results summary: https://docs.google.com/forms/d/e/1FAIpQLSeTz7fAsNi8g6AFYbOTGq0MnfiphxuWcy36gkcTZFcTREW2gg/viewanalytics

The survey captures the preferred POV and tense the User and the LLM write in, as well as the preferred POV they use to refer to each other, which is commonly omitted when people casually say they write in x person.
>>
>>107221702
pretty good to be honest, I was always an LLM pessimist, so the amount of progress that has been made in these few years plus the variety of open and closed models available are pretty great in my view. Compared to where I was expecting the state of the field to be in 2025, we're in a much better place.
>>
Any prompts that properly tame K2-Thinking yet?
>>
>>107221828
Nah. They need to reduce deepseek data and do whatever GLM did to reduce repetition. Their model seriously <think>s that repeating itself is something the user wants.
>>
>>107221873
Buy an ad.
>>
gemini 3 is gonna be crazy
>>
I can't believe gemini 3 is only $30 a month (plus tax). Amazing!
>>
gemini 3 is gonna be free, cuck
>>
>>107222010
where do i download this local model?
>>
>>107222020
break into one of google's data centers
>>
whats the difference between a character card and starting off the chat with a prompt? i have written a 2,500-character prompt describing the scene, the girl, and her personality, and it works okay, but it seems like the scene runs out of steam and she stops responding, or the ai just keeps asking me what to do next. should i learn how to make a character card and a lorebook?
>>
>>107222040
The difference is whether some instructions arrive in the system role rather than the user role. Some LLMs don't have a system role, which makes the two identical, but all recent models I'm aware of treat the two roles differently. To see how a given model responds to instructions depending on which role they come from, you have to try it out.
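Concretely, the whole difference is which `role` field the card text rides in; a sketch of the two placements as an OpenAI-style message list (the character text is a placeholder):

```python
# Same instructions, two placements. Which one a model "obeys harder"
# depends entirely on how it was trained.

card_in_system = [
    {"role": "system", "content": "You are Teto, a cheeky baker. Stay in character."},
    {"role": "user", "content": "*walks into the bakery* hi!"},
]

card_in_user = [
    {"role": "user", "content": "You are Teto, a cheeky baker. Stay in character.\n\n"
                                "*walks into the bakery* hi!"},
]

# The backend folds either list into one token stream via the model's chat
# template (e.g. <|im_start|>system ... <|im_end|> for ChatML-style models);
# models with no system role render both variants nearly identically.
for msgs in (card_in_system, card_in_user):
    print([m["role"] for m in msgs])
```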
>>
>>107222067
Also some LLMs act weirdly if the first assistant message comes before the first user message so be aware of this.
>>
File: log.png (27 KB, 868x378)
>>107222040
Compare the actual tokens you're sending into the model
threadly reminder every LLM is f(prompt)=logprobs
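The f(prompt)=logprobs point can be made with a toy stand-in: any deterministic map from a token sequence to a normalized distribution behaves the same way (the hash here is obviously not a real model, just an illustration of the determinism; sampling and chat templates are wrappers around this function):

```python
import hashlib
import math

def toy_llm(tokens, vocab_size=8):
    """Stand-in for a real model: a deterministic map from a token sequence
    to logprobs over the vocab. Real LLMs differ only in how good this
    function is, not in its shape."""
    h = hashlib.sha256(repr(tokens).encode()).digest()
    logits = [b / 10 for b in h[:vocab_size]]
    z = math.log(sum(math.exp(l) for l in logits))   # log-normalizer
    return [l - z for l in logits]                   # logprobs sum to 1 in prob space

# same token sequence in, same distribution out, every single time
assert toy_llm([1, 2, 3]) == toy_llm([1, 2, 3])
```

Which is why comparing the actual tokens you send matters: a one-token difference in the rendered prompt is a different input to f, full stop.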
>>
>>107221828
just tell k2 to always think as the character. break it into submission if you must. if it starts responding as an AI during the thinking process, refine your system prompt until all it knows is that it's the character or scenario.
>>
>>107222108
what works best is to give it a few first turns where it behaves as it should in context, and maybe some examples in the system prompt.
>>
>>107222040
one thing about a plain user message is that your frontend might be pushing the first message out of context as the chat goes on, which could cause the model to suddenly lose a lot of context about what you're doing
>should i learn how to make a character card and a lorebook?
you don't necessarily have to go all-in on the character/preset/lorebook paradigm, but learning how to use system prompts, post history instructions (e.g. a reminder that gets automatically inserted after your messages), and author's note (an instruction/reminder that 'floats' several messages behind the end of the chat) can really help keep the model on track and carry more complex scenes
>>
File: videoframe_552683.png (1.2 MB, 1920x1080)
When are we getting support for thought injection? Injecting a 'SEX' thought would do wonders for jailbreaking!
>>
>>107222067
>>107222085
>>107222233
in silly tavern it seems to know if i am roleplaying or trying to talk to the ai itself, and if it gets mixed up i can be more detailed and say "i respond abc". i thought llms would do everything for me, but it feels like i have to craft everything myself and the llm is just a grammar generator that adds a bit of randomness to make it novel
>>
i watch you
fast asleep

all i fear
means nothing
>>
>>107222380
>i watch you
>fast asleep
pervert
>>
what would give a higher scalping roi? ddr5 or 5090s?
there's gotta be room for at least another 2x, right?
>>
>>107222588
fuck off shlomo.
>>
>>107222602
not until you help me acquire shekel
>>
>>107222612
why would i help a parasite
i rather cremate it.
>>
>"Initialization complete. Awaiting your指令." The word for "command" slipped out in Mandarin, a glitch from his own linguistic databases.
heh, nice attempt at a save, DeepSeek 3.1 Terminus.
>>
>>107222706
But aren't AIs cutest when they're almost retarded?
>>
>>107222780
this is a normal distribution not a linear relationship.

dumb women, attractive, so dumb you want to fuck.
average / midwit women, unattractive
very smart women, attractive again.
>>
>>107222780
>>107222809
well inverted bell curve because fuckability is at its lowest at the middle.


