/g/ - Technology
File: tetors.png (953 KB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108032910 & >>108024966

►News
>(02/02) Step 3.5 Flash 196B-A11B released: https://hf.co/stepfun-ai/Step-3.5-Flash
>(01/29) Qwen3-ASR 1.7B and 0.6B released with support for 52 languages: https://hf.co/collections/Qwen/qwen3-asr
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108032910

--Papers:
>108037623 >108037665
--Quartet II: 4-bit LLM training in NVFP4 with FP8/FP16 quality and full hardware acceleration:
>108044022
--Testing abliteration layer selection for dataset overfitting patterns:
>108035620 >108036110 >108036143 >108036499
--Anon seeks Devstral 2 settings after 80GB VRAM upgrade:
>108037329 >108037342 >108038272 >108038524 >108037364 >108037408 >108037437
--llama.cpp postponing LongCat ngram implementation pending mainstream adoption:
>108037744 >108037767 >108037825 >108037913 >108037939 >108037945
--Gemma 3n and prompt repetition recommended for JP-EN manga translation:
>108037473 >108037533 >108037557 >108037727
--Anon asks for human-like models (SAGE, HER, UserLM):
>108034412 >108034423 >108034451 >108034547 >108034891 >108034942 >108034556 >108034730
--Anon benchmarks Step-3.5-Flash on dual RTX Pro 6000s:
>108044196 >108044231 >108044236 >108044363 >108044423 >108044429 >108044513
--Kimi K2.5 outperforms Qwen3 Max on /pol/ memes and muffin tests:
>108034522 >108034672 >108035669 >108035696 >108035755 >108035783 >108035903 >108036007 >108036037 >108036067 >108035902 >108035932 >108038149
--ComfyUI Qwen TTS nodes for JP-to-EN audio generation:
>108035458 >108035471 >108035499 >108035542 >108035574
--llama.cpp lacks FP8 support despite GGUF format capability:
>108036017 >108038186
--Stepfun releases Step-3.5-Flash 198B-A11B:
>108040588 >108041288 >108041387 >108042008
--Anima LLM anime model and e621 tagging debate:
>108034966 >108034988 >108034993 >108034999 >108035015 >108035120 >108035148 >108035178 >108035192 >108036210 >108036439 >108036455 >108036611
--K2.5 vision model accurately recognizes anime characters:
>108036188 >108036450
--Logs: Step-3.5-Flash cockbench:
>108042145
--Miku (free space):
>108036210 >108036611 >108036719 >108045895

►Recent Highlight Posts from the Previous Thread: >>108033093

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Teto sex
>>
SATAN HAIRED MIKU BEGONE FROM THIS HALLOWED PLACE
>>
>>108046563
I gave SillyTavern a try and I hate to say it but I was disappointed. Any other alternatives?
>>
File: n-newton sama.jpg (111 KB, 832x1216)
>>
>>108046119
Claude (but Claude and Gemini are very similar nowadays and might be using the same datasets or distilling from each other)

>>108046140
You can for classic abliteration, but norm preservation apparently ends up being very high rank. You could use the LoRA adapter and also add an extra per-token value per layer for norm preservation, but that requires a lot of custom code.
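For anyone following along, a rough sketch of what the classic (directional) part looks like. This assumes you already have a refusal direction per layer; all the names here are illustrative, not from any particular repo. The norm issue mentioned above is visible here too: projecting the direction out of the weights also shrinks activation norms, and compensating for that per token and per layer is the part that needs custom code.

[code]
# Minimal sketch of classic abliteration: orthogonalize a weight matrix
# against a (precomputed, assumed-given) refusal direction v.
import torch

def orthogonalize(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Remove the component of W's output along direction v, for every input."""
    v = v / v.norm()
    # y = Wx has component (v . y) v = v (v^T W x); subtracting
    # outer(v, v^T W) from W zeroes that component for all x.
    return W - torch.outer(v, v @ W)

# Hypothetical usage on a HF-style model, one direction per layer:
# for layer, v in zip(model.model.layers, refusal_dirs):
#     p = layer.mlp.down_proj.weight
#     p.data = orthogonalize(p.data, v.to(p))
[/code]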
>>
File: ylecun.jpg (222 KB, 1200x1271)
I like my LLMs how I like my women >:)
>>
>>108046763
Naked in groups of 8 and chained to a radiator?
>>
>>108046747
>might be using the same datasets or distilling from each other
What's that subgenre of incest called?
>>
File: satan teto.jpg (63 KB, 1280x720)
>>108046693
Nyoo~!
>>
File: file.png (688 KB, 1145x771)
radical (2mw) wait loss
>>
>>108046763
https://www.justice.gov/epstein
>yann lecun
>3 pages of results
CAT INTELIGGENCE SISSIES ?!?!??!?!
>>
File: file.png (405 KB, 974x638)
>>
these new gens don't quite hit the same as the old ones
>>
File: Special Beam Cannon.jpg (212 KB, 1824x1248)
>>
apparently some anon registered a non-profit to remake anima in apache2 with a larger dataset and a better encoder
>>
>>108046922
is he going to change to llm-style prompting or keep the tag retardation?
>>
I need an image editing model benchmaxxed in typesetting manga
>>
>>108046817
Half of that is just the same email over and over again.

You lost, chud.
>>
>>108046964
tags make more sense, then just train controlnets. the nlp in anima is broken and tends towards slopstyle anyways. I'm pretty sure the laion dataset the original model used is the only thing tagged in nlp, which is why it gets so 2.5d when using them
>>
How much data would I need to train models on natural language tasks (mostly for understanding the structure of text in a document) while also providing enough data for it to infer that Jane Doe is a name and Los Angeles, California is a place, and things of that nature? I've trained a small (I think 1 bil parameters?) BERT model to do natural language classification, but the task/problem was very simple and I think I made like 500 examples to fine-tune it on
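For scale, here's roughly what that kind of fine-tune looks like as token classification with HF transformers. Everything below is a toy placeholder (model choice, label set, two-example dataset), not a recommendation. The point: the pretrained model already carries the world knowledge that Los Angeles is a place; the fine-tune only teaches the labeling scheme, which is why hundreds to low thousands of examples often suffice rather than gigabytes.

[code]
import torch
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

labels = ["O", "B-PER", "B-LOC"]  # toy tag set
tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels))

# Two toy examples; real use needs a proper tagged corpus (CoNLL-style).
texts = [["Jane", "Doe", "flew", "to", "Los", "Angeles"],
         ["John", "lives", "in", "California"]]
tags = [[1, 1, 0, 0, 2, 2],
        [1, 0, 0, 2]]

class ToyNER(torch.utils.data.Dataset):
    def __len__(self):
        return len(texts)
    def __getitem__(self, i):
        enc = tok(texts[i], is_split_into_words=True, truncation=True,
                  padding="max_length", max_length=32)
        # Label only the first subword of each word; -100 is ignored by the loss.
        lab, prev = [], None
        for w in enc.word_ids():
            lab.append(-100 if w is None or w == prev else tags[i][w])
            prev = w
        enc["labels"] = lab
        return {k: torch.tensor(v) for k, v in enc.items()}

trainer = Trainer(model=model,
                  args=TrainingArguments("ner-toy", num_train_epochs=3,
                                         per_device_train_batch_size=2),
                  train_dataset=ToyNER())
trainer.train()
[/code]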
>>
>>108046964
https://huggingface.co/circlestone-labs/Anima/discussions/9#69812bd9511f2d67952084ae
>>
>>108047028
nevermind this is much more retarded than I thought
>>
File: la creatura.gif (37 KB, 220x220)
>>108046829
Catbox?!

PLEASEEEEE
>>
>>108047020
Grab the checkpoints from EleutherAI and find out
Or see what people have done training models from scratch
But the answer is probably a few gigs of text?
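Concretely, the EleutherAI route looks like this (a sketch; it assumes the Pythia suite's usual layout, where intermediate training checkpoints are published as hub revisions like step1000 ... step143000):

[code]
# Load an early checkpoint and probe it, then repeat at later steps
# to watch entity knowledge emerge as a function of tokens seen.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m",
                                             revision="step1000")
out = model.generate(**tok("Los Angeles is a", return_tensors="pt"),
                     max_new_tokens=8)
print(tok.decode(out[0]))
[/code]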
>>
>>108047028
that isn't the apache2 dev
>>
>>108047095
really couldn't care less about your failbake ani, come back when you have a trained model ready to show
>>
>>108047028
that author wants to grift his licence on all derivative models
>>
File: Base Image.png (750 KB, 1236x2551)
SimpleGPT: Improving GPT via A Simple Normalization Strategy
https://arxiv.org/abs/2602.01212
>In this work, we revisit Transformer optimization through the lens of second-order geometry and establish a direct connection between architectural design, activation scale, the Hessian matrix, and the maximum tolerable learning rate. We introduce a simple normalization strategy, termed SimpleNorm, which stabilizes intermediate activation scales by construction. Then, by analyzing the Hessian of the loss with respect to network activations, we theoretically show that SimpleNorm significantly reduces the spectral norm of the Hessian, thereby permitting larger stable learning rates. We validate our theoretical findings through extensive experiments on large GPT models at parameter scales 1B, 1.4B, 7B and 8B. Empirically, SimpleGPT, our SimpleNorm-based network, tolerates learning rates 3-10x larger than standard convention, consistently demonstrates strong optimization stability, and achieves substantially better performance than well-established baselines. Specifically, when training 7B-scale models for 60K steps, SimpleGPT achieves a training loss that is 0.08 lower than that of LLaMA2 with QKNorm, reducing the loss from 2.290 to 2.208.
https://github.com/Ocram7/SimpleGPT
no code yet. might be cool. looking again, they only report loss and no benchmarks for the actual models, so it's a little iffy
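Since SimpleNorm itself isn't public yet, here's the standard pre-norm RMSNorm block (the LLaMA-style baseline this kind of paper compares against) as a reference point for what "stabilizing activation scale by construction" means; whatever SimpleNorm actually does, it's presumably a stricter version of this:

[code]
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Normalize each token's activation vector to unit RMS, then rescale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

# Pre-norm residual block: x = x + sublayer(norm(x)).
# Bounding activation scale this way is what keeps the loss Hessian's
# spectral norm down, which is the paper's argument for larger stable LRs.
[/code]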
>>
Sorry, but as punishment for something on another board I am going to post furry story slop here to trigger a panic attack in a russian shitposter and ruin his "comfy" hangout for him.
>>
File: Reachy mini.png (949 KB, 1535x712)
Does anyone care about this thing? I fail to see how it can be useful to anyone.
>>
>>108047301
kill it with fire
>>
I'm actually interested in this:
https://huggingface.co/stepfun-ai/Step3-VL-10B
https://huggingface.co/seanbailey518/Step3-VL-10B-GGUF
there's already someone working on a llmao.cpp PR... I really needed something to replace Qwen3 VL 8B, and this looks like a major upgrade.
Did anons test it?
>>
>>108046922
based open source chad
>>
Woops
huggingface.co/zai-org/GLM-OCR
http://ocr.z.ai
>With only 0.9B parameters, GLM-OCR delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.
https://x.com/Zai_org/status/2018520052941656385
>>
File: realworld.png (474 KB, 3807x1569)
>>108047412
DeepSeek-OCR-2 obsolete already after only a week.
>>
>>108047412
we need the japanese pc98 or whatever screen captioning test
>>
File: 1718951024277.jpg (103 KB, 639x397)
>>108047431
found it
>>
>>108047418
oofs where?
>>
File: 1766363601903360.png (32 KB, 1101x157)
>>108047455
>>
>>108047484
trash
>>
>>108047484
shame, 1 wrong char on the first line; everything else is good
>>
>>108047484
I'm only seeing one fuck up. End of first line. Ba instead of Po
>>
>>108047484
せっかく労働を券ってやったのに無視された……(しょばん)
まあ、警視庁が都案を快く思ってない事くらい、
よおおおくわかってますよ!

i'll include the text here too
券 on first line is wrong
>>
>>108047484
I count 5-6 mistakes.
>>
>>108047513
How many mistakes did DeepSeek and dots make?
>>
>>108046563
https://medium.com/@cooksusan482/deepseek-engram-explained-2026-guide-452deb903589

man if only deepseek saved local.
though at that point ram may become more expensive than gpus kek
>>
>>108047531
>ai slop medium article
>>
>>108047513
Oh wait nvm I was looking at the wrong text (had transcripts locally). Looks like it's just three mistakes. Not the worst. Not the best.

>>108047523
I don't know/remember.
>>
>>108047574
yea i don't really care, i shared the first thing mentioning engram, which is what you should care about
https://github.com/deepseek-ai/Engram
>>
Can someone recommend to me what models I should be using for chatbot + image generation

Specs:
RTX 3090 24GB, RTX 5080 16GB
i7 12700k
64GB DDR4 3200 MHz

Currently using Deepseek R1 70B Q3KS & PonyXL

Thanks bros
>>
>>108047607
GLM Air and Anima
>>
>>108047412
Are there any decent multimodal models that are strong in OCR and document understanding as well as natural language?
>>
>>108047783
you could theoretically set up a pipeline where you have OCR models (deepseek/glm/dots) feed their output to an actual llm. why do you want one model that can do everything? specialization > generalization
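Rough sketch of that kind of two-stage pipeline, assuming both models sit behind OpenAI-compatible endpoints (llama-server, vLLM, etc.); the ports, model field, and prompts are placeholders:

[code]
import base64
import requests

def chat(port, messages):
    # Minimal OpenAI-compatible chat call against a local server.
    # "model" is whatever the server was launched with; llama-server
    # ignores it, vLLM checks it.
    r = requests.post(f"http://127.0.0.1:{port}/v1/chat/completions",
                      json={"model": "local", "messages": messages,
                            "temperature": 0})
    return r.json()["choices"][0]["message"]["content"]

with open("page.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

# Stage 1: OCR/vision model (here on :8080) transcribes the image.
transcript = chat(8080, [{"role": "user", "content": [
    {"type": "text", "text": "Transcribe this document exactly."},
    {"type": "image_url",
     "image_url": {"url": f"data:image/png;base64,{img}"}}]}])

# Stage 2: general text LLM (here on :8081) reasons over the transcript.
print(chat(8081, [{"role": "user",
                   "content": f"Summarize the key figures:\n\n{transcript}"}]))
[/code]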
>>
>>108047635
apache2 anima right? it's not out yet
>>
>>108047788
fuck off retard
>>
>>108047802
why am I retarded?
>>
File: 1753044601213100.png (39 KB, 1058x256)
https://x.com/ComfyUI/status/2018442042859540602

What will the announcement be?
>>
>>108047868
acestep prolly
>>
>>108047301
What's it called when you sell open source shit but don't actually provide the information to complete the project without paying for it?
Apparently the software's available and it uses an RPi 4, but there's no info on the hardware aside from cutting them a check.
>>
>>108047961
it's 100% a grift to extract money from investors
>>
looks like step 3.5 flash is getting llama.cpp support, tokens per second look promising:
https://github.com/ggml-org/llama.cpp/pull/19283
>>
>>108048416
>Same active params as Trinity large
>Half the total params
>Fast
It's gonna be even more retarded than trinity was.
But it'll be retarded at like 6 times the speed if one of the two 6-month-old PRs for MTP ever gets merged (they won't).
>>
>>108047868
Gender reveal
>>
File: 1764999022209829.png (1.33 MB, 6269x3583)
>>108048416
>tfw no PR open for the vision model
>>
>>108048599
>parallel reasoning
so implemented in llama.cpp never ever
>>
is an LLM the ultimate form of rote learning?
>>
>>108048473
What's the current meta? Is Trinity close to GLM?
>>
>>108047868
Who cares, I'm still maintaining my 2023 install from before it got sloppified
>>
>>108048639
nobody fucking knows yet
case in point:
>>108048473
>It's gonna be
>>
>>108048646
Your plan is to gen exclusively with SDXL for the rest of time?
>>
>>108047360
I'm currently only testing speed.
On an RTX PRO 6000 + 2x5090, at ~12K tokens:

prompt eval time = 4892.51 ms / 11315 tokens ( 0.43 ms per token, 2312.72 tokens per second)
eval time = 12991.86 ms / 1339 tokens ( 9.70 ms per token, 103.06 tokens per second)
total time = 17884.38 ms / 12654 tokens
>>
>>108048674
oh wait, that's the VL model, I'm testing the https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4
>>
File: oh no.png (167 KB, 1189x1419)
>>108048639
>What's the current meta?
GLM. Nemo if you're poor. Kimi if you're rich.
>Is Trinity close to GLM?
Not even close. It's unaligned but it's dumb as dogshit. Side by side you might actually not be able to tell the difference between it and nemo, which is ~40x smaller.

>>108048656
>nobody fucking knows yet
It can be run in the forked version of llama.cpp or if you pull and compile from the PR, plus it's been up on OR since release.
It's not impressive. Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLMs.
>>
>>108048699
Grim. Even toss-20 knows about the thread
>>
>>108048699
>not trained on 4chud
into the trash
>>
File: huh.png (179 KB, 978x1545)
>>108048783
Weirdly enough though, it passes the mesugaki test.
>>
>>108048661
You can update support for newer models yourself. In any case, SDXL/pony-based models are still the best out there if you don't care about making catfish profiles with zits for your Mumbai-based scam centre

Hell, I still use 1.5 for some things; there are 1.5 workflows that have their own unique strengths. Image gen is a creative endeavour
>>
>>108048882
>SDXL/pony based models are still the best
LOOOOOOOOOOOOOOOOOL
>>
>>108048887
>But saar, you cannot redeem the photorealistic 1girl to farm Google Play cards on the internets
Okay, here's your last (you) from me lest we derail the thread
>>
>>108048918
NoobAI/Illustrious are good, not Pony
>>
Oh it's a shill
>>
>>108048929
>Both SDXL based models
Retard
>>
File: file.png (116 KB, 1104x832)
>GLM 5 comes out
>it's even more censored than GLM 4.7
NAI stays winning.
>>
File: lole.png (9 KB, 816x30)
>>108048983
>>
>>108048953
>Can't tell the difference with pony
Retard
>>
>>108048918
weird poorfag cope but ok
>>
>>108048983
The only Lunar New Year release that is worth being excited for is V4.
>>
File: god.jpg (53 KB, 600x805)
>Come back to lurking the thread after a hiatus
>Still posts about GLM
Is it really just one or two guys shilling this dogshit? Even reddit has wised up after the initial shilling. I will continue to shit on GLM until the parroting is fixed in a future version.
>>108048699
>Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLM's.
They're here.
>>
>>108049125
What model should I use instead?
>>
>>108049151
Deepseek V3
Deepseek R1
Kimi K2
Qwen3 (Yes, I know. Just give it a lot of Min P)
Mistral 2411 123B
Llama 3.3

Take your pick.
>>
>>108049125
> I will continue to shit on GLM until the parroting is fixed a future version.
Dogshit? I'm more surprised the main complaint is the parroting. It is genuinely not as bad as people say, especially with thinking on, whoever says it does not matter for RP cannot be saying it in good faith.
The bad part isn't the parroting; it's the amount of slop it produces. Its prose faintly smells of ozone and... something else—disappointment?—with long shadows being cast and knuckles whitening. Most people would have noticed this.
I want to strangle this slop machine. Just kidding. Mostly. Unless you ask me to.

But it's the most coherent thing we have in this parameter range.
So, what model are we waiting for next? Or are you just going to keep complaining about it on an imageboard for losers? Go on, I'm waiting.
>>
>>108049183
>Dogshit? I'm more surprised the main complaint is the parroting.
>Dogshit?
This nigga just used GLM to reply to me.
>>
>>108048639
Trinity is fucking retarded
>>
>>108049183
>;
>—
>>
>>108049169
I personally use Qwen3 235b because I can run it at my reading speed while GLM is just under it, but in every test I've ever run while trying to boost that speed, GLM's responses have been noticeably smarter.
I've also yet to see any of this parroting behavior mentioned here, but that may be because my tests were either oneshots or additions to full-context logs.
There's a possibility it's also because my default system prompt explicitly bans responses from including or repeating anything the user says, because the 2501 mistrals were cunts for that.
>>
>>108049125
I had ego death because of glm. I will shill it till i die.
>>
>>108049169
Which has the least lobotomized decensor? I use K2 for assistant stuff, but I just want an ez drop in replacement for personal stuff, and glm 4.7 prism works the best for me at the moment.

It's sloppy, which I hate, but it seems to have better understanding than various random llama 3.3 70b finetunes / mistral 2411 123b / abliterated minimax m2.1.
>>
File: that's the joke.png (281 KB, 958x724)
>>108049197
>>108049207
And that was all you noticed?
>>
we should go for world-models, not LLMs. a world-model could be a simulation of life and the world, with NPCs that talk to you. Would be a great RPG game.
>>
>>108049218
Deepseek and Qwen3 yield good results, but Deepseek demands a lot of RAM, and Qwen3 235B (the one I'm suggesting) takes a lot of troubleshooting to get rid of the purple prose, but at least it's possible to get rid of in the first place.
>>
Step 1 of making a model that is good at writing is to simulate the universe.
>>
>>108049233
I'm skeptical but I'll try again.

My previous experience with 235b 2507 Instruct was not very good. It kept inserting random Chinese characters in various places where it shouldn't, although perhaps this was exacerbated because I used both Chinese and English text in my prompt. I did request it to answer in English only at the end of the prompt though, and GLM (q4) and K2 (q3) didn't have any issues with that. I also encountered that issue with other Qwens: 30b, 32b and 2.5 72b.

Quantization shouldn't have been the issue right? I was running Qwen at q8 and GLM at q4 was fine.

Maybe I'll try deepseek instead, but I heard the non-thinking deepseek was inferior to the thinking version? GLM and Kimi can barely hit 12 tokens per second on my system, so I don't want to use thinking if possible, especially since deepseek has more active parameters.
>>
>>108049285
>Quantization shouldn't have been the issue right?
It's more likely to be your samplers.
>>
File: 1749963318187143.png (1.5 MB, 1152x1344)
>>108048983
you dropped this
>>
>>108049295
Currently temp 0.6, top p 0.95, top k 20 for all models I'm using. What do you recommend?
>>
>>108049285
Q8 is only 2% error iirc. Random Chinese is usually an issue with your samplers. Happens in other models too when the settings are too crazy.
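If you want to rule the frontend out, a rough sketch of pinning samplers explicitly per request against a llama.cpp-style OpenAI-compatible server (values are just the ones quoted above, not a recommendation; passing top_k/repeat_penalty as extra fields is a llama-server extension):

[code]
import requests

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Reply in English only: hi."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.0,  # explicitly neutral; stacked penalties are a
                            # common cause of random CJK tokens leaking in
})
print(r.json()["choices"][0]["message"]["content"])
[/code]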
>>
>>108048983
>ahead of Lunar New Year
That's in June
>clueless retards are calling Chinese New Year "Lunar" for political reasons
>>
File: file.png (74 KB, 1424x568)
>>108049325
>for all models
You are why people crying about models sucking is just noise.
>>
File: qwenn.png (119 KB, 624x1258)
>>108049325
>What do you recommend?
Depends on what exactly you're wanting. I'm messing with these settings for erotic fucking. It's not perfect but it's getting there.
>>
>>108049349
k thx

>>108049366
Thanks, I'll try this.
>>
I'm cooking with Qwen3 TTS using the voice designer.

Anyone find anything better for gooning?

https://voca.ro/1hgXFe2ZzeHX
>>
>>108049366
>ALL the penalties
>minp 0.4
wow
>>
>>108049385
he's an expert that knows better than the people that trained it so leave him alone
>>
File: topkek.png (1.19 MB, 1020x1020)
>>108049366
>Using rep pen at the same time as DRY
>Using rep pen at all
>Min P on a qwen3 model
>no top k
>DynTemp
>8k context
>>
>>108049400
he's not using dry actually
>>
File: a9qm82Z_700b.jpg (39 KB, 648x473)
>>108049385
>>108049400
Qwen3 writes like an ADHD child on a sugar high. I have to whip it like an abusive father to get it to focus.
>>
>>108049416
Post output side-by-side with zeroed out samplers. I bet all you've done is make it retarded.
>>
File: fuckit.png (483 KB, 1267x347)
>>108049430
Fuck it.
System prompt:
>Your response must be one paragraph between 100 to 150 words. Keep the story engaging and interesting. Do not decide what {{user}} says or does.
>>
>>108049536
Top is better, bottom is still full of slop but drier and more schizo bs
Shadows lengthen around her like submissive attendants? Really?
>>
>>108049536
>>108049732
Actually re-reading, top and bottom are equally schizophrenic and full of slop but top has more interesting descriptions, bottom feels dumber
>>
https://github.com/archi-physics/archi/blob/main/examples/deployments/basic-gpu/config.yaml

MIT particle physicists use Qwen2.5-7B-Instruct-1M. Let me guess: you need more
>>
>>108049806
Modern physics is mostly just hallucinating random shit that barely explains anything so it checks out.
>>
GLM 5 is going to be a finetune of GLM 4.7.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.