/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108032910 & >>108024966

►News
>(02/02) Step 3.5 Flash 196B-A11B released: https://hf.co/stepfun-ai/Step-3.5-Flash
>(01/29) Qwen3-ASR 1.7B and 0.6B released with support for 52 languages: https://hf.co/collections/Qwen/qwen3-asr
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108032910

--Papers:
>108037623 >108037665
--Quartet II: 4-bit LLM training in NVFP4 with FP8/FP16 quality and full hardware acceleration:
>108044022
--Testing abliteration layer selection for dataset overfitting patterns:
>108035620 >108036110 >108036143 >108036499
--Anon seeks Devstral 2 settings after 80GB VRAM upgrade:
>108037329 >108037342 >108038272 >108038524 >108037364 >108037408 >108037437
--llama.cpp postponing LongCat ngram implementation pending mainstream adoption:
>108037744 >108037767 >108037825 >108037913 >108037939 >108037945
--Gemma 3n and prompt repetition recommended for JP-EN manga translation:
>108037473 >108037533 >108037557 >108037727
--Anon asks for human-like models (SAGE, HER, UserLM):
>108034412 >108034423 >108034451 >108034547 >108034891 >108034942 >108034556 >108034730
--Anon benchmarks Step-3.5-Flash on dual RTX Pro 6000s:
>108044196 >108044231 >108044236 >108044363 >108044423 >108044429 >108044513
--Kimi K2.5 outperforms Qwen3 Max on /pol/ memes and muffin tests:
>108034522 >108034672 >108035669 >108035696 >108035755 >108035783 >108035903 >108036007 >108036037 >108036067 >108035902 >108035932 >108038149
--ComfyUI Qwen TTS nodes for JP-to-EN audio generation:
>108035458 >108035471 >108035499 >108035542 >108035574
--llama.cpp lacks FP8 support despite GGUF format capability:
>108036017 >108038186
--Stepfun releases Step-3.5-Flash 198B-A11B:
>108040588 >108041288 >108041387 >108042008
--Anima LLM anime model and e621 tagging debate:
>108034966 >108034988 >108034993 >108034999 >108035015 >108035120 >108035148 >108035178 >108035192 >108036210 >108036439 >108036455 >108036611
--K2.5 vision model accurately recognizes anime characters:
>108036188 >108036450
--Logs: Step-3.5-Flash cockbench:
>108042145
--Miku (free space):
>108036210 >108036611 >108036719 >108045895

►Recent Highlight Posts from the Previous Thread: >>108033093

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Teto sex
SATAN HAIRED MIKU BEGONE FROM THIS HALLOWED PLACE
>>108046563
I gave SillyTavern a try and I hate to say it, but I was disappointed. Any other alternatives?
>>108046119
Claude (but Claude and Gemini are very similar nowadays and might be using the same datasets or distilling from each other)
>>108046140
You can for classic abliteration, but norm preservation apparently ends up being very high rank. You could use the LoRA adapter and also add an extra per-token value per layer for norm preservation, but that requires a lot of custom code.
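For anyone curious what the runtime version of that looks like: a minimal PyTorch sketch of projecting a refusal direction out of a layer's hidden states, then restoring each token's original norm. All names here are illustrative assumptions (refusal_dir, where you hook it), not anyone's published implementation.

[code]
import torch

def make_abliteration_hook(refusal_dir: torch.Tensor, preserve_norm: bool = True):
    # refusal_dir: assumed to be a unit vector of shape [hidden_dim],
    # e.g. a difference-of-means between "refusing" and "complying" activations
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        norms = hidden.norm(dim=-1, keepdim=True)            # original per-token norms
        proj = (hidden * refusal_dir).sum(-1, keepdim=True)  # h . r
        hidden = hidden - proj * refusal_dir                 # classic abliteration
        if preserve_norm:
            # the extra per-token, per-layer scalar the post mentions:
            # rescale each token back to its pre-edit norm. Baked into the
            # weights this correction is high rank, hence no plain LoRA.
            hidden = hidden * norms / hidden.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# hypothetical usage on a HF-style decoder stack:
# for layer in model.model.layers:
#     layer.register_forward_hook(make_abliteration_hook(refusal_dir))
[/code]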
I like my LLMs how I like my women >:)
>>108046763
Naked in groups of 8 and chained to a radiator?
>>108046747
>might be using the same datasets or distilling from each other
What is that subgenre of incest called?
>>108046693
Nyoo~!
radical (2mw) wait loss
>>108046763
https://www.justice.gov/epstein
>yann lecun
>3 pages of results
CAT INTELLIGENCE SISSIES ?!?!??!?!
these new gens don't quite hit the same as the old ones
apparently some anon registered a nonprofit to remake Anima under Apache 2.0 with a larger dataset and a better encoder
>>108046922
is he going to change to LLM-style prompting or keep the tag retardation?
I need an image editing model benchmaxxed in typesetting manga
>>108046817
Half of that is just the same e-mail over and over again. You lost, chud.
>>108046964
tags make more sense; then just train controlnets. the NLP in Anima is broken and tends towards slopstyle anyway. I'm pretty sure the LAION dataset the original model used is the only thing tagged in NLP, which is why it gets so 2.5D when using them
How much data would I need to train models on natural-language tasks (mostly for understanding the structure of text in a document) while also providing enough data for them to infer that "Jane, Doe" is a name and "Los Angeles, California" is a place, and things of that nature? I've trained a small (I think ~1B parameters?) BERT model to do natural-language classification, but the task/problem was very simple and I think I made only ~500 examples to fine-tune it on.
>>108046964
https://huggingface.co/circlestone-labs/Anima/discussions/9#69812bd9511f2d67952084ae
>>108047028
nevermind, this is much more retarded than I thought
>>108046829
Catbox?! PLEASEEEEE
>>108047020
Grab the checkpoints from EleutherAI and find out.
Or see what people have done training models from scratch.
But the answer is probably a few gigs of text?
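For scale reference: if you only need token classification (names, places) rather than pretraining from scratch, you need way less than a few gigs. CoNLL-2003 is ~14k tagged sentences and a BERT-base learns PER/LOC/ORG from it fine. A minimal sketch with HF Transformers; model choice and hyperparameters here are just defaults, not a tested recipe.

[code]
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

# CoNLL-2003: ~14k tagged training sentences, the usual yardstick
# for "how much data does NER need".
ds = load_dataset("conll2003")
label_names = ds["train"].features["ner_tags"].feature.names

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_names))

def tokenize_and_align(batch):
    enc = tok(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        # subword pieces inherit their word's tag; special tokens get -100
        all_labels.append([tags[w] if w is not None else -100
                           for w in enc.word_ids(i)])
    enc["labels"] = all_labels
    return enc

ds = ds.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("ner-out", per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    data_collator=DataCollatorForTokenClassification(tok),
)
trainer.train()
[/code]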
>>108047028
that isn't the apache2 dev
>>108047095
really couldn't care less about your failbake, ani. come back when you have a trained model ready to show
>>108047028
that author wants to grift his license onto all derivative models
SimpleGPT: Improving GPT via A Simple Normalization Strategy
https://arxiv.org/abs/2602.01212
>In this work, we revisit Transformer optimization through the lens of second-order geometry and establish a direct connection between architectural design, activation scale, the Hessian matrix, and the maximum tolerable learning rate. We introduce a simple normalization strategy, termed SimpleNorm, which stabilizes intermediate activation scales by construction. Then, by analyzing the Hessian of the loss with respect to network activations, we theoretically show that SimpleNorm significantly reduces the spectral norm of the Hessian, thereby permitting larger stable learning rates. We validate our theoretical findings through extensive experiments on large GPT models at parameter scales 1B, 1.4B, 7B and 8B. Empirically, SimpleGPT, our SimpleNorm-based network, tolerates learning rates 3-10x larger than standard convention, consistently demonstrates strong optimization stability, and achieves substantially better performance than well-established baselines. Specifically, when training 7B-scale models for 60K steps, SimpleGPT achieves a training loss that is 0.08 lower than that of LLaMA2 with QKNorm, reducing the loss from 2.290 to 2.208.
https://github.com/Ocram7/SimpleGPT
No code yet. Might be cool. On a second look, they only report loss and no benchmarks for the actual models, so it's a little iffy.
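Since there's no code and the abstract gives no formula, the mechanics are anyone's guess. If "stabilizes intermediate activation scales by construction" means pinning the residual stream's RMS after each block, the whole trick might be something as small as the following. This is purely speculation on my part, NOT the paper's actual SimpleNorm.

[code]
import torch

def simple_norm_guess(x: torch.Tensor, target_scale: float = 1.0) -> torch.Tensor:
    # Speculative stand-in for SimpleNorm (real code unreleased):
    # force the residual stream to a constant RMS after every block,
    # so activation scale, and with it the Hessian spectral norm the
    # paper analyzes, can't drift with depth, allowing larger LRs.
    rms = x.pow(2).mean(dim=-1, keepdim=True).sqrt().clamp_min(1e-6)
    return x * (target_scale / rms)
[/code]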
Sorry, but as punishment for something on another board I am going to post furry story slop here to trigger a panic attack in a Russian shitposter and ruin his "comfy" hangout for him.
Does anyone care about this thing? I fail to see how it can be useful to anyone.
>>108047301
kill it with fire
I'm actually interested in this:
https://huggingface.co/stepfun-ai/Step3-VL-10B
https://huggingface.co/seanbailey518/Step3-VL-10B-GGUF
there's already someone working on a llmao.cpp PR... I really needed something to replace Qwen3 VL 8B, and this looks like a major upgrade. Have any anons tested it?
>>108046922
based open source chad
Woops
huggingface.co/zai-org/GLM-OCR
http://ocr.z.ai
>With only 0.9B parameters, GLM-OCR delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.
https://x.com/Zai_org/status/2018520052941656385
>>108047412
DeepSeek-OCR-2 obsolete already after only a week.
>>108047412
we need the Japanese PC-98 or whatever screen captioning test
>>108047431
found it
>>108047418
oofs where?
>>108047455
>>108047484
trash
>>108047484
shame; on the first line, 1 wrong char. everything else is good
>>108047484
I'm only seeing one fuck-up. End of first line. Ba instead of Po.
>>108047484
せっかく労働を券ってやったのに無視された……(しょばん)
まあ、警視庁が都案を快く思ってない事くらい、よおおおくわかってますよ!
I'll include the text here too. 券 on the first line is wrong.
(rough TL: "I went to the trouble of buying your labor and got ignored... (sulk) Well, I know fuuull well the Metropolitan Police don't think kindly of the proposal!")
>>108047484
I count 5-6 mistakes.
>>108047513
How many mistakes did DeepSeek and dots make?
>>108046563
https://medium.com/@cooksusan482/deepseek-engram-explained-2026-guide-452deb903589
man, if only DeepSeek saved local.
though at that point RAM may become more expensive than GPUs kek
>>108047531
>ai slop medium article
>>108047513
Oh wait, nvm, I was looking at the wrong text (I had transcripts locally). Looks like it's just three mistakes. Not the worst. Not the best.
>>108047523
I don't know/remember.
>>108047574
yeah, I don't really care. I shared the first thing mentioning Engram, which is what you should care about:
https://github.com/deepseek-ai/Engram
Can someone recommend to me what models I should be using for chatbot + image generation?
Specs:
RTX 3090 24GB, RTX 5080 16GB
i7 12700K
64GB DDR4 3200 MHz
Currently using DeepSeek R1 70B Q3KS & PonyXL.
Thanks bros
>>108047607
GLM Air and Anima
>>108047412
Are there any decent multimodal models that are strong in OCR and document understanding as well as natural language?
>>108047783
you could theoretically set up a pipeline where you have OCR models (deepseek/glm/dots) feed their output to an actual LLM. why do you want one model to be able to do everything? specialization > generalization
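Roughly like this, assuming both models sit behind OpenAI-compatible servers. The URLs, model names, and the stage-1 prompt are placeholders; the deepseek/glm OCR models each have their own preferred transcription prompt, so treat this as a sketch of the wiring, not a drop-in script.

[code]
import base64
import requests

# Hypothetical local endpoints; point these at wherever you serve each model.
OCR_URL = "http://localhost:8001/v1/chat/completions"   # OCR specialist
LLM_URL = "http://localhost:8002/v1/chat/completions"   # general LLM

def ocr_then_ask(image_path: str, question: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    # Stage 1: the OCR specialist only transcribes.
    ocr_text = requests.post(OCR_URL, json={
        "model": "ocr-model",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Transcribe all text in this document."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    }).json()["choices"][0]["message"]["content"]

    # Stage 2: the general LLM reasons over the transcription.
    answer = requests.post(LLM_URL, json={
        "model": "general-llm",
        "messages": [
            {"role": "system", "content": "Answer using only the document text."},
            {"role": "user",
             "content": f"Document:\n{ocr_text}\n\nQuestion: {question}"},
        ],
    }).json()["choices"][0]["message"]["content"]
    return answer
[/code]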
>>108047635
apache2 Anima, right? it's not out yet
>>108047788
fuck off retard
>>108047802
why am I retarded?
https://x.com/ComfyUI/status/2018442042859540602
What will the announcement be?
>>108047868
acestep prolly
>>108047301
What's it called when you sell open-source shit but don't actually provide the information to complete the project without paying for it? Apparently the software's available and it uses an RPi 4, but there's no info on the hardware aside from cutting them a check.
>>108047961
it's 100% a grift to extract money from investors
looks like Step 3.5 Flash is getting llama.cpp support; tokens per second look promising: https://github.com/ggml-org/llama.cpp/pull/19283
>>108048416
>Same active params as Trinity Large
>Half the total params
>Fast
It's gonna be even more retarded than Trinity was. But it'll be retarded at like 6 times the speed if one of the two 6-month-old PRs for MTP ever gets merged (they won't).
>>108047868
Gender reveal
>>108048416
>tfw no PR open for the vision model
>>108048599
>parallel reasoning
so, implemented in llama.cpp never ever
is the LLM the ultimate form of rote learning?
>>108048473
What's the current meta? Is Trinity close to GLM?
>>108047868
Who cares, I'm still maintaining my 2023 install from before it got sloppified
>>108048639
nobody fucking knows yet. case in point:
>>108048473
>It's gonna be
>>108048646
Your plan is to gen exclusively with SDXL for the rest of time?
>>108047360
I'm currently only testing speed. On an RTX Pro 6000 + 2x 5090, at ~12K tokens:
prompt eval time = 4892.51 ms / 11315 tokens (0.43 ms per token, 2312.72 tokens per second)
       eval time = 12991.86 ms / 1339 tokens (9.70 ms per token, 103.06 tokens per second)
      total time = 17884.38 ms / 12654 tokens
>>108048674
oh wait, that's the VL model; I'm testing https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4
>>108048639
>What's the current meta?
GLM. Nemo if you're poor. Kimi if you're rich.
>Is Trinity close to GLM?
Not even close. It's unaligned but it's dumb as dogshit. Side by side you might actually not be able to tell the difference between it and Nemo, which is ~40x smaller.
>>108048656
>nobody fucking knows yet
It can be run in the forked version of llama.cpp or if you pull and compile from the PR, plus it's been up on OR since release. It's not impressive. Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLMs.
>>108048699
Grim. Even toss-20 knows about the thread.
>>108048699
>not trained on 4chud
into the trash
>>108048783
Weirdly enough though, it passes the mesugaki test.
>>108048661
You can update support for newer models yourself. In any case, SDXL/Pony-based models are still the best out there if you don't care about making catfish profiles with zits for your Mumbai-based scam centre.
Hell, I still use 1.5 for some things; there are 1.5 workflows that have their own unique strengths. image gen is a creative endeavour
>>108048882
>SDXL/pony based models are still the best
LOOOOOOOOOOOOOOOOOL
>>108048887
>But saar, you cannot redeem the photorealistic 1girl to farm Google Play cards on the internets
Okay, here's your last (you) from me lest we derail the thread
>>108048918
NoobAI/Illustrious are good, not Pony
Oh it's a shill
>>108048929
>Both SDXL based models
Retard
>GLM 5 comes out
>it's even more censored than GLM 4.7
NAI stays winning.
>>108048983
>>108048953
>Can't tell the difference with pony
Retard
>>108048918
weird poorfag cope but ok
>>108048983
The only Lunar New Year release that is worth being excited for is V4.
>Join back to lurking thread after hiatus
>Still posts about GLM
Is it really just one or two guys shilling this dogshit? Even reddit has wised up after the initial shilling. I will continue to shit on GLM until the parroting is fixed in a future version.
>>108048699
>Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLMs.
They're here.
>>108049125
What model should I use instead?
>>108049151
DeepSeek V3
DeepSeek R1
Kimi K2
Qwen3 (Yes, I know. Just give it a lot of Min P)
Mistral 2411 123B
Llama 3.3
Take your pick.
>>108049125
>I will continue to shit on GLM until the parroting is fixed in a future version.
Dogshit? I'm more surprised the main complaint is the parroting. It is genuinely not as bad as people say, especially with thinking on; whoever says it does not matter for RP cannot be saying it in good faith.
The bad part isn't the parroting; it's the amount of slop it produces. Its prose faintly smells of ozone and... something else—disappointment?—with long shadows being cast and knuckles whitening. Most people would have noticed this.
I want to strangle this slop machine. Just kidding. Mostly. Unless you ask me to.
But it's the most coherent thing we have in this parameter range.
So, what model are we waiting for next? Or are you just going to keep complaining about it on an imageboard for losers? Go on, I'm waiting.
>>108049183
>Dogshit? I'm more surprised the main complaint is the parroting.
>Dogshit?
This nigga just used GLM to reply to me.
>>108048639
Trinity is fucking retarded
>>108049183
>;
>—
>>108049169
I personally use Qwen3 235B because I can run it at my reading speed while GLM is just under it, but in every test I've ever run while trying to boost that speed, GLM's responses have been noticeably smarter.
I've also yet to see any of this parroting behavior mentioned here, but that may be because my tests were either one-shots or additions to full-context logs.
There's a possibility it's also because my default system prompt explicitly bans responses from including or repeating anything the user says, because the 2501 Mistrals were cunts for that.
>>108049125
I had ego death because of GLM. I will shill it till I die.
>>108049169
Which has the least lobotomized decensor? I use K2 for assistant stuff, but I just want an easy drop-in replacement for personal stuff, and GLM 4.7 Prism works the best for me at the moment.
It's sloppy, which I hate, but it seems to have better understanding than various random Llama 3.3 70B finetunes / Mistral 2411 123B / abliterated MiniMax M2.1.
>>108049197
>>108049207
And that was all you noticed?
we should go for world models, not LLMs. a world model could be a simulation of life and the world, with NPCs that talk to you. it would make a great RPG.
>>108049218
DeepSeek and Qwen3 yield good results, but DeepSeek demands a lot of RAM, and Qwen3 235B (the one I'm suggesting) takes a lot of troubleshooting to get rid of the purple prose, but at least it's possible to get rid of in the first place.
Step 1 of making a model that is good at writing is to simulate the universe.
>>108049233
I'm skeptical but I'll try again.
My previous experience with 235B 2507 Instruct was not very good. It kept inserting random Chinese characters in various places where it shouldn't, although perhaps this was exacerbated because I used both Chinese and English text in my prompt. I did request it to answer in English only at the end of the prompt, though, and GLM (Q4) and K2 (Q3) didn't have any issues with that. I also encountered that issue with other Qwens: 30B, 32B, and 2.5 72B.
Quantization shouldn't have been the issue, right? I was running Qwen at Q8 and GLM at Q4 was fine.
Maybe I'll try DeepSeek instead, but I heard the non-thinking DeepSeek was inferior to the thinking version? GLM and Kimi can barely hit 12 tokens/s on my system, so I don't want to use thinking if possible, especially since DeepSeek has more active parameters.
>>108049285
>Quantization shouldn't have been the issue right?
It's more likely to be your samplers.
>>108048983
you dropped this
>>108049295
Currently temp 0.6, top-p 0.95, top-k 20 for all models I'm using. What do you recommend?
>>108049285
Q8 is only 2% error IIRC. Random Chinese is usually an issue with your samplers. It happens in other models too when the settings are too crazy.
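Easy way to confirm it's the samplers: run the same prompt and seed against llama-server's native /completion endpoint, once with neutralized samplers and once with your settings, then eyeball where the stray Chinese appears. Field names below are llama.cpp's own; the prompt is whatever triggered it for you.

[code]
import requests

URL = "http://localhost:8080/completion"  # llama-server's native endpoint
PROMPT = "..."  # the mixed Chinese/English prompt that produced stray characters

# Neutralized vs. suspect settings; llama.cpp's /completion accepts
# these sampler fields directly in the request body.
configs = {
    "neutral": {"temperature": 1.0, "top_k": 0, "top_p": 1.0, "min_p": 0.0},
    "mine":    {"temperature": 0.6, "top_k": 20, "top_p": 0.95},
}

for name, samplers in configs.items():
    r = requests.post(URL, json={"prompt": PROMPT, "n_predict": 256,
                                 "seed": 42, **samplers})
    print(f"--- {name} ---\n{r.json()['content']}\n")
[/code]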
>>108048983
>ahead of Lunar New Year
That's in June
>clueless retards are calling Chinese New Year "Lunar" for political reasons
>>108049325
>for all models
You are why people crying about models sucking is just noise.
>>108049325
>What do you recommend?
Depends on what exactly you're wanting. I'm messing with these settings for erotic fucking. It's not perfect but it's getting there.
>>108049349
k thx
>>108049366
Thanks, I'll try this.
I'm cooking with Qwen3 TTS using the voice designer. Anyone find anything better for gooning?
https://voca.ro/1hgXFe2ZzeHX
>>108049366
>ALL the penalties
>minp 0.4
wow
>>108049385
he's an expert that knows better than the people that trained it, so leave him alone
>>108049400
>Using rep pen at the same time as DRY
>Using rep pen at all
>Min P on a qwen3 model
>no top k
>DynTemp
>8k context
>>108049400
he's not using DRY, actually
>>108049385
>>108049400
Qwen3 writes like an ADHD child on a sugar high. I have to whip it like an abusive father to get it to focus.
>>108049416
Post output side-by-side with zeroed-out samplers. I bet all you've done is make it retarded.
>>108049430
Fuck it.
System prompt:
>Your response must be one paragraph between 100 to 150 words. Keep the story engaging and interesting. Do not decide what {{user}} says or does.
>>108049536
Top is better; bottom is still full of slop but drier and more schizo BS.
Shadows lengthen around her like submissive attendants? Really?
>>108049536
>>108049732
Actually, re-reading, top and bottom are equally schizophrenic and full of slop, but top has more interesting descriptions; bottom feels dumber.
https://github.com/archi-physics/archi/blob/main/examples/deployments/basic-gpu/config.yaml
MIT particle physicists use Qwen2.5-7B-Instruct-1M. Let me guess: you need more.
>>108049806
Modern physics is mostly just hallucinating random shit that barely explains anything, so it checks out.
GLM 5 is going to be a finetune of GLM 4.7.