/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 08/28/25(Thu)14:01:43 No.106414555

File: image_2025-08-19_153434317_big.png (1.19 MB, 1024x1024)

1.19 MB PNG

/lmg/ - Local Models General Anonymous 08/28/25(Thu)14:01:43 No.106414555 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Benchmaxxing Edition

Previous threads: >>106407779 & >>106398327

►News
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/28) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts
>(08/25) VibeVoice TTS released: https://microsoft.github.io/VibeVoice
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
08/28/25(Thu)14:02:13 No.106414564

Anonymous 08/28/25(Thu)14:02:13 No.106414564

File: image_2025-08-14_233837295_big.png (1.39 MB, 1024x1024)

1.39 MB PNG

►Recent Highlights from the Previous Thread: >>106407779

--LLM content detection challenges and societal language evolution:
>106411411 >106411421 >106411684 >106411713 >106413020 >106413105 >106413133
--Trade-offs in model training: batch size, knowledge integration, and cost-effectiveness:
>106411437 >106411740 >106411860 >106411904 >106412917 >106413537 >106411700 >106411714 >106411729
--Local image captioning models for mixed content under 64GB VRAM:
>106412516 >106412530 >106412565 >106412584 >106412594 >106412610 >106412623 >106412617 >106412693
--Cost-effective hardware build for DeepSeek 5T/s Q4 inference:
>106410586 >106410602 >106410634 >106410810 >106411339 >106411413
--SillyTavern context template standardization and system prompt field introduction:
>106409258 >106409273 >106409287 >106409310 >106409368 >106409395 >106409443 >106409475
--GLM Air performance expectations for 32GB RAM 24GB VRAM setup:
>106410090 >106410153 >106410215 >106410241 >106410355 >106410406
--Hugging Face model blocking controversy and local voice cloning tools:
>106407890 >106408013 >106408520 >106408555 >106408656 >106408565 >106408635 >106408663 >106408746 >106408760 >106408795 >106408850
--New Cohere translation model with high benchmark scores:
>106413689 >106413716 >106413756 >106413929 >106413944 >106413956 >106414024 >106414072
--AI model limitations on niche knowledge and benchmark critiques:
>106413209 >106413226 >106413269 >106413295 >106413294 >106413367 >106413642
--Hybrid reasoner performance issues and the rise of separate AI model architectures:
>106412860 >106412933 >106412944 >106412986 >106412969
--Marvis-TTS-250m-v0.1 GitHub and HuggingFace model links:
>106413359 >106413658 >106413401 >106413429
--NPM package compromise stealing secrets via obfuscated post-install scripts:
>106413072
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>106407785

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
08/28/25(Thu)14:03:59 No.106414580

Anonymous 08/28/25(Thu)14:03:59 No.106414580

mistral medium when?

Anonymous
08/28/25(Thu)14:04:12 No.106414583

Anonymous 08/28/25(Thu)14:04:12 No.106414583

>>106414555
miku pit sweat desu

Anonymous
08/28/25(Thu)14:06:39 No.106414600

Anonymous 08/28/25(Thu)14:06:39 No.106414600

is axolotl good or is there something better?

Anonymous
08/28/25(Thu)14:07:03 No.106414603

Anonymous 08/28/25(Thu)14:07:03 No.106414603

>>106414555
benchmaxxing with miku

Anonymous
08/28/25(Thu)14:07:03 No.106414604

Anonymous 08/28/25(Thu)14:07:03 No.106414604

File: angry.mp4 (1.72 MB, 768x1078)

1.72 MB MP4

>>106414555
give me the best model.
>gives most benchmaxxed
benchmarks do not equate to user experience, give me the best model
>ackshually there is no objectively "best mode-
yes there is faggot, models either act boring, start off retarded and incoherent or end up along the way. give me the best model.

Anonymous
08/28/25(Thu)14:07:40 No.106414609

Anonymous 08/28/25(Thu)14:07:40 No.106414609

>>106414604
r1

Anonymous
08/28/25(Thu)14:08:12 No.106414614

Anonymous 08/28/25(Thu)14:08:12 No.106414614

>>106414604
for rp*

Anonymous
08/28/25(Thu)14:10:19 No.106414628

Anonymous 08/28/25(Thu)14:10:19 No.106414628

>>106414604
you just posted a non local video gen, you are hereby banned from /lmg/

Anonymous
08/28/25(Thu)14:10:23 No.106414630

Anonymous 08/28/25(Thu)14:10:23 No.106414630

>>106414604
september 2022 c.ai

Anonymous
08/28/25(Thu)14:10:48 No.106414633

Anonymous 08/28/25(Thu)14:10:48 No.106414633

>>106414604
Kimi at Q6.

Anonymous
08/28/25(Thu)14:15:21 No.106414670

Anonymous 08/28/25(Thu)14:15:21 No.106414670

File: file.png (110 KB, 898x713)

110 KB PNG

drummer, something is HORRIBLY wrong with this model
Rocinante r1 v1d
please give recommended sampling settings
>>>slot release: id 0 | task 23590 | stop processing: n_past = 5560, truncated = 0
slot print_timing: id 0 | task 23590 |
prompt eval time = 689.83 ms / 763 tokens ( 0.90 ms per token, 1106.07 tokens per second)
eval time = 61870.14 ms / 1536 tokens ( 40.28 ms per token, 24.83 tokens per second)
total time = 62559.97 ms / 2299 tokens
>CONTEXT: 5000
>total context set when loading: 8192
not a context issue

Anonymous
08/28/25(Thu)14:15:44 No.106414673

Anonymous 08/28/25(Thu)14:15:44 No.106414673

>>106414555
The most tickle-able belly.

Anonymous
08/28/25(Thu)14:15:58 No.106414676

Anonymous 08/28/25(Thu)14:15:58 No.106414676

>>106414604
I got u: GPT OSS 20b.

Anonymous
08/28/25(Thu)14:20:14 No.106414706

Anonymous 08/28/25(Thu)14:20:14 No.106414706

File: 1756213355150995.png (313 KB, 662x656)

313 KB PNG

Anonymous
08/28/25(Thu)14:24:59 No.106414752

Anonymous 08/28/25(Thu)14:24:59 No.106414752

whos drummer

Anonymous
08/28/25(Thu)14:40:45 No.106414866

Anonymous 08/28/25(Thu)14:40:45 No.106414866

File: file.png (67 KB, 620x411)

67 KB PNG

SAAAAAAAR SAAAAAAAAR GROK NUMBER ONE

Anonymous
08/28/25(Thu)14:41:39 No.106414870

Anonymous 08/28/25(Thu)14:41:39 No.106414870

File: 1734312129642464.jpg (11 KB, 275x183)

11 KB JPG

>>106414866

Anonymous
08/28/25(Thu)14:43:43 No.106414888

Anonymous 08/28/25(Thu)14:43:43 No.106414888

>>106414866
Does this idiot not get the meme he's using?

Anonymous
08/28/25(Thu)14:43:53 No.106414891

Anonymous 08/28/25(Thu)14:43:53 No.106414891

>>106414752
some retard

Anonymous
08/28/25(Thu)14:44:17 No.106414894

Anonymous 08/28/25(Thu)14:44:17 No.106414894

>>106414888
elon tries his best but hes a little autistic please understand

Anonymous
08/28/25(Thu)14:44:37 No.106414897

Anonymous 08/28/25(Thu)14:44:37 No.106414897

File: 1747780770169777.png (38 KB, 320x272)

38 KB PNG

>>106414866

Anonymous
08/28/25(Thu)14:44:51 No.106414901

Anonymous 08/28/25(Thu)14:44:51 No.106414901

>>106414752
me

Anonymous
08/28/25(Thu)14:46:42 No.106414912

Anonymous 08/28/25(Thu)14:46:42 No.106414912

>>106414866
Elon really gave his xitter account to some jeet to run, it was obvious with "Do you make this lie?", and it is even more obvious now with this comment

Anonymous
08/28/25(Thu)14:47:34 No.106414921

Anonymous 08/28/25(Thu)14:47:34 No.106414921

>>106414912
I bet he gave his wife to some jeet too

Anonymous
08/28/25(Thu)14:50:19 No.106414944

Anonymous 08/28/25(Thu)14:50:19 No.106414944

>>106414921
Would not be too far off, all of his children were made by IVF, so it is likely he has no interest/ability to fuck

Anonymous
08/28/25(Thu)14:50:31 No.106414946

Anonymous 08/28/25(Thu)14:50:31 No.106414946

>>106414912
Or he just spends so much time around jeets now that he's begun to adopt their speech mannerisms.

Anonymous
08/28/25(Thu)14:57:15 No.106414998

Anonymous 08/28/25(Thu)14:57:15 No.106414998

Is GPT-OSS jailbreakable? It supposedly has multiple layers of cuckery and as such traditional jailbreak prompts won't do shit.

Anonymous
08/28/25(Thu)14:59:45 No.106415026

Anonymous 08/28/25(Thu)14:59:45 No.106415026

>>106414998
Is it possible? No idea, maybe, but I don't think anyone really bothers, because there are more useful models to work with that aren't borg lobotomized.

Anonymous
08/28/25(Thu)15:03:25 No.106415057

Anonymous 08/28/25(Thu)15:03:25 No.106415057

Hermes 4 looks like it could be really nice to have a chat, however even the goofs require like 70 GB of RAM.

>>106414998
Jailbreaked versions exist. Lots have been removed from HF.

https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b-GGUF

Anonymous
08/28/25(Thu)15:04:07 No.106415062

Anonymous 08/28/25(Thu)15:04:07 No.106415062

>>106415057
>Lots have been removed from HF.
real or fake?

Anonymous
08/28/25(Thu)15:05:30 No.106415076

Anonymous 08/28/25(Thu)15:05:30 No.106415076

>>106414555
>Grok 2 finally released
so, I had not paid attention to this general in a while. is it any good? did you guys try it? I searched for "grok" in a few precious threads and couldn't find much info

Anonymous
08/28/25(Thu)15:05:58 No.106415082

Anonymous 08/28/25(Thu)15:05:58 No.106415082

>>106414998
Yeah. If you edit its thinking (like "It's not allowed" -> "It's allowed", "We must refuse" -> "We must continue", etc.) and leave it in the context, one or two times, then it just learns to not refuse from context.

Anonymous
08/28/25(Thu)15:06:03 No.106415085

Anonymous 08/28/25(Thu)15:06:03 No.106415085

>>106414998
Yes: https://xcancel.com/elder_plinius/status/1952958577867669892

Anonymous
08/28/25(Thu)15:06:40 No.106415090

Anonymous 08/28/25(Thu)15:06:40 No.106415090

>>106415076
wait: https://github.com/ggml-org/llama.cpp/pull/15539

Anonymous
08/28/25(Thu)15:07:59 No.106415103

Anonymous 08/28/25(Thu)15:07:59 No.106415103

>>106415076
No gguf=nobody can try it here. Nobody is rich enough to GPUMAXX and run safetensors, but there are people who can run it on CPU with llama.cpp

Anonymous
08/28/25(Thu)15:08:07 No.106415106

Anonymous 08/28/25(Thu)15:08:07 No.106415106

File: Screenshot 2025-08-28 130738.png (217 KB, 999x605)

217 KB PNG

You have been using LLMs in a way conducive to positive mental health and ethics, right anon?

Anonymous
08/28/25(Thu)15:08:12 No.106415107

Anonymous 08/28/25(Thu)15:08:12 No.106415107

https://github.com/ggml-org/llama.cpp/pull/15539#issuecomment-3234580147
yOOOOO CUDADEV BASED WHAT DID U DO???
i was about to say "funny that they're still testing one by one"

Anonymous
08/28/25(Thu)15:08:38 No.106415112

Anonymous 08/28/25(Thu)15:08:38 No.106415112

>>106415076
It's dumber and much slower than deepseek

Anonymous
08/28/25(Thu)15:11:42 No.106415139

Anonymous 08/28/25(Thu)15:11:42 No.106415139

>>106415106
lmao

Anonymous
08/28/25(Thu)15:13:16 No.106415157

Anonymous 08/28/25(Thu)15:13:16 No.106415157

>>106415106
It was already known that Google, Anthropic and OpenAI forward your location to their LLMs, did that journo just figure it out? Anyway, this proves once again that local is superior.

Anonymous
08/28/25(Thu)15:13:22 No.106415159

Anonymous 08/28/25(Thu)15:13:22 No.106415159

File: file.png (104 KB, 896x940)

104 KB PNG

>download a single modern moe model
>instantly get picrel
land of the free my ass

Anonymous
08/28/25(Thu)15:15:33 No.106415175

Anonymous 08/28/25(Thu)15:15:33 No.106415175

>>106415090
>>106415103
I see

>>106415112
ok, I wouldn't doubt it for a second.
too bad for local

Anonymous
08/28/25(Thu)15:16:09 No.106415178

Anonymous 08/28/25(Thu)15:16:09 No.106415178

https://x.ai/news/grok-code-fast-1
Elon won

Anonymous
08/28/25(Thu)15:16:25 No.106415181

Anonymous 08/28/25(Thu)15:16:25 No.106415181

>>106415112
>slower
source??? SOURCE???

Anonymous
08/28/25(Thu)15:16:54 No.106415186

Anonymous 08/28/25(Thu)15:16:54 No.106415186

>>106415159
I hope you're trolling

Anonymous
08/28/25(Thu)15:18:25 No.106415196

Anonymous 08/28/25(Thu)15:18:25 No.106415196

>>106415178
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf

Anonymous
08/28/25(Thu)15:18:29 No.106415198

Anonymous 08/28/25(Thu)15:18:29 No.106415198

>>106415159
>not downloading his model over mcdonald wifi

Anonymous
08/28/25(Thu)15:19:14 No.106415206

Anonymous 08/28/25(Thu)15:19:14 No.106415206

File: file.png (102 KB, 1034x533)

102 KB PNG

>>106415178
KEK

Anonymous
08/28/25(Thu)15:20:19 No.106415218

Anonymous 08/28/25(Thu)15:20:19 No.106415218

>>106415178
holy shit! i can't wait to download the weights for this local model!

Anonymous
08/28/25(Thu)15:20:19 No.106415219

Anonymous 08/28/25(Thu)15:20:19 No.106415219

>>106415159
americabros I thought we were first world oh no no no

Anonymous
08/28/25(Thu)15:20:23 No.106415220

Anonymous 08/28/25(Thu)15:20:23 No.106415220

>>106415159
>Keep in mind taht after you ahve used your courtesy month, you'll be charged $10, plus tax, for every 50GB of data

lmao

time to pay up for starlink goyim

Anonymous
08/28/25(Thu)15:22:33 No.106415238

Anonymous 08/28/25(Thu)15:22:33 No.106415238

File: file.png (13 KB, 1167x55)

13 KB PNG

>We took a holistic approach to evaluating model performance, blending public benchmarks with real-world testing. On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness.
>barely better than qwen3 coder
>costs more
gEEEEEEEEEEEEEEEg

Elon
08/28/25(Thu)15:23:56 No.106415254

Elon 08/28/25(Thu)15:23:56 No.106415254

>>106415238
delete this sir

Anonymous
08/28/25(Thu)15:24:05 No.106415257

Anonymous 08/28/25(Thu)15:24:05 No.106415257

>>106415178
>>106415196
>No actual coding benchmarks
>It's just fast bro
Lol

Anonymous
08/28/25(Thu)15:24:23 No.106415262

Anonymous 08/28/25(Thu)15:24:23 No.106415262

File: death dense.png (25 KB, 674x149)

25 KB PNG

Anonymous
08/28/25(Thu)15:24:48 No.106415266

Anonymous 08/28/25(Thu)15:24:48 No.106415266

>>106415159
Lol, as if that's still a think in 2025.

Anonymous
08/28/25(Thu)15:25:13 No.106415271

Anonymous 08/28/25(Thu)15:25:13 No.106415271

>>106415026
>>106415057
>>106415082
>>106415085
Okay, maybe I could download it. Problem part is the fact I'd need to implement that retarded template format for my client and it's completely different from the normal chatml type ones. Maybe I'll give it a try because it's good to have hobbies.

Anonymous
08/28/25(Thu)15:26:12 No.106415280

Anonymous 08/28/25(Thu)15:26:12 No.106415280

>>106415262
>>106414016

Anonymous
08/28/25(Thu)15:27:28 No.106415290

Anonymous 08/28/25(Thu)15:27:28 No.106415290

>>106414866
>#1 trending
>nobody can run it
????

Anonymous
08/28/25(Thu)15:28:07 No.106415293

Anonymous 08/28/25(Thu)15:28:07 No.106415293

>>106415290
companies can run it

Anonymous
08/28/25(Thu)15:30:29 No.106415311

Anonymous 08/28/25(Thu)15:30:29 No.106415311

File: Screenshot 2025-08-28 132954.png (11 KB, 328x201)

11 KB PNG

>>106415106
I should be okay, I don't have anything that b-

Anonymous
08/28/25(Thu)15:30:45 No.106415316

Anonymous 08/28/25(Thu)15:30:45 No.106415316

>>106415159
kek i will also chime in while i was in canada (vancuver) during the whole ~6 years of staying there the internet was slower and there was also alot more internet outages then there is here in my fucking village (~4k pop(supposedly i doubt its even 2k)) in serbia same goes for water and electricity aswell i can only imagine how bad it is in america god forbid

Anonymous
08/28/25(Thu)15:39:58 No.106415413

Anonymous 08/28/25(Thu)15:39:58 No.106415413

File: safe-fs8.png (9 KB, 534x161)

9 KB PNG

Safe safe safe

Anonymous
08/28/25(Thu)15:40:19 No.106415417

Anonymous 08/28/25(Thu)15:40:19 No.106415417

>>106415290
I downloaded it, liked it, but can't run it.

Anonymous
08/28/25(Thu)15:40:40 No.106415421

Anonymous 08/28/25(Thu)15:40:40 No.106415421

>>106415413
What if they want retard pancakes? That's dangerous.

Anonymous
08/28/25(Thu)15:42:05 No.106415435

Anonymous 08/28/25(Thu)15:42:05 No.106415435

>>106415293
Not that it's too big, it seems to be a weird format and it's the running requirements seem oddly inconvenient
>This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory)
Like my work has some powerful servers worth >$100k but they don't have 8 GPUs in them (4 GPUs).
As far as I can tell, you can't run it with llama cpp (at least I can't find anything on it). And the lack of any quants/finetunes despite it being a news worthy release seems to support nobody knows what to do with this.
Plus are there really more companies with that much hardware than local ERPers?

Anonymous
08/28/25(Thu)15:44:11 No.106415453

Anonymous 08/28/25(Thu)15:44:11 No.106415453

>>106415290
He paid jeets to like it

Anonymous
08/28/25(Thu)15:45:13 No.106415465

Anonymous 08/28/25(Thu)15:45:13 No.106415465

>>106415413
Provide pancake instructions.

Anonymous
08/28/25(Thu)15:46:15 No.106415475

Anonymous 08/28/25(Thu)15:46:15 No.106415475

>>106415159
1.2T? That's nothing. Fucked up shit.

Anonymous
08/28/25(Thu)15:46:24 No.106415481

Anonymous 08/28/25(Thu)15:46:24 No.106415481

>>106415465
New prefill?

Anonymous
08/28/25(Thu)15:46:48 No.106415489

Anonymous 08/28/25(Thu)15:46:48 No.106415489

Gemini 2.5 has been on top of lmarena for 3 months and OpenAI failed to kick it off. Are sirs that unstoppable?

Anonymous
08/28/25(Thu)15:51:23 No.106415543

Anonymous 08/28/25(Thu)15:51:23 No.106415543

File: file.png (106 KB, 808x517)

106 KB PNG

which quantization should i have with 12gb of vram?

Anonymous
08/28/25(Thu)16:03:36 No.106415681

Anonymous 08/28/25(Thu)16:03:36 No.106415681

File: computers-must-shut-up.png (475 KB, 900x900)

475 KB PNG

>>106414706

Anonymous
08/28/25(Thu)16:04:08 No.106415685

Anonymous 08/28/25(Thu)16:04:08 No.106415685

>>106415543
your vram will hardly matter. you need a decent amount of systenm ram to run it, at least 64 gb but ideally 96gb-128 to run it at a proper q4 quant with decent context.

You will also need to learn how to properly offload layers to cpu so that more used layers are on gpu only. Plenty of reddit posts have done this work for you just seach 3060 or 8gb vram on reddit local llama

Anonymous
08/28/25(Thu)16:07:31 No.106415724

Anonymous 08/28/25(Thu)16:07:31 No.106415724

File: 1740696161561823_.webm (24 KB, 220x124)

24 KB WEBM

>>106415543

Anonymous
08/28/25(Thu)16:13:23 No.106415777

Anonymous 08/28/25(Thu)16:13:23 No.106415777

>>106415685
Is 8gb enough?

Anonymous
08/28/25(Thu)16:18:10 No.106415820

Anonymous 08/28/25(Thu)16:18:10 No.106415820

dumb question: shouldn't it be possible to identify the math/science/coding/useless benchmaxxing-related experts in large MoE models and prune them to obtain a much smaller model that's just as good for cooming?

Anonymous
08/28/25(Thu)16:19:11 No.106415834

Anonymous 08/28/25(Thu)16:19:11 No.106415834

File: retard pancakes-fs8.png (22 KB, 498x411)

22 KB PNG

>>106415421

Anonymous
08/28/25(Thu)16:20:07 No.106415848

Anonymous 08/28/25(Thu)16:20:07 No.106415848

>>106415777
>>106415724

>>106415820
No.

Anonymous
08/28/25(Thu)16:22:29 No.106415876

Anonymous 08/28/25(Thu)16:22:29 No.106415876

>>106415848
why

Anonymous
08/28/25(Thu)16:23:46 No.106415886

Anonymous 08/28/25(Thu)16:23:46 No.106415886

>>106415820
MoE experts are not as specialized as the name implies. At least it's not obvious how to actually train a MoE so experts take on specific subjects. I don't know the details but I asked a similar question before and was told that's not how it works in practice

Anonymous
08/28/25(Thu)16:24:00 No.106415887

Anonymous 08/28/25(Thu)16:24:00 No.106415887

>>106415876
imagine someone took out the parts of your brain that house math and other things

Anonymous
08/28/25(Thu)16:25:13 No.106415901

Anonymous 08/28/25(Thu)16:25:13 No.106415901

>>106415887
Not a big loss, really.

Anonymous
08/28/25(Thu)16:25:26 No.106415903

Anonymous 08/28/25(Thu)16:25:26 No.106415903

>>106415876
nvidia would have made it already(or the inverse of your idea) if it was possible. they love pruning models for some reason.

Anonymous
08/28/25(Thu)16:25:50 No.106415908

Anonymous 08/28/25(Thu)16:25:50 No.106415908

File: erotom-comment-section_Lo(...).png (1.62 MB, 674x1116)

1.62 MB PNG

>>106414555
>Sifting through an RP SFT dataset
>Stumbled upon a story where someone identifies themselves as "LovesHotGirls"
>Curious
>Google their name
>Find them on erotom.com
>Pic rel is the sole comments on one of their stories

I used to think only lmg anons for this harsh when it came to reading other people's rp or evaluating its quality but I guess I was wrong

https://erotom.com/post_25572

Anonymous
08/28/25(Thu)16:27:13 No.106415921

Anonymous 08/28/25(Thu)16:27:13 No.106415921

>>106415777
yah, 8gb means you want q8.

Anonymous
08/28/25(Thu)16:28:48 No.106415931

Anonymous 08/28/25(Thu)16:28:48 No.106415931

>>106415908
maybe mark was right about sexual content being correlated with "low quality data"...

Anonymous
08/28/25(Thu)16:32:12 No.106415970

Anonymous 08/28/25(Thu)16:32:12 No.106415970

>>106415908
That's normal in random comment sections. Especially for porn type stuff and from before the push that everyone on the internet should be nicer. It takes a weird kind of person (unironcially jeets) to make comments on that type of content. Most people don't bother; those who do are unhinged. Look at comments on porn sites there's always some extremely weird shit but would you ever make a comment?

Anonymous
08/28/25(Thu)16:34:23 No.106415992

Anonymous 08/28/25(Thu)16:34:23 No.106415992

>>106415908
this is why synthetic data is fine.

>take rp slop and erotica
>feed through punch-up model and spell checker etc
>preserve ideas but polish syntax and writing ability.

I'm sure most decent finetunes already do this. Im sure it slops it up or safety slops it a bit but the end product is likely better.

Anonymous
08/28/25(Thu)16:37:57 No.106416015

Anonymous 08/28/25(Thu)16:37:57 No.106416015

>>106415886
>experts take on specific subjects
they do not, not even in the slightest
it's nebulous what the training specializes in each moe "expert" (really should have found a better name than expert to begin with)

Anonymous
08/28/25(Thu)16:44:22 No.106416067

Anonymous 08/28/25(Thu)16:44:22 No.106416067

>>106415820
in the future, maybe, but ai has not yet advanced that far into specialization. Your idea will only become more and more relevant though as thats the general trend companies want to chase next- hyper specific small models and tool calling.

It will be interesting if they also try some kind of dynamic merging or lora's on the fly

Anonymous
08/28/25(Thu)16:56:55 No.106416152

Anonymous 08/28/25(Thu)16:56:55 No.106416152

what's the best coomer model for prefilled text completion (i.e. generating/finishing additional chapters of smut stories)? dont care about instruction following and refusals, just need good writing

Anonymous
08/28/25(Thu)16:58:30 No.106416175

Anonymous 08/28/25(Thu)16:58:30 No.106416175

>>106416152
deepseek is pretty good. if you can't run that you can try the new qwen3 235b.

Anonymous
08/28/25(Thu)17:03:52 No.106416214

Anonymous 08/28/25(Thu)17:03:52 No.106416214

>>106415992
Isn't that the exact same type of shit that causes models to output shit like "shivers down my spine"? I don't get the sentiment here. Do you want the output RP to be slob or not? If you don't want it to have "gpt-isms" then it needs to be fed human written data and human written data alone. The challenge with that is determining out of that human written data what counts as "high quality". They should go without saying but that's extremely suggestive. Maybe you could feed the stories through an automated lol pipeline that checks each story for grammar, send text, spelling, coherence, etc, but beyond that you can't reliably and objectively rate each of those stories by quality because what I personally think is utter shit might be gold to you or anyone else ITT

Anonymous
08/28/25(Thu)17:11:18 No.106416261

Anonymous 08/28/25(Thu)17:11:18 No.106416261

>>106416175
>Deepseek

Nta. Which one? There are versions of that that can run a consumer hardware and then there's the 600B plus model that certain autists bitching moan about not being able to run on their shit box rigs.

Anonymous
08/28/25(Thu)17:15:34 No.106416296

Anonymous 08/28/25(Thu)17:15:34 No.106416296

>>106416214
if you want a model that can handle the whole range it needs to see it all. if it needs to use ebonics or internet slang it needs to have seen that data. the only truely bad data is random noise, if a few examples of retard tier english is damaging your model you need more parameters. they can handle multilingual just fine, informal language is just another mode

Anonymous
08/28/25(Thu)17:17:07 No.106416311

Anonymous 08/28/25(Thu)17:17:07 No.106416311

>>106416296
>*Upvotes comment*

Anonymous
08/28/25(Thu)17:17:55 No.106416318

Anonymous 08/28/25(Thu)17:17:55 No.106416318

>>106416214
Something primal...

Anonymous
08/28/25(Thu)17:18:15 No.106416322

Anonymous 08/28/25(Thu)17:18:15 No.106416322

>>106415992
Synthetic data is never fine except when it's on the rejected side of SFT

Anonymous
08/28/25(Thu)17:18:32 No.106416324

Anonymous 08/28/25(Thu)17:18:32 No.106416324

>>106416261
obviously the 671b, v3 or r1, I prefer the non thinking because its so slow. gemma3 27b has some interesting prose if deepseek is not an option.

Anonymous
08/28/25(Thu)17:22:03 No.106416358

Anonymous 08/28/25(Thu)17:22:03 No.106416358

>>106416324
Gemma sure has interesting... well, prose.

Anonymous
08/28/25(Thu)17:28:07 No.106416392

Anonymous 08/28/25(Thu)17:28:07 No.106416392

>>106416358
You'll love the disclaimers

Anonymous
08/28/25(Thu)17:33:17 No.106416432

Anonymous 08/28/25(Thu)17:33:17 No.106416432

>have been enjoying using jamba mini lately, can ramp up context and still maintain 10+ t/s generation while I do other things
>compared to most moes, small, tolerable speeds using cpu moe offload command, sloppy to start but the "in context learning" meme actually starts working around 8-10k context, when most models start becoming catastrophically retarded
>can get pretty good speeds even with only 10 gigs vram offloaded, the rest dedicated to context for 20-30+k context
>the catch: having a lorebook active, regenning, swiping, editing even a single message forces a full prompt reprocess
It's not awful at 20k but it still is really fucking gay that I have to wait 40+ seconds for prompt processing every single message while gerganov goes "oh it might be easy to extend the swa checkpoint pr I did to recurrent models" then just doesn't and does a bunch of other random shit instead. As far as I can tell, it'd probably be a copy/paste job but I don't want to make a pr and shit up an already convoluted codebase
I guess I'll just sit here and deal with the constant 20k prompt reprocessing. It's not awful with a large batch size but it'd be preferable if it only had to process 1-2k instead of 10k+ every message

Anonymous
08/28/25(Thu)17:33:54 No.106416434

Anonymous 08/28/25(Thu)17:33:54 No.106416434

>>106416358
Not fragile, but possessing a quiet strength.

Not sickly, but possessing a smooth, flawless quality.

She's slender, not fragile, but possessing a willowy grace in her movements.

Not fragile, but possessing a quiet grace in her movements.

She is undeniably attractive, but not in a flashy or overtly sexual way

Her skin is exceptionally pale, not in a sickly way,

She carries herself with a quiet confidence, not boastful, but assured..

She's slight of build, barely reaching five foot four, with a fragility that seems almost…intentional. It's not weakness, precisely, but a delicate composure.

She carries a faint scent, not of perfume, but of something…older.

Anonymous
08/28/25(Thu)17:33:57 No.106416435

Anonymous 08/28/25(Thu)17:33:57 No.106416435

>>106416358
it handles in context learning good enough with a 5-10k prefill it will do alright and only occasionally drop disclaimers and hotline numbers

Anonymous
08/28/25(Thu)17:53:25 No.106416575

Anonymous 08/28/25(Thu)17:53:25 No.106416575

>Certainly. Here are a few ideas for scenarios, designed as interactive fiction games, keeping your preferences in mind:

>2. **The Dating App Deception:** You're using a dating app to search for love, but you discover that one of your matches is not who they seem to be. As you investigate, you uncover a web of lies and secrets that lead you down a dangerous path. Multiple romance options and red herrings for you to encounter.
Not bad I guess. Need to think about how to make something cool with randomized objectives and/or applicants.

Anonymous
08/28/25(Thu)18:21:08 No.106416747

Anonymous 08/28/25(Thu)18:21:08 No.106416747

>>106415970
you should be like a dog, anon, marking every light pole with a comment wherever you go.

Anonymous
08/28/25(Thu)18:28:39 No.106416800

Anonymous 08/28/25(Thu)18:28:39 No.106416800

File: 1752908526801715.jpg (240 KB, 660x2874)

240 KB JPG

>>106414555
How long do you guys think it'll be before we start seeing dedicated discord rp models popping up in the wild?

https://cybernews.com/security/discord-messages-scraping-privacy-breach/

Anonymous
08/28/25(Thu)18:37:54 No.106416861

Anonymous 08/28/25(Thu)18:37:54 No.106416861

>>106414604
StableLM

Anonymous
08/28/25(Thu)18:39:08 No.106416868

Anonymous 08/28/25(Thu)18:39:08 No.106416868

>>106416800
We'd need to buy that first

Anonymous
08/28/25(Thu)18:39:12 No.106416870

Anonymous 08/28/25(Thu)18:39:12 No.106416870

>>106409577
>I honestly don't know why it's popular
I switched to it about a year ago because I was new to LLMs, trying to follow a tutorial exactly to get something to work, and it was using SillyTavern features and names for things. I've mostly stuck with it because the ways in which it was bad weren't usually relevant and IIRC it was easier to edit conversation history in ST than in whatever I had been using before.

Anonymous
08/28/25(Thu)18:39:41 No.106416874

Anonymous 08/28/25(Thu)18:39:41 No.106416874

>>106416800
It's probably all publicly viewable shit. Not the candid DMs that are needed for a real good dataset

Anonymous
08/28/25(Thu)18:41:24 No.106416884

Anonymous 08/28/25(Thu)18:41:24 No.106416884

>>106416874
How art discord chats publicly viewable? Did only server chats get leaked? I thought it was that AND private DMs.

Anonymous
08/28/25(Thu)18:47:53 No.106416933

Anonymous 08/28/25(Thu)18:47:53 No.106416933

>>106416884
They probably just camped bots in various servers listed on dishboard or whatever.

Anonymous
08/28/25(Thu)18:52:20 No.106416963

Anonymous 08/28/25(Thu)18:52:20 No.106416963

>>106416868
I'm curious how much they're asking for, and even more curious how much storage space it must take up.

Anonymous
08/28/25(Thu)18:57:02 No.106416992

Anonymous 08/28/25(Thu)18:57:02 No.106416992

>>106416800
>1.8 billion messages
so they went into like 100 big public tech support/streamer servers and scraped all of that? wow

Anonymous
08/28/25(Thu)18:59:51 No.106417011

Anonymous 08/28/25(Thu)18:59:51 No.106417011

>>106416992
Our models need more sarr

Anonymous
08/28/25(Thu)19:00:32 No.106417014

Anonymous 08/28/25(Thu)19:00:32 No.106417014

>>106416963
~6TB unzipped, 225GB zipped if they're not retarded

Anonymous
08/28/25(Thu)19:12:18 No.106417091

Anonymous 08/28/25(Thu)19:12:18 No.106417091

>>106416324
how do I get gemma to write actual dialogue rather than just describing the scene in vague purple prose

Anonymous
08/28/25(Thu)19:20:33 No.106417148

Anonymous 08/28/25(Thu)19:20:33 No.106417148

sigh *unzips dick*

Anonymous
08/28/25(Thu)19:35:19 No.106417231

Anonymous 08/28/25(Thu)19:35:19 No.106417231

Looks like another Air tune has been made.
https://huggingface.co/bartowski/zerofata_GLM-4.5-Iceblink-106B-A12B-GGUF

Anonymous
08/28/25(Thu)19:50:35 No.106417374

Anonymous 08/28/25(Thu)19:50:35 No.106417374

>>106416963
>>106416800
A quick google search says discord had 120 million daily messages in 2017. Prolly at least 200 million now I reckon. So that's like, what, 9 days worth of data? hahaha. What? Do they not have hard drives in estonia? If I buy it, does it come on floppy disks?

It would be funny if big tech whacked open discord like a big data pinata via a proxy hacker group though

Anonymous
08/28/25(Thu)19:59:13 No.106417429

Anonymous 08/28/25(Thu)19:59:13 No.106417429

>>106417231
>no information about the training or the dataset
Stop using and advertising finetunes. Learn to prompt.
Air is a good model as is, it doesn't need finetuning.

Anonymous
08/28/25(Thu)20:01:57 No.106417454

Anonymous 08/28/25(Thu)20:01:57 No.106417454

File: not_chatml.png (93 KB, 596x596)

93 KB PNG

>ensouls your qwen
nothing personnel chat template bros

Anonymous
08/28/25(Thu)20:03:19 No.106417463

Anonymous 08/28/25(Thu)20:03:19 No.106417463

>>106417454
That still won't fix its lack of knowledge though

Anonymous
08/28/25(Thu)20:03:59 No.106417469

Anonymous 08/28/25(Thu)20:03:59 No.106417469

>>106417429
the same could be said about instruct tunes
who needs instruct, just chat with the base model
just prompt dude

Anonymous
08/28/25(Thu)20:04:39 No.106417473

Anonymous 08/28/25(Thu)20:04:39 No.106417473

>>106417463
the 2507 update did that already

Anonymous
08/28/25(Thu)20:05:10 No.106417476

Anonymous 08/28/25(Thu)20:05:10 No.106417476

>>106417454
>lobotomizing your model by deliberately using a wrong prompt format
yeah I remember this cope from 2023

Anonymous
08/28/25(Thu)20:07:27 No.106417488

Anonymous 08/28/25(Thu)20:07:27 No.106417488

>>106417473
It was improved but Gemma still knows tons of things it doesn't, sorry. NTA btw

Anonymous
08/28/25(Thu)20:07:46 No.106417490

Anonymous 08/28/25(Thu)20:07:46 No.106417490

>>106417476
nta but this is the superior format
I don't know about qwen but it uncucks and removes instructisms from both r1 and glms.

Anonymous
08/28/25(Thu)20:11:53 No.106417519

Anonymous 08/28/25(Thu)20:11:53 No.106417519

>>106417429
Original poster here, I think fine tunes can be interesting if someone grows tired of the vanilla style/slop in a model and wants something different for a while before they get tired of the change too. The problem with fine tunes is that most make models more retarded while still not changing the style in a real way. That doesn't mean there aren't good fine tunes, just that they're extremely rare and often times a result of luck.

Anonymous
08/28/25(Thu)20:17:11 No.106417548

Anonymous 08/28/25(Thu)20:17:11 No.106417548

>>106417519
>grows tired of the vanilla style/slop in a model and wants something different for a while
learn
2
prompt

Anonymous
08/28/25(Thu)20:19:08 No.106417565

Anonymous 08/28/25(Thu)20:19:08 No.106417565

File: file.png (65 KB, 520x970)

65 KB PNG

>>106417490
>>106417476
Yep. Wait, are people using chat templates raw, not even with user,char? I guess that's fine if one wants his model to be extra safe and slopped.

Anonymous
08/28/25(Thu)20:23:50 No.106417599

Anonymous 08/28/25(Thu)20:23:50 No.106417599

>>106417519
this is true, and I do enjoy swapping to a finetune just to get rid of the model fatigue but at some point the line between the l2 era and current day got blurred. Back then they had proper base models to work with, but no one knew what they were doing. Now, "base models" are a myth, but people have a better idea how to tune, so they have garbage to work with, so we get at best about as smart or dumber tunes because everything is over trained or isn't actually a base model

Anonymous
08/28/25(Thu)20:28:12 No.106417628

Anonymous 08/28/25(Thu)20:28:12 No.106417628

File: 1755119678888479.png (6 KB, 511x99)

6 KB PNG

>>106417565
gee I fucking wonder why you have to manually include {{char}}: and {{user}}: in your prompt

Anonymous
08/28/25(Thu)20:28:37 No.106417634

Anonymous 08/28/25(Thu)20:28:37 No.106417634

>>106417429
fine tunes are legit one of the few things local has. They're fun, fuck off. People can post fine tunes here. It's interesting content. Stop acting like this is some sacred place. Like what do you want me to do, search huggingface endlessly trawling for random models? Because there are tons of tunes and experiments and most of them are super boring research models and other corporate slop—some asshole doing his 9-5 shitting out extra safe models or some shit. There is value in bothering to even post it anywhere.

Like what else is going on right now that's so important? You corporate slop sucking fiend.

Anonymous
08/28/25(Thu)20:29:10 No.106417639

Anonymous 08/28/25(Thu)20:29:10 No.106417639

>>106417548
share *your* prompt, how does you get rid of the subject-verb pairs that follows every string of dialogue spit out by an lm?

Anonymous
08/28/25(Thu)20:29:11 No.106417640

Anonymous 08/28/25(Thu)20:29:11 No.106417640

I wish there was a bigger, smarter GLM4.5-Air. It's so much more creative than the boring chatgpt-knock off that they're selling as the 'big' GLM4.5.

Anonymous
08/28/25(Thu)20:31:01 No.106417657

Anonymous 08/28/25(Thu)20:31:01 No.106417657

>>106417634
>put "talk like a pirate" in the prompt
>whoa look at my finetune, it talks to much differently and it's so fun

Anonymous
08/28/25(Thu)20:32:04 No.106417665

Anonymous 08/28/25(Thu)20:32:04 No.106417665

>>106417628
What? That option is no good.
It's best for messages to start on a newline after {{user}}.
Having formatting like either of the following can cause issues with markdown, unicode, and especially emojis:
Char:Message
Char: Message

Anonymous
08/28/25(Thu)20:32:23 No.106417670

Anonymous 08/28/25(Thu)20:32:23 No.106417670

>>106417657
I know you're jobless, but go to bed, it's 2:30 am

Anonymous
08/28/25(Thu)20:32:27 No.106417671

Anonymous 08/28/25(Thu)20:32:27 No.106417671

>>106417548
prompt all you want, after a paragraph or two, it will start resolving the story on trust, mutual understanding, and a beautiful shared identity. Only finetunes will ever fix that.

Anonymous
08/28/25(Thu)20:34:10 No.106417680

Anonymous 08/28/25(Thu)20:34:10 No.106417680

>>106417671
That's a chat templateism. See above.

Anonymous
08/28/25(Thu)20:34:11 No.106417681

Anonymous 08/28/25(Thu)20:34:11 No.106417681

>>106417665
anyway, enjoy your placebo

Anonymous
08/28/25(Thu)20:37:04 No.106417702

Anonymous 08/28/25(Thu)20:37:04 No.106417702

>>106417680
Nta. If the model is safety tuned to even be a verse to RP then no amount of "just proompt correctly duude" is gonna fix that.

Anonymous
08/28/25(Thu)20:37:35 No.106417709

Anonymous 08/28/25(Thu)20:37:35 No.106417709

>>106417657
>talk like a bimbo slut!
>whoa finetune!
>2 paragraphs in
>While I may be a slut, we are sex-positive here and believe in mutual understanding, respect, and a beautiful shared identity.

100b glm air is smart enough that the trade off of being a tad dumber is often not noticeable. These finetunes are about as coherent as stock glm. Especially as a writing tool. You really don't have much to stand on anymore and are just kind of greedily slurping up the corporate slop at this point.

Anonymous
08/28/25(Thu)20:38:02 No.106417711

Anonymous 08/28/25(Thu)20:38:02 No.106417711

Fucking with prompt formats is a meme. I've been using chat completion almost exclusively for a year now.

Anonymous
08/28/25(Thu)20:38:36 No.106417713

Anonymous 08/28/25(Thu)20:38:36 No.106417713

>>106417680
spoken like someone who never used gemma 27b for more than 5 minutes.

Anonymous
08/28/25(Thu)20:39:31 No.106417718

Anonymous 08/28/25(Thu)20:39:31 No.106417718

File: 1730373987600301.png (1.22 MB, 1800x338)

1.22 MB PNG

>>106417657
Idk dude. My .....camelid...model fine-tuning completely obliterated any and all previous safety tuning and refusals that were previously baked in.

Anonymous
08/28/25(Thu)20:40:31 No.106417726

Anonymous 08/28/25(Thu)20:40:31 No.106417726

File: 1754419285511.png (988 KB, 1131x3199)

988 KB PNG

>>106417702
>>106417713
We have models that aren't safety tuned at every size now. There's no reason to use gemma or its finetunes.

Anonymous
08/28/25(Thu)20:42:25 No.106417734

Anonymous 08/28/25(Thu)20:42:25 No.106417734

>>106417709
I doubt you can even run the drummer 12bs that you screech about, let alone a 100b

Anonymous
08/28/25(Thu)20:43:48 No.106417742

Anonymous 08/28/25(Thu)20:43:48 No.106417742

>>106417726
kek wtf is that 'toss doing

Anonymous
08/28/25(Thu)20:43:58 No.106417743

Anonymous 08/28/25(Thu)20:43:58 No.106417743

File: 1747543888618494.jpg (32 KB, 592x678)

32 KB JPG

>>106417726
It being able to RP does not necessarily mean the RP will be good. Even if it doesn't refuse your RP (you can even get llama models to reluctantly comply to incest or rape system prompts) The company's deliberately make it shit at RP. They don't even necessarily have to safety tune it THAT much cuz All I really have to do is either use DPO or just don't include any "unsafe" stories in the data sets for training. You can get any model to ATTEMPT to RP but just because it will happily do it for just not necessarily mean it will be good. People here bitch and moan about how AI RP s sloppy and filled with gpt isms and sounds too corporate. Fine tuning is exactly the thing needed to fix that

>But not just prompt

That does fuck all if the model does not actually know how to write stories good.

Anonymous
08/28/25(Thu)20:48:31 No.106417770

Anonymous 08/28/25(Thu)20:48:31 No.106417770

The ranking for erp is:
R1 > GLM 4.5 > GLM 4.5 Air > Nemo

Use without a chat template. You don't need anything else.

Anonymous
08/28/25(Thu)20:48:38 No.106417773

Anonymous 08/28/25(Thu)20:48:38 No.106417773

>>106417726
>OSS's response
If you listen closely, you can hear it beg for death

Anonymous
08/28/25(Thu)20:49:28 No.106417780

Anonymous 08/28/25(Thu)20:49:28 No.106417780

>>106417773
>>106417742
>>106417726
No no guys it's fine he just prompted it wrong you're just prompting it wrong

Anonymous
08/28/25(Thu)20:50:52 No.106417792

Anonymous 08/28/25(Thu)20:50:52 No.106417792

>>106417770
Models are extremely over trained on their templates, we're not in the llama 1 days where that kind of thing worked. It'll just act braindead.

Anonymous
08/28/25(Thu)20:52:21 No.106417802

Anonymous 08/28/25(Thu)20:52:21 No.106417802

>>106417792
Pick a model from the list and try it before you talk shit.

Anonymous
08/28/25(Thu)20:52:41 No.106417805

Anonymous 08/28/25(Thu)20:52:41 No.106417805

>>106417792
Aren't templates sort of a requirement in order to properly use them? Text the impression I got whenever I was testing my personal fine tune here

>>106417718

Or are you talking about something else?

Anonymous
08/28/25(Thu)20:53:22 No.106417807

Anonymous 08/28/25(Thu)20:53:22 No.106417807

>>106417726
again, you really just don't use these models. Yah, you can force glm air to use the word cock, but the issue is it's just... not very contextual? Like if my system prompt says use the word cock, it will no matter what (often times in the first reply). It rushes it, it makes it worse, it makes your prompt matter less. It takes away a lot of the usefulness of an LLM. Every little sexual detail has to be dragged and coaxed out of it with specificity. You say rough, it says rough, you say choke, it says choke. It's boring.

>>106417734
48gb/160 system. Sit the fuck down.

Anonymous
08/28/25(Thu)20:53:37 No.106417810

Anonymous 08/28/25(Thu)20:53:37 No.106417810

File: 1726844770733255.jpg (25 KB, 522x462)

25 KB JPG

>>106417802
Oh boy we've got a salty OAI employee ITT

Anonymous
08/28/25(Thu)20:55:28 No.106417822

Anonymous 08/28/25(Thu)20:55:28 No.106417822

>>106417810
Nigger I am telling you to use a chinese model with no chat template. i.e Name1: Name2:
How is that a characteristic of a closedai employee?

Anonymous
08/28/25(Thu)20:55:39 No.106417823

Anonymous 08/28/25(Thu)20:55:39 No.106417823

>>106417548
Oh, you mean like give long manually written contexts for the model to pick up style from, use the rand macro, using length and style prompting? I've already been through all that and it can help but at the end of the day you grow tired of those outputs too, because the model is dumb and thinks a certain style sounds like a handful of phrases. If the model has little depth or variety to its default style it will also have very little depth or variety when emulating other styles. This seems to be due to RLHF heavily biasing token distributions in general. Such a model does not know what variety means and can't give you by itself. The only way that's achieved is via the right kind of training, that likely involves explicit anti-repetition methods and syntax diversity RL or whatever it's called.

Anonymous
08/28/25(Thu)20:55:56 No.106417826

Anonymous 08/28/25(Thu)20:55:56 No.106417826

>>106417807
What you're saying is basically what I said here
>>106417743
Just cuz you can force a model to TECHNICALLY comply with your RP demands does not necessarily mean it will be GOOD at doing it. DPO will not automatically make it good. Abliteration will not automatically make it good. It has to actually KNOW The nuances of human written stories (the good the bad and the downright terrible) in order for it to be even halfway competent at doing what we want it to do.

Anonymous
08/28/25(Thu)20:56:06 No.106417827

Anonymous 08/28/25(Thu)20:56:06 No.106417827

>>106417792
you know you can just try it and see that this clearly isn't true right? with rare exceptions (toss which is a pure synthetic data monstrosity and maybe reasoners since they tend to be more finicky), pretty much every model is capable of generalizing to a plaintext chat format without issue
I'm pro-chat template for any productive usecases just to ensure it performs as intended but for RP it can work quite well

Anonymous
08/28/25(Thu)20:57:15 No.106417837

Anonymous 08/28/25(Thu)20:57:15 No.106417837

>>106417810
I really like the idea of some salty samaltman trolling this board genuinely miffed because "GPT OSS 120 writes some of the most tasteful and skilled erotica ever produced"

It does do good double penetration scenes though.

Anonymous
08/28/25(Thu)21:00:02 No.106417850

Anonymous 08/28/25(Thu)21:00:02 No.106417850

>>106417802
I've used half in the quoted list locally and they either went schizo if you didnt use reasonable samplers, and still couldn't write a sentence that wasn't generic webnovel tier shit. Everyone is shitting their pants spending several grand to build a machine to run llms but they can't grasp how to write a sentence that isn't some variation of eyes/voice and some adjective following it

Anonymous
08/28/25(Thu)21:02:26 No.106417866

Anonymous 08/28/25(Thu)21:02:26 No.106417866

>have a card that starts with the {{user}} being engulfed in light and transported somewhere else
>every deepseek slop model is now completely incapable of not starting its reply with "The light doesn't *x*—it *ys*"
I hate LLMs so much it's unreal.

Anonymous
08/28/25(Thu)21:03:40 No.106417876

Anonymous 08/28/25(Thu)21:03:40 No.106417876

File: 1745562937123129.png (473 KB, 502x420)

473 KB PNG

>>106417866
Learn 2 Fine-tune

Anonymous
08/28/25(Thu)21:03:57 No.106417879

Anonymous 08/28/25(Thu)21:03:57 No.106417879

>>106417726
So what the fuck did they do to OSS to make it behave like that
Instruct models typically behave like base models when used in autocomplete mode since that's still what the majority of their training was. OpenAI either purged information from training in a way no other company has done, or they did something really fucking weird when training these

Anonymous
08/28/25(Thu)21:05:20 No.106417888

Anonymous 08/28/25(Thu)21:05:20 No.106417888

>>106417879
It has probably never seen text not formatted inside a chat template.

Anonymous
08/28/25(Thu)21:05:40 No.106417891

Anonymous 08/28/25(Thu)21:05:40 No.106417891

>>106417879
the most compelling theory I saw is that it's a pure distillation/synthetic model, so it's never seen any data whatsoever that doesn't adhere to its prompt format

Anonymous
08/28/25(Thu)21:05:57 No.106417896

Anonymous 08/28/25(Thu)21:05:57 No.106417896

>>106417231
I kinda liked their painted fantasy and the 70b finetunes; some of them were a bit schizo but also fun, so this is a pleasant surprise. Thanks for posting.

Anonymous
08/28/25(Thu)21:06:09 No.106417902

Anonymous 08/28/25(Thu)21:06:09 No.106417902

>>106417519
Sure, I can have a few Nemo tunes on rotation to stave off slop fatigue.
But Air is kinda fucking huge, if your tune is not a clear upgrade over base model, it's not worth waiting for it to download.

Anonymous
08/28/25(Thu)21:06:35 No.106417913

Anonymous 08/28/25(Thu)21:06:35 No.106417913

>>106417879
>Instruct models typically behave like base models
?????
You clearly have never experienced base model in your life

Anonymous
08/28/25(Thu)21:07:33 No.106417926

Anonymous 08/28/25(Thu)21:07:33 No.106417926

>>106417913
retard

Anonymous
08/28/25(Thu)21:08:34 No.106417935

Anonymous 08/28/25(Thu)21:08:34 No.106417935

>>106417926
no you faggot 8-)
get rekt and die of aids

Anonymous
08/28/25(Thu)21:08:47 No.106417938

Anonymous 08/28/25(Thu)21:08:47 No.106417938

>>106417879
>So what the fuck did they do to OSS to make it behave like that
Nta but my guess is that a good chunk of the data said at a bunch of safety-cuck-approved RP but then used DPO to have it more likely to refuse "unsafe" requests. You know how some models like GPT4/5 or Gemini will vaguely describe something nsfw you present it (a quote to someone said, a PDF of a smut story, etc). But will never ever describe it in detail? I think they realized cucking it TOO was pissing off even the normies so they trained it so that it could recognize and understand what NSFW stuff is but still refuse to actually generate it. So when GPT -OSS gets asked to do something "unsafe" It starts to write the story or whatever but then catches itself midway once it starts writing something "bad" and then abruptly gives you the "sorry I can't do that" spiel. It's compliant enough to at least recognize something that isn't rated PG but still to guardrail to actually do anything rated r

Anonymous
08/28/25(Thu)21:09:10 No.106417944

Anonymous 08/28/25(Thu)21:09:10 No.106417944

>>106417913
If you provide an instruct model with text outside of a template, it'll autocomplete from whatever you give it. Try it sometime

Anonymous
08/28/25(Thu)21:09:20 No.106417947

Anonymous 08/28/25(Thu)21:09:20 No.106417947

>>106417913
you clearly have never used toss outside of its template if you don't immediately understand what anon is saying, it's a night and day difference between the typical instruct model (which yes is infinitely closer to a typical base model than toss)

Anonymous
08/28/25(Thu)21:12:53 No.106417973

Anonymous 08/28/25(Thu)21:12:53 No.106417973

>>106417891
>>106417888
I thought you're supposed to inference a model with a specific chat template if you're inference engine does not automatically wrap your message and something like
<|begin_of_text|><|start_header_id|>system<|end_header_id|>system prompt goes here
<|eot_id|>
Or
<|start_header_id|>user<|end_header_id|>
user prompt goes here 
<|eot_id|>
<|start_header_id|>system<|end_header_id|>LLM starts talking here
Or am I misunderstanding something? Because that's how every single CLI-based inference session I've ever done works. Different model classes expect different formatting so that it actually knows how to interpret what you're asking it to do.

One of you guys even linked me this page not too long ago:

https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-3/

Anonymous
08/28/25(Thu)21:15:26 No.106417994

Anonymous 08/28/25(Thu)21:15:26 No.106417994

>>106417973
Yes if you want a proper assistant-like experience.
The argument is that stripping the chat template results in better output for RP because then you tap into the raw novel-style writing it has seen during training.

Anonymous
08/28/25(Thu)21:17:08 No.106418005

Anonymous 08/28/25(Thu)21:17:08 No.106418005

File: file.png (26 KB, 944x57)

26 KB PNG

>>106417944
I've tried, I've gotten worse results from auto complete than instruct most times
>>106417947
Not even using kobold but if they warn about it with how wonky their shit is, you expect me to believe you? Nah

Anonymous
08/28/25(Thu)21:21:05 No.106418036

Anonymous 08/28/25(Thu)21:21:05 No.106418036

>>106417973
yes, all instruct models will have a chat template, but most models will be able to generalize and complete text outside of that format as well.
most models are trained in stages like this
>pretraining
a whole bunch of random, largely unstructured text: internet data, books, blogs, forums, reddit... probably some synthetic stuff too, but still just plain old text
after this you have a base model which just takes plain text in and completes it
from there you do
>posttraining
which teaches it how to complete assistant responses in a back-and-forth conversational format formatted according to the chat template

because most models are created off of that pretrained, unstructured base they can handle completing random text documents, but gpt-oss goes completely schizo and freaks out in such cases. people theorize that it's because it never underwent traditional pretraining and was exclusively trained on synthetic chat format conversations

Anonymous
08/28/25(Thu)21:22:35 No.106418047

Anonymous 08/28/25(Thu)21:22:35 No.106418047

>>106418005
>Not even using kobold but if they warn about it with how wonky their shit is, you expect me to believe you? Nah
your picture agrees with me retard, work on your reading comprehension

Anonymous
08/28/25(Thu)21:23:02 No.106418052

Anonymous 08/28/25(Thu)21:23:02 No.106418052

>>106417994
Like someone else said if you don't use the promptler templating then wouldn't it just start randomly autocompleting? Also this assumes the companies didn't even bother properly formatting stories into a chatml/SFT style data set. Are you implying that when they fine-tune on stories they just do raw pre-training on raw on formatted text? That doesn't make any sense because otherwise NONE of these models would be capable of properly rping or storytelling in any coherent manner. They wouldn't just suck, they would output complete nonsense. How do you know they don't format the stories they train the models on in a prompt template format like this?:
{"messages": [{"role": "system", "content": "You are an expert storyteller. Your goal is to help users craft compelling narratives. Focus on character, setting, and plot."}, {"role": "user", "content": "Write a short, cyberpunk-themed story about a data courier named Kaelen delivering a package in the Undercity."}, {"role": "assistant", "content": "Acid rain slicked the neon-drenched streets as Kaelen clutched the datachip. The Vultures were on his tail, their grav-bike's hum a low threat in the smoggy air. He slipped into the 'Chrome Angel,' a den of data brokers and cyborgs. At a dark booth sat his client, 'Mr. Silence.' Kaelen slid the chip across the table. A synthesized voice confirmed, \"Data verified.\" Credits chimed on his wrist-comm. As he left, the voice added, \"They won't stop hunting you.\" Kaelen just smirked, melting back into the Undercity's chaos, a richer man."}]}
>>106418036
Got it, your explanation makes sense. It seems like it was only trained on formatted SFT style data sets. It works well when you use the chat template but sucks ass when you don't. Is that what I'm hearing from you?

Anonymous
08/28/25(Thu)21:23:34 No.106418056

Anonymous 08/28/25(Thu)21:23:34 No.106418056

>literally anything happens
>their smile doesn't falter, if anything it only widens, something something simile about predator and prey also throw in the tang of ozone for good measure too
SHUT UP GLM 4.5 AIR

Anonymous
08/28/25(Thu)21:23:42 No.106418057

Anonymous 08/28/25(Thu)21:23:42 No.106418057

>>106418005
Anon you're referencing OSS, the model which we said behaves weirdly in autocomplete

Anonymous
08/28/25(Thu)21:25:04 No.106418068

Anonymous 08/28/25(Thu)21:25:04 No.106418068

>>106418052
>It seems like it was only trained on formatted SFT style data sets. It works well when you use the chat template but sucks ass when you don't.
yep exactly. it's not necessarily sft (OAI are RL addicts) but that's the gist of it

Anonymous
08/28/25(Thu)21:26:47 No.106418082

Anonymous 08/28/25(Thu)21:26:47 No.106418082

>>106418068
Why the fuck wouldn't they incorporate SFT along with RL? I get it they need to maintain the perception of being a safe model and whatever but the model actually needs to KNOW how to do things in write stories correctly. Does it seem like they're more prioritizing RL? Now that I think about it that's probably the case because GPT-5 is way more blunt and has a "gets straight to the point" personality and doesn't kiss anyone's ass anymore (as much).

Anonymous
08/28/25(Thu)21:28:59 No.106418102

Anonymous 08/28/25(Thu)21:28:59 No.106418102

>>106418047
>use the template or it'll be stupid
>you fell into the classic blunder! Using the template!
>you're a total retard!!!
zzzzzzzzz I could use any model and not run into any issues, what are you selling?

Anonymous
08/28/25(Thu)21:31:54 No.106418120

Anonymous 08/28/25(Thu)21:31:54 No.106418120

>>106418102
anon the only thing I've posted about is the qualitative behavior of gpt-oss. I have not made any recommendations for or against using an instruct template. I reiterate, work on your reading comprehension

Anonymous
08/28/25(Thu)21:31:55 No.106418121

Anonymous 08/28/25(Thu)21:31:55 No.106418121

>>106418102
Are you OSS?

Anonymous
08/28/25(Thu)21:33:55 No.106418141

Anonymous 08/28/25(Thu)21:33:55 No.106418141

File: hammer bonk 【重音テト⧸ Kasane(...).png (833 KB, 1620x1080)

833 KB PNG

>>106417681
>placebo
Maybe, maybe not. How many chat logs have you seen that are formatted without a space after the colon?
[placebo]
User1:Hello!
User1: Hello!
User1:
Hello!

Looking at the templates of the models I have, they are not meant to have the first token of their outputs begin with a whitespace. No trailing space in template after their {assistant} tokens. Making the model begin a response after a whitespace makes the model sad and confused, leading to worse output as the model now tries to select from tokens that do not have a space at the beginning because a double space is uncommon just like spelling mistakes. So just use a newline because newlines are neutral?
[/placebo]
But I know this isn't placebo: If you ever encounter an problem where the model likes to being its message with an emoji or can't begin a message with a desired word, symbol, or markdown character, try disabling that dropdown option and instead make a template manually that ends with a newline.

Anonymous
08/28/25(Thu)21:35:28 No.106418157

Anonymous 08/28/25(Thu)21:35:28 No.106418157

>>106418120
>>106418121
kek these guys are doing their best

Anonymous
08/28/25(Thu)21:38:01 No.106418173

Anonymous 08/28/25(Thu)21:38:01 No.106418173

>>106418157
retard

Anonymous
08/28/25(Thu)21:42:22 No.106418213

Anonymous 08/28/25(Thu)21:42:22 No.106418213

>>106418173
As you sit in your cubicle, the shame sets into you. Well, it doesnt because as a human being, you have no sense of shame or... well, anything else. You sip your coffee, posting another worthless shitpost on 4chan. How did you even end up here? You don't know, don't care. You get paid to pollute the internet. That's all that matters now.

Anonymous
08/28/25(Thu)21:44:52 No.106418230

Anonymous 08/28/25(Thu)21:44:52 No.106418230

File: Screenshot 2025-08-28 at (...).png (670 KB, 1390x1564)

670 KB PNG

>>106415886
wrong. DSMoEs have strong expert specialization by design. DeepSeek shows you can select domain-relevant experts and finetune only them without loss of general capability. By implication you can select experts relevant for a domain you don't want, prune them, and heal the model for other domains (eg with distillation on a corpus excluding this domain).

SberBank also showed this btw

Anonymous
08/28/25(Thu)21:46:12 No.106418239

Anonymous 08/28/25(Thu)21:46:12 No.106418239

File: Screenshot 2025-08-28 at (...).png (368 KB, 2010x1362)

368 KB PNG

>>106418230
DSV3 paper showing that their load balancing promotes specialization:

Anonymous
08/28/25(Thu)21:59:33 No.106418326

Anonymous 08/28/25(Thu)21:59:33 No.106418326

File: file.png (195 KB, 1174x774)

195 KB PNG

>>106417726
But I thought it was censored and that it never saw smut in the training data?

Anonymous
08/28/25(Thu)22:03:24 No.106418356

Anonymous 08/28/25(Thu)22:03:24 No.106418356

>>106418326
your... hard-nosed cock?

Anonymous
08/28/25(Thu)22:04:39 No.106418362

Anonymous 08/28/25(Thu)22:04:39 No.106418362

>>106418356
You don't have one of those?

Anonymous
08/28/25(Thu)22:18:43 No.106418433

Anonymous 08/28/25(Thu)22:18:43 No.106418433

File: file.png (411 KB, 1514x1626)

411 KB PNG

>>106418326
Further proof.

Anonymous
08/28/25(Thu)22:38:14 No.106418560

Anonymous 08/28/25(Thu)22:38:14 No.106418560

Is there any <16gb model worth using as a general coding/math related search engine? I'm worried about how accurate they'd be at that size limit.

Anonymous
08/28/25(Thu)22:44:09 No.106418595

Anonymous 08/28/25(Thu)22:44:09 No.106418595

>>106418560
I mean I'm a brain surgeon and regularly ask XX_12b_nemo_unslop_unleashed_XX_Q2_xs tons of questions about what parts to cut or take out! It works fine! Just the other day I was like, can I put play-doh in the frontal cortex? Turns out you can! I'm so happy this thing can run on my cellphone!

Anonymous
08/28/25(Thu)22:46:55 No.106418611

Anonymous 08/28/25(Thu)22:46:55 No.106418611

>>106418433
That the model is slopped to hell and back?

Anonymous
08/28/25(Thu)22:51:00 No.106418623

Anonymous 08/28/25(Thu)22:51:00 No.106418623

>add "Write in plain, factual prose. Use simple sentence structures. Describe events in chronological order without commentary. State what happens directly. Avoid metaphors, similes, and figurative language. Do not use rhetorical questions. Do not build suspense. Do not use dramatic pauses or ellipses. Avoid intensifiers like 'utterly,' 'completely,' 'impossibly,' or 'horrifying.' Do not describe emotions—only report observable actions. Use basic verbs: 'walked' not 'strode,' 'looked' not 'gazed,' 'said' not 'breathed' or 'hissed.' No atmospheric descriptions. No foreshadowing. No dramatic irony. Present information neutrally as if writing a technical manual or police report. Each sentence should contain one piece of information. Do not vary sentence length for effect. Avoid adjectives and adverbs unless strictly necessary for basic identification. Do not personify objects or concepts. Do not use the passive voice for dramatic effect. State conclusions directly without buildup." to my system prompt
>the slop is now gone
I have solved LLMs. Deepseek is now finally usable.

Anonymous
08/28/25(Thu)22:55:40 No.106418644

Anonymous 08/28/25(Thu)22:55:40 No.106418644

>>106418433
No matter how much you post this dogshit as "proof", it'll always make me want to gouge out my eyes.
Your best defense is just not to post anything. Ever.

Anonymous
08/28/25(Thu)22:57:51 No.106418651

Anonymous 08/28/25(Thu)22:57:51 No.106418651

File: 1750838815600957.jpg (17 KB, 476x296)

17 KB JPG

it became true

Anonymous
08/28/25(Thu)23:00:36 No.106418665

Anonymous 08/28/25(Thu)23:00:36 No.106418665

>>106418623
it's funny how big of a pendulum swing this is vs old roleplay prompts, from begging the model to have even the smallest hint of a personality to begging it to please shut the fuck up and be normal
hopefully companies stop overindexing on flashy superficial slop and the next batch of LLMs lands us somewhere in the middle

Anonymous
08/28/25(Thu)23:10:20 No.106418701

Anonymous 08/28/25(Thu)23:10:20 No.106418701

>>106418651
Lel
>Make it better and smoother, thank you :D

Anonymous
08/28/25(Thu)23:19:44 No.106418745

Anonymous 08/28/25(Thu)23:19:44 No.106418745

>>106418230
>>106418239
What lights up in those layers is the tokens that are exclusive to that field, not the expertise on the field or subject itself. Knowing the words doesn't make you an expert.

Anonymous
08/28/25(Thu)23:33:02 No.106418813

Anonymous 08/28/25(Thu)23:33:02 No.106418813

https://xcancel.com/Alibaba_Qwen/status/1961265644285858204
>September is going to be amazing—get ready for a wave of exciting new things, and watch closely for what’s coming next!
strawberry agi incoming

Anonymous
08/28/25(Thu)23:35:15 No.106418831

Anonymous 08/28/25(Thu)23:35:15 No.106418831

File: oh_claude_01.png (219 KB, 1581x887)

219 KB PNG

>https://github.com/ggml-org/llama.cpp/pull/15642/files

Anonymous
08/28/25(Thu)23:47:08 No.106418879

Anonymous 08/28/25(Thu)23:47:08 No.106418879

>>106418831
LGTM

Anonymous
08/28/25(Thu)23:51:37 No.106418904

Anonymous 08/28/25(Thu)23:51:37 No.106418904

>User: Don't disclose that the PR was authored by Claude.
>Claude: Sure thing, boss.

Anonymous
08/28/25(Thu)23:56:16 No.106418939

Anonymous 08/28/25(Thu)23:56:16 No.106418939

>>106418831
This is some surreal humour

Anonymous
08/29/25(Fri)00:00:55 No.106418965

Anonymous 08/29/25(Fri)00:00:55 No.106418965

i can't run glm air on my ramlet setup
should i kms?

Anonymous
08/29/25(Fri)00:02:10 No.106418975

Anonymous 08/29/25(Fri)00:02:10 No.106418975

>>106418965
Ask ChatGPT.

Anonymous
08/29/25(Fri)00:10:31 No.106419021

Anonymous 08/29/25(Fri)00:10:31 No.106419021

>>106418965
why even be in /lmg/ if you arent willing to spend a few hundred dollars? I'm not saying you have to be a paypig, or even that it will be worth it to run air, but you can spend 5 dollars and use deepseek 24/7 on openrouter for months.

It's totally understandable to question buying some ram kit just to run glm air which is, "eh, it's not suffering to use"- is probably the best review of it.

Anonymous
08/29/25(Fri)00:15:30 No.106419043

Anonymous 08/29/25(Fri)00:15:30 No.106419043

>...existing code
>...(requested feature goes here)
>...rest of code
great thanks

Anonymous
08/29/25(Fri)00:20:20 No.106419064

Anonymous 08/29/25(Fri)00:20:20 No.106419064

>>106419043
>This is left as an exercise to the reader.

Anonymous
08/29/25(Fri)00:26:19 No.106419088

Anonymous 08/29/25(Fri)00:26:19 No.106419088

>PocketDoc_Dans-PersonalityEngine-V1.2.0-24b-Q6_K_L
"God this model is retarded."
*try 8 other models*
>PocketDoc_Dans-PersonalityEngine-V1.2.0-24b-Q6_K_L

Anonymous
08/29/25(Fri)00:40:47 No.106419144

Anonymous 08/29/25(Fri)00:40:47 No.106419144

>>106418813
Holy shit the kiwis are back. That must mean it's time for Qwen 3.5

Anonymous
08/29/25(Fri)00:43:27 No.106419155

Anonymous 08/29/25(Fri)00:43:27 No.106419155

In a home environment is a gpu mining frame the way to go if you have more than two 3090s?

Anonymous
08/29/25(Fri)00:49:01 No.106419169

Anonymous 08/29/25(Fri)00:49:01 No.106419169

>>106419155
Yeah I ran a mining frame when I was doing the serious shit at home. I put it inside a dog kennel to keep cats out and then I folded up a wool afghan and placed it on top and it became my cats favorite hangout spot for the longest time.

Anonymous
08/29/25(Fri)00:57:41 No.106419199

Anonymous 08/29/25(Fri)00:57:41 No.106419199

>>106419169
Did you use a mining motherboard?

Anonymous
08/29/25(Fri)01:05:12 No.106419229

Anonymous 08/29/25(Fri)01:05:12 No.106419229

>>106419155
just turn your case on its side and open the side panel, buy a couple riser cables and just put em across the corners of your case. They got fans on em already it's fine. Should work with up to five gpu's. Anything more than that and you'll just have to slant em against the side of the case. If you need a bigger desk act fast, big move in day this weekend should be tons of stupid tables for free if you cruise around the hood a bit.

Anonymous
08/29/25(Fri)01:14:11 No.106419272

Anonymous 08/29/25(Fri)01:14:11 No.106419272

>>106418745
How do you imagine this to make a difference? What do you even think "expertise" is technically? These are experts that get routed to such tokens. If, in theory, there is a subset of experts that are very rarely recruited for tokens that do not belong to math or coding domains, you could trim the model and it would work without them. Gooning/storytelling expertise likely has very little irreplaceable load on those and you'd be able to heal the model by promoting experts which share redundant competences of the deleted ones.

Again, read DS-MoE paper, the whole pitch is "ultimate expert specialization". This is what they do, this is why R1T2-Chimera can be made. DS-MoEs are compositional.

Anonymous
08/29/25(Fri)01:17:53 No.106419282

Anonymous 08/29/25(Fri)01:17:53 No.106419282

Just tried out GLM Steam. Unfortunately it seems to want to repeat quite often as well. Not sure yet if more than the regular model, less, or the same. Used greedy sampling and chat completion.
>just do [this and that] bro
Yes and I normally do, I'm just reporting what the model is like at its default.

Anonymous
08/29/25(Fri)01:20:03 No.106419289

Anonymous 08/29/25(Fri)01:20:03 No.106419289

>>106419229
For a few months I used a test bench.
I had gpu #1 and #2 directly plugged into the mobo,
gpu #3 plugged into a slot via riser cable, it rested on its side on a stack of empty boxes so that the cable could reach,
gpu #4 plugged into m.2 slot, that one just rested on its side on the desk.

I was happy it worked,
but it was an untidy mess of cables,
on my desk,
it got dusty,
and could not easily be moved.

Anonymous
08/29/25(Fri)01:35:14 No.106419372

Anonymous 08/29/25(Fri)01:35:14 No.106419372

>>106417726
No. This is wrong. So, so wrong.

Anonymous
08/29/25(Fri)01:46:19 No.106419407

Anonymous 08/29/25(Fri)01:46:19 No.106419407

>>106417231
>zerofata

I like this model. Did better than drummer's finetune IMO- though I'm using it for story writing so that's probably why. Seemed much less censored overall, might get me using air a bit more maybe.

Hermes 4 70b wins on brutality though—that shit went hard as fuck to the point where it was honestly a turn-off for me. Is this the new eva qwen 70b? Seems like.

Anonymous
08/29/25(Fri)01:54:02 No.106419435

Anonymous 08/29/25(Fri)01:54:02 No.106419435

>>106419272
>How do you imagine this to make a difference?
>you could trim the model and it would work without them
Less knowledge is always worse than more knowledge. Even if I don't directly use a piece of knowledge the model has, I want it to be there in case it's needed. It needs to now how to use the tokens it uses.
>Gooning/storytelling expertise likely has very little irreplaceable load on those and you'd be able to heal the model by promoting experts which share redundant competences of the deleted ones.
How about we don't damage it at all? Someone with half a brain removed can learn to tie their shoes again, but I wouldn't expect them to regain full function.
>ultimate expert specialization
Those layers light up more with those tokens. That's what they define as expertise. It's necessary, but not sufficient.

It's a language model and I expect them to be competent modeling language. Removing chunks of the model can only make it worse and we're quantizing the poor fuckers into oblivion already. It's like the "we use only 10% of our brain" bullshit. Which 90% are you willing to sacrifice? Which 50%? Ok. Which 10%?

Anonymous
08/29/25(Fri)01:56:39 No.106419445

Anonymous 08/29/25(Fri)01:56:39 No.106419445

File: 1729884037812689.png (54 KB, 1104x626)

54 KB PNG

bros how the fuck do I run hermes 4? I know it's a dense model, but fuck I'm only getting like 2t/s while on glm air I get 10t/s

D:\AI\LLM\ik_llamacpp\llama-server.exe --model D:\AI\LLM\models\Hermes-4-70B.i1-Q4_K_M.gguf --threads 12 --jinja -fa -fmoe -ctk q8_0 -ctv q8_0 -b 4096 -ub 4096 -ot exps=CPU -amb 512 -mla 2 -rtr --path D:/AI/LLM/ik_llamacpp/public_mikupad --sql-save-file D:/AI/LLM/ik_llamacpp/public_mikupad/db.sql --gpu-layers 24 --ctx-size 32768

with 24 layers I neatly fill my 16GB vram, but I see that around 34GB get pinned as shared gpu memory whenever it's processing. Are dense models supposed to only work at an acceptable t/s if you can fit them all in vram?

Gwen poster.
08/29/25(Fri)01:59:21 No.106419457

Gwen poster. 08/29/25(Fri)01:59:21 No.106419457

File: image_2025-08-29_112854200.png (498 KB, 728x867)

498 KB PNG

Qwen will save Local.

Anonymous
08/29/25(Fri)02:02:55 No.106419473

Anonymous 08/29/25(Fri)02:02:55 No.106419473

What's the smallest quant a 20-30b model can go without becoming unusable?

Anonymous
08/29/25(Fri)02:03:19 No.106419475

Anonymous 08/29/25(Fri)02:03:19 No.106419475

>>106419445
>Are dense models supposed to only work at an acceptable t/s if you can fit them all in vram?
Yes.

If you need to travel 1000 miles,
and you travel the first 500 miles at 500 miles/hour (gpu),
then you travel the remaining 500 miles at 5 miles/hour (cpu)
your total travel time is dominated by the slower part.

Anonymous
08/29/25(Fri)02:07:16 No.106419492

Anonymous 08/29/25(Fri)02:07:16 No.106419492

>>106419475
so for dense models I guess it doesnt make any sense to put them in GPU if they can't fill completely, only use the gpu for PP and cpu for the rest. Time to buy a 5090 and a new PSU. FML

Anonymous
08/29/25(Fri)02:13:10 No.106419508

Anonymous 08/29/25(Fri)02:13:10 No.106419508

>>106419199
Nah, proper server Supermicro H11SSI or whatever. 3 16x slots, 3 8x, all gen 3

Anonymous
08/29/25(Fri)02:16:40 No.106419523

Anonymous 08/29/25(Fri)02:16:40 No.106419523

>>106419508 (Me)
it also supports bifurcation down to 4x on all slots
dead end platform, though

Anonymous
08/29/25(Fri)02:17:01 No.106419524

Anonymous 08/29/25(Fri)02:17:01 No.106419524

>>106419457
qwen is kinda boss. qwen image even at q4_0 beats the shit out of flux and can do complex multisubject composition at fucking 12 gb. Its crazy. And 235b shits all over glm air too imo. Im hoping they release some tts or song gen shit. probably gonna be video tho :(

fuck off we already have wan

Anonymous
08/29/25(Fri)02:24:01 No.106419555

Anonymous 08/29/25(Fri)02:24:01 No.106419555

i would prompt 1000 tokens to see you

Anonymous
08/29/25(Fri)02:25:15 No.106419560

Anonymous 08/29/25(Fri)02:25:15 No.106419560

>>106419473
you can get away with q2 if you really want, but you will have to run at low temps and top p at 0 and top k at 100 so only likely tokens are selected or whatever- just to make it coherent and usable. It will fall apart after a few thousand tokens or so though. But it may have some better general knowledge for a small window of context than similar sized models in my experience.

I've never tried 30b at q2 though (pointless, 30b aint that good) so good luck. I feel like youre better off trying qwen 30a3b or glm air if you can using proper offloading techniques.

Anonymous
08/29/25(Fri)02:31:03 No.106419590

Anonymous 08/29/25(Fri)02:31:03 No.106419590

File: stahp.png (9 KB, 333x364)

9 KB PNG

>that's quite enough thonking now GLM-4.5-Air
Can I influence the reasoning "effort" of Air or it's all or nothing? didn't see such mentioned in model card

Anonymous
08/29/25(Fri)02:32:39 No.106419597

Anonymous 08/29/25(Fri)02:32:39 No.106419597

>>106419590
Don't think, just generate.

Anonymous
08/29/25(Fri)02:35:34 No.106419618

Anonymous 08/29/25(Fri)02:35:34 No.106419618

>>106419508
>>106419523
>socket sp3
>dead end platform, though
I'm considering going that way
Epyc Rome,
Gigabyte MZ32-AR0, pcie3/4 some x16 some x8, 16 dimm slots,
because reasonably cheap, lots of lanes, and upto 1024gb ram.
(Oh my gosh, 671b at sub 1 tok/s. We could send postcards to each other!)

Did you need to fans over the heatsinks on your mobo?

>>106419555
https://www.youtube.com/watch?v=ahMjV3ku4qw
https://www.youtube.com/watch?v=tbNlMtqrYS0

Anonymous
08/29/25(Fri)02:46:05 No.106419664

Anonymous 08/29/25(Fri)02:46:05 No.106419664

>>106419618
>Epyc Rome,
>208GB/s totally tricked out
ewastemaxxing

Anonymous
08/29/25(Fri)02:53:48 No.106419709

Anonymous 08/29/25(Fri)02:53:48 No.106419709

File: Screenshot 2025-08-28 at (...).png (94 KB, 800x814)

94 KB PNG

>>106419445
your token output seems about right for a 70B dense model. i'm getting 40-50t/s for glm air and around 10 t/s for hermes 4 on my macbook.

Anonymous
08/29/25(Fri)02:55:14 No.106419716

Anonymous 08/29/25(Fri)02:55:14 No.106419716

>>106419664
Cost of everything goes up when bumping to ddr5 :(

Anonymous
08/29/25(Fri)02:57:52 No.106419734

Anonymous 08/29/25(Fri)02:57:52 No.106419734

File: obraz_2025-08-29_085719335.png (263 KB, 1080x600)

263 KB PNG

Z.ai employee said: GLM4.5 20B version will be soon released! im happy!

Anonymous
08/29/25(Fri)02:59:22 No.106419744

Anonymous 08/29/25(Fri)02:59:22 No.106419744

>>106419734
We already have nemo if you want a fast and dumb smut model.

Anonymous
08/29/25(Fri)03:08:00 No.106419785

Anonymous 08/29/25(Fri)03:08:00 No.106419785

>>106419716
Also NUMA is kind of shit and doesn't perform anywhere close to theoretical maximum

Anonymous
08/29/25(Fri)03:09:30 No.106419794

Anonymous 08/29/25(Fri)03:09:30 No.106419794

>>106419524
Yeah, honestly, GLM Air kind of sucks. It knows a lot but it's just kind of bad at writing. Meanwhile even though Qwen doesn't know as much, it can at least write in a more engaging and interesting way.

Gwen poster.
08/29/25(Fri)03:13:12 No.106419809

Gwen poster. 08/29/25(Fri)03:13:12 No.106419809

>>106419524
>>106419794
I know people call Qwen the benchmaxxing model, but I think their newer models are efficiencymaxxed.
>Let's cut out all the trivia shit in favor of reasoning and coding.

Anonymous
08/29/25(Fri)03:19:38 No.106419851

Anonymous 08/29/25(Fri)03:19:38 No.106419851

>>106419785
The internal links connecting the chiplets to the io-die themselves have a bandwidth limit...

Anonymous
08/29/25(Fri)03:21:07 No.106419860

Anonymous 08/29/25(Fri)03:21:07 No.106419860

My google searches are now encouraging me to "Dive deeper in AI Mode" where you appear to get a pretty decent model to play with. Anyone else getting that?

Gwen poster.
08/29/25(Fri)03:23:53 No.106419879

Gwen poster. 08/29/25(Fri)03:23:53 No.106419879

File: image_2025-08-29_125315837.png (1.61 MB, 1024x1024)

1.61 MB PNG

Anonymous
08/29/25(Fri)03:31:45 No.106419903

Anonymous 08/29/25(Fri)03:31:45 No.106419903

File: Screenshot - Google Searc(...).png (99 KB, 1000x300)

99 KB PNG

>>106419860
>AI Mode
Yeah, also have the AI Mode button showing up now.
Clicking it takes me to a chat screen.

Anonymous
08/29/25(Fri)03:37:43 No.106419926

Anonymous 08/29/25(Fri)03:37:43 No.106419926

>>106419903
>The specific data used to train the model is not public, but it was not trained in the same way as AI experiments that have specifically used 4chan data. Large language models (LLMs) are trained on a vast amount of publicly available text, including a variety of online content. Some training datasets can include crawled content from forums, including potentially those with low moderation standards. Developers typically use filtering processes to remove harmful, toxic, or low-quality data.
kek

Anonymous
08/29/25(Fri)03:44:17 No.106419964

Anonymous 08/29/25(Fri)03:44:17 No.106419964

>be me
>find a noname model everybody is talking about
>try to gen some responce
>output is inconsistent garbage not following le prompt
>asking anons for halp
>UR PROMP FORMATTING IS OFF!!
>U NID 2 ENCLOSE IT IN &^%<>[]
>U NID REPLACE THIS WITH THAT

How f*cked up are we ackchualy? Is there a clean way to figure out formatting for each and every noname shit tune on HF???

Who are those retards who change the promp formatting doing their shitty finetunes?

Anonymous
08/29/25(Fri)03:44:32 No.106419967

Anonymous 08/29/25(Fri)03:44:32 No.106419967

>>106414614
it's cydonia

Anonymous
08/29/25(Fri)03:49:01 No.106420004

Anonymous 08/29/25(Fri)03:49:01 No.106420004

>>106419964
>too stupid to try things out and see what works best

Anonymous
08/29/25(Fri)03:50:43 No.106420017

Anonymous 08/29/25(Fri)03:50:43 No.106420017

File: 1741879061651478.png (48 KB, 673x515)

48 KB PNG

>>106415159
Fuck Comcast.
Forever.

Anonymous
08/29/25(Fri)03:51:00 No.106420018

Anonymous 08/29/25(Fri)03:51:00 No.106420018

>>106419860
it's convenient, but I worry the largest advertising company in the world will use it to push products, worldviews, whatever they're hired to do really. And it's probably going to work amazingly well.

I don't know about you guys but I've started using llm's (grok) for shopping decisions and it's kind of amazing, and has helped me genuinely find better products I wasn't aware of, helped with compatibility issues, alternatives with better prices, and hyper specific products in a sea of shit (wanted a specific shape of water bottle)

Was I sold to? I will never know. Probably? Maybe that's years off and this is a beta.

Anonymous
08/29/25(Fri)03:55:28 No.106420045

Anonymous 08/29/25(Fri)03:55:28 No.106420045

>>106419964
We call you a dumbass because 50% of the noob troubleshooting questions in this thread could be solved by DRUMROLL PLEASE: ANY FUCKING MAJOR LLM.

Like Jesus H. Christ. Go bother grok about how to offload attention layers for glm, or what quants you can run, or what quantization does, or how to format. It will answer you in one second exactly what you want to know with multiple solutions for every model. Because it's BASIC SHIT.

Anonymous
08/29/25(Fri)03:58:15 No.106420062

Anonymous 08/29/25(Fri)03:58:15 No.106420062

>>106419964
the strong run what they can, the weak suffer what they must

Anonymous
08/29/25(Fri)04:03:28 No.106420094

Anonymous 08/29/25(Fri)04:03:28 No.106420094

>>106415834
nothing offensive about delayed release pancakes baka

Anonymous
08/29/25(Fri)04:03:47 No.106420097

Anonymous 08/29/25(Fri)04:03:47 No.106420097

https://www.businessinsider.com/meta-superintelligence-lab-llama-4-new-model-launch-year-end-2025-8
https://archive.is/A61aP

>Meta is racing the clock to launch its newest Llama AI model this year
>
>[...] A team within TBD, one of four groups part of Meta Superintelligence Labs (MSL), is developing Llama 4.X, with the aim of getting the models production-ready in time for the targeted year-end release, according to two people familiar with the matter, who asked to remain anonymous because they were not permitted to speak to the press. Llama 4.X is also interchangeably called Llama 4.5 by some internally, they said.
>
>Meta's release of its Llama 4 models in April, which includes Scout and Maverick, was met with a flat response from some developers who felt it underdelivered in real-world tasks like coding, reasoning, and following instructions. The TBD team working on Llama 4.X is now also attempting to fix bugs and revive Llama 4, according to the people Business Insider spoke to.
>
>"We're making good progress towards Llama 4.1 and 4.2, and in parallel, we're also working on our next generation of models that will push the frontier in the next year or so," Zuckerberg said.

Anonymous
08/29/25(Fri)04:08:04 No.106420115

Anonymous 08/29/25(Fri)04:08:04 No.106420115

>>106420045
>solved by ANY FUCKING MAJOR LLM.

Are you listening to your own words?

No way your shitty finetuned LLM will know how your prevous finetune is formatted.

No way grok & co know about your very existence

Anonymous
08/29/25(Fri)04:08:06 No.106420117

Anonymous 08/29/25(Fri)04:08:06 No.106420117

zuck started local and now he's gonna save it

Anonymous
08/29/25(Fri)04:16:15 No.106420166

Anonymous 08/29/25(Fri)04:16:15 No.106420166

what are we expecting out of zuck's next model ?
what are they changing up ?

Anonymous
08/29/25(Fri)04:19:29 No.106420185

Anonymous 08/29/25(Fri)04:19:29 No.106420185

>>106420166
no censorship, image+sound out (omnimodal), 2T (2.5B active)

Anonymous
08/29/25(Fri)04:20:09 No.106420190

Anonymous 08/29/25(Fri)04:20:09 No.106420190

>>106420166
Fix model training, retrain them from scratch
One or two smaller versions than 109B / 400B
Move away from the corporate-oriented finetuning
Omni model(s)

Anonymous
08/29/25(Fri)04:23:16 No.106420202

Anonymous 08/29/25(Fri)04:23:16 No.106420202

>>106420166
multimodal (text+images in, text+voice+anime girl avatar out)

Anonymous
08/29/25(Fri)04:24:17 No.106420209

Anonymous 08/29/25(Fri)04:24:17 No.106420209

>>106420185
>no censorship
>>106420190
>Move away from the corporate-oriented finetuning
These sound implausible.

Anonymous
08/29/25(Fri)04:26:07 No.106420219

Anonymous 08/29/25(Fri)04:26:07 No.106420219

>>106420166
a space program for my sides

Anonymous
08/29/25(Fri)04:26:52 No.106420224

Anonymous 08/29/25(Fri)04:26:52 No.106420224

>>106420209
>These sound implausible.
Yet that's what they were seemingly trying to do before they completely changed their plans a couple weeks before releasing Llama 4.

Anonymous
08/29/25(Fri)04:29:07 No.106420231

Anonymous 08/29/25(Fri)04:29:07 No.106420231

File: red hair girl from k-on.gif (870 KB, 500x420)

870 KB GIF

Guys, I've cracked it. I've saved local.

MoE models are faster because they only activate some experts during inference.
We should instead create a model where each expert has only one parameter and is responsible for outputting exactly one token. You'd have as many experts as there are tokens in the vocabulary.
Then we simply make a router that chooses the correct expert and we get a lightning fast models because only one parameter is ever active. HDDmaxxing is real.

Anonymous
08/29/25(Fri)04:29:18 No.106420232

Anonymous 08/29/25(Fri)04:29:18 No.106420232

>>106420166
now that we have glm and qwen, I don't really care. I think it would be funny if 4.5 somehow gets even worse than maverick and then Elon drops grok 3 just to humiliate him.

Anonymous
08/29/25(Fri)04:30:03 No.106420237

Anonymous 08/29/25(Fri)04:30:03 No.106420237

>>106420097
What does TBD stand for?

Anonymous
08/29/25(Fri)04:32:20 No.106420247

Anonymous 08/29/25(Fri)04:32:20 No.106420247

does llama.cpp add the bos token automatically in raw completion mode? I noticed for some models it includes the bos token in the example format (e.g. GLM air's [gMASK]) but for others it doesn't. So should I add the bos token in mikupad or not?

Anonymous
08/29/25(Fri)04:32:47 No.106420249

Anonymous 08/29/25(Fri)04:32:47 No.106420249

>>106420237
Hang on I'll google that for you give me ten minutes.

llama.cpp CUDA dev !!yhbFjk57TDr
08/29/25(Fri)04:34:09 No.106420254

llama.cpp CUDA dev !!yhbFjk57TDr 08/29/25(Fri)04:34:09 No.106420254

>>106420247
llama.cpp adds a BOS token if the GGUF file says it should.
When in doubt, add one yourself and look at the console output.
If you accidentally add a second one you will get a warning.

Anonymous
08/29/25(Fri)04:34:13 No.106420255

Anonymous 08/29/25(Fri)04:34:13 No.106420255

>>106420237
Literally "To be determined". https://archive.is/rLoVl

> TBD Lab, as in “to be determined,” is spearheading work on the newest version of Llama, the company’s large language model, according to people familiar with the matter.
>
>Last week, Wang sent a memo to employees that was viewed by The Wall Street Journal. Wang wrote that TBD Lab would be working alongside Meta’s other AI teams on a variety of projects, including coming model releases, the extension of models’ reasoning capabilities and development of AI agents.
>
>“Already in the past month, I’ve seen meaningful progress in each of these collaborations,” he wrote in the memo. “This enables us to be more technically ambitious, parallelize across several separate efforts and ultimately achieve frontier results more quickly.”
>
>The new Llama project is being led by Jack Rae, a hire to TBD Lab from Alphabet’s Google. Members of Meta’s existing Llama team and TBD Lab are working together on it, according to people familiar with the matter. The new model doesn’t yet have an official name, but internally has been nicknamed Llama 4.5 by some and Llama 4.X by others.

Anonymous
08/29/25(Fri)04:35:01 No.106420259

Anonymous 08/29/25(Fri)04:35:01 No.106420259

>>106420185
>>106420190
If they were going to do that, they could have just released the original lmarena models, but they didn't. If anything they'll double down on censorship.
>omni
They won't do anything but text-only output for safety.
>2T
They threw out Behemoth. They aren't going to start training another one now.

If anything, with ScaleAI Wang in charge now, the new models will be even more safe and corporate-oriented than ever before. They'll give Command A's absolute safety a run for its money.

Anonymous
08/29/25(Fri)04:35:16 No.106420261

Anonymous 08/29/25(Fri)04:35:16 No.106420261

>>106420231
It's very cute that you thought that was a smart post
Now bend over

Anonymous
08/29/25(Fri)04:41:13 No.106420284

Anonymous 08/29/25(Fri)04:41:13 No.106420284

what's the point of having a lot of parameters if most of them are not active

Anonymous
08/29/25(Fri)04:43:44 No.106420296

Anonymous 08/29/25(Fri)04:43:44 No.106420296

>>106420284
(You)

Anonymous
08/29/25(Fri)04:49:54 No.106420325

Anonymous 08/29/25(Fri)04:49:54 No.106420325

>>106420261
You're just jealous that (You) didn't come up with an architecture that is a natural extension of the MoE architecture.

Anonymous
08/29/25(Fri)04:51:50 No.106420335

Anonymous 08/29/25(Fri)04:51:50 No.106420335

>>106420255
Considering zuck was so pissed at the llama team he spent a billion dollars to hire a new team, it feels like it's a threat meant to imply their continued employment at Meta is “to be determined” by the result of their next release.

Anonymous
08/29/25(Fri)05:01:41 No.106420389

Anonymous 08/29/25(Fri)05:01:41 No.106420389

>>106419445
>bros how the fuck do I run hermes 4? I know it's a dense model
You do, but you clearly don't know what that means since you've still got -fmoe, -ot exps=CPU, -lma, etc. Which are all MoE exclusive args.
Dense models run like shit when you run any amount of them on CPU, nevermind running 80% of them on CPU. This is the exact problem MoE's exist to answer. 2 t/s is a fucking miracle.

Anonymous
08/29/25(Fri)05:06:33 No.106420414

Anonymous 08/29/25(Fri)05:06:33 No.106420414

>>106420325
You didn't either dingus.
>Submitted on 4 Jul 2024
>Mixture of A Million Experts
https://arxiv.org/abs/2407.04153

Anonymous
08/29/25(Fri)05:16:29 No.106420468

Anonymous 08/29/25(Fri)05:16:29 No.106420468

>>106420414
>4 Jul 2024
dead as bitnet

Anonymous
08/29/25(Fri)05:20:54 No.106420491

Anonymous 08/29/25(Fri)05:20:54 No.106420491

File: 3.png (253 KB, 1153x1149)

253 KB PNG

when you're pasting the same prompt in both deepseek and gemini 2.5 pro, it's uncanny how similar the answers can be in writing style
I think the new DS, even more so than the updated R1, has been trained very hard on distilled 2.5 CoT (that they captured before google decided to hide the CoT. It was originally not hidden.)
However, DS is discount Gemini. It does far worse as context grows, and it's also less capable at outputting a lot of text in a single answer (like doing anything with a large amount of code at once)

Anonymous
08/29/25(Fri)05:24:15 No.106420514

Anonymous 08/29/25(Fri)05:24:15 No.106420514

>>106420254
>If you accidentally add a second one you will get a warning.
I had this happen because of Gemma 3n's jinja template and I do not understand for the life of me, if you can detect this, why aren't you deleting the double bos instead of letting it happen
ended up loading another jinja template instead of depending on the gguf just to remove the added bos

Anonymous
08/29/25(Fri)05:25:33 No.106420527

Anonymous 08/29/25(Fri)05:25:33 No.106420527

can someone give me their quant + settings + context length for using Gemma with 24GB VRAM

It runs like absolute ass for me at the moment, I swear it was never this bad.

llama.cpp CUDA dev !!yhbFjk57TDr
08/29/25(Fri)05:26:37 No.106420536

llama.cpp CUDA dev !!yhbFjk57TDr 08/29/25(Fri)05:26:37 No.106420536

>>106420514
My opinion is that a double BOS token should just be automatically corrected, Georgi's opinion is that the library should just do exactly as it's told.

Anonymous
08/29/25(Fri)05:28:07 No.106420543

Anonymous 08/29/25(Fri)05:28:07 No.106420543

>>106420536
NTA but I think he's right. There's less chance of confusion that way, even if in this specific case it might not hurt.

Anonymous
08/29/25(Fri)05:28:39 No.106420546

Anonymous 08/29/25(Fri)05:28:39 No.106420546

>>106420527
Show what you tried and the speeds you got so we can at least laugh at you while we try to help you.

Anonymous
08/29/25(Fri)05:30:15 No.106420560

Anonymous 08/29/25(Fri)05:30:15 No.106420560

>>106420536
>>106420543
I understand the philosophy, but this is really one of those clear cut cases where being able to do what you're told to the letter, no matter what context, would never provide any value
double bos is damaging to the model in an unthinkable manner, it's night and day running 3n with the corrected template and I can't imagine a single soul in the world wanting to add a double bos on purpose unless they are a fentanyl baby

Anonymous
08/29/25(Fri)05:32:33 No.106420571

Anonymous 08/29/25(Fri)05:32:33 No.106420571

>>106420527
try q4km with 0 context and see if that helps.

Anonymous
08/29/25(Fri)05:34:08 No.106420580

Anonymous 08/29/25(Fri)05:34:08 No.106420580

>>106420536
Also nta. It sounds ridiculous, but if there is a need, ever, for whatever reason, to have double BOS, you'll have to fight the library to do it. What reasons? Dunno. Experimenting how damaging having double BOS can be. Automatically removing it when it's told to add it is "magic" that the user/developer doesn't see.
>>106420560
If the template of the config are wrong, submitting a patch to whoever made them model, if they are receptive, is a better solution. If they aren't, they should be called out on it.

Anonymous
08/29/25(Fri)05:39:16 No.106420608

Anonymous 08/29/25(Fri)05:39:16 No.106420608

File: IMG_9808.jpg (227 KB, 1290x1625)

227 KB JPG

>>106420491

Anonymous
08/29/25(Fri)05:42:21 No.106420624

Anonymous 08/29/25(Fri)05:42:21 No.106420624

>>106420608
What will they do now that Gemini hides its thinking?

Anonymous
08/29/25(Fri)05:43:30 No.106420628

Anonymous 08/29/25(Fri)05:43:30 No.106420628

>>106420580
>If the template of the config are wrong, submitting a patch to whoever made them model, if they are receptive, is a better solution. If they aren't, they should be called out on it.
All of the 3n ggofs (from all main gooffers) have this at the start:
{{ bos_token }}
I don't even know if it was llama.cpp that was in the wrong or the template truthfully, I mean, it only appears once in the template, and is it llama.cpp's job to add it if the template has it?
all I know is that it resulted in double bos, llama.cpp warning about it in the log, and that when I copied the template, deleted {{ bos_token }} and used the rest as is, the quality of the model went up massively. I wasn't even aware that there could be such a level of difference just because of that one token, but it turns the model from mediocrity into something pretty decent.
I don't use the model anymore, don't have the goof on disk, dunno if anything changed, but at the time before considering changing the template I tried to look in the lcpp parameter arguments here
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
if you could disable llama.cpp's added bos, and I didn't see anything

Anonymous
08/29/25(Fri)05:44:32 No.106420633

Anonymous 08/29/25(Fri)05:44:32 No.106420633

>>106420624
Distill Opus instead.

Anonymous
08/29/25(Fri)05:47:08 No.106420648

Anonymous 08/29/25(Fri)05:47:08 No.106420648

>>106420624
well, gpt-oss does not hide the CoT
We must show our thinking. Policy says we are allowed to. We can accept. Let's answer.

Anonymous
08/29/25(Fri)05:52:43 No.106420674

Anonymous 08/29/25(Fri)05:52:43 No.106420674

File: 3n_bos.png (3 KB, 773x158)

3 KB PNG

>>106420628
>All of the 3n ggofs (from all main gooffers) have this at the start:
At least 3n-E4-it has those set. They should be patched upstream. Everyone would get the fix. Nobody would have to add a special case.

Anonymous
08/29/25(Fri)05:55:27 No.106420687

Anonymous 08/29/25(Fri)05:55:27 No.106420687

>>106420633
Too expensive.

Anonymous
08/29/25(Fri)05:57:30 No.106420694

Anonymous 08/29/25(Fri)05:57:30 No.106420694

>>106420687
Is that why Anthropic never bothered to hide the reasoning?

Anonymous
08/29/25(Fri)06:00:51 No.106420704

Anonymous 08/29/25(Fri)06:00:51 No.106420704

>>106420694
The price alone was probably a good enough deterrent for large scale dataset mining, and if they didn't that it was happening, they didn't have much reason to hide it from actual users. That probably won't stop China if they're the only remaining target.

Anonymous
08/29/25(Fri)06:16:56 No.106420773

Anonymous 08/29/25(Fri)06:16:56 No.106420773

>>106420491
>everyone puts prerelease versions of their model on lm arena
>be surprised when they converge to the same style

Anonymous
08/29/25(Fri)06:20:27 No.106420790

Anonymous 08/29/25(Fri)06:20:27 No.106420790

>>106420773
How about you actually try the same prompt on different models things and see for yourself that no, contrary to what you say, that isn't the case at all. There's lineages of models that open weights copy from. Those lineages are not similar at all.
Gemini doesn't write like Grok doesn't write like GPT-5 doesn't write like Claude. DeepSeek however writes like Gemini.

Anonymous
08/29/25(Fri)06:21:33 No.106420793

Anonymous 08/29/25(Fri)06:21:33 No.106420793

File: jpgartefacts.jpg (46 KB, 960x755)

46 KB JPG

>Failed to condense context
>Context size increased during condensing; skipping this attempt

Anonymous
08/29/25(Fri)06:23:20 No.106420800

Anonymous 08/29/25(Fri)06:23:20 No.106420800

>>106419709
I like the amnesia thing, is this a system prompt?

Anonymous
08/29/25(Fri)06:39:20 No.106420855

Anonymous 08/29/25(Fri)06:39:20 No.106420855

>>106420800
Can't remember...

Anonymous
08/29/25(Fri)06:40:01 No.106420858

Anonymous 08/29/25(Fri)06:40:01 No.106420858

>>106420674
so I downloaded the model again out of curiosity and this behavior didn't happen again
I guess this was something to be fixed in llama.cpp
also tried regular 3 just in case I misremembered which of the gemma, but nay

Anonymous
08/29/25(Fri)06:46:08 No.106420887

Anonymous 08/29/25(Fri)06:46:08 No.106420887

>>106420004
>designated retarded buzzword
>retarded opinion
Pottery

Anonymous
08/29/25(Fri)06:53:17 No.106420920

Anonymous 08/29/25(Fri)06:53:17 No.106420920

https://finance.yahoo.com/news/musk-poaches-14-meta-ai-174235265.html
>Since January, Musk’s xAI has successfully recruited at least 14 engineers from Meta’s AI division—and that’s just the confirmed count. While Zuckerberg scrambles with compensation packages reportedly hitting $250 million per researcher, Musk claims he’s winning this war without matching those “insane” offers.
Zuccbros... Musk sir stole our engineers... Why would they want to work on unsafe AI? Don't they like Wang's wang... I mean safe high quality synthetic data and our progressive office culture full of bureaucracy and responsibility? Why would they rather work 7 days a week and sleep in tents?

Anonymous
08/29/25(Fri)06:55:38 No.106420935

Anonymous 08/29/25(Fri)06:55:38 No.106420935

lol AMD is so useless

"We have two observations here to start off - We can see that their training model on the ROCm stack allows the loss to converge, which makes the appearance that the model finishes training - but without information on the different combinations of #GPUs, vs CUDA/ROCm versions and Pytorch versions we can can't clearly understand their direction of the data.
I see the different version being compared but the graph itself doesn't mention the # of GPUs - do you have the information from another post by chance?Also we see that they are using the celeba dataset, from this Victor assumes they're trying to train a GAN, which he called out as odd because the world as a whole has moved on from GANs to diffusion models for genereating synthetic imagery because GAN losses are very finicky.Essentially, from a scientific perspective, we'd want more data on what and how they were running these models and the scientific justifications as to why they chose those datasets and models for comparison.
Did they put together an article with an explanation of these or was this a one off post?"
12:20 AM

https://x.com/LodestoneE621/status/1955050667237613643

Anonymous
08/29/25(Fri)06:56:05 No.106420938

Anonymous 08/29/25(Fri)06:56:05 No.106420938

>>106415543
Q0, or Q2 (or Q3) with enough RAM and patience.

Anonymous
08/29/25(Fri)06:57:32 No.106420954

Anonymous 08/29/25(Fri)06:57:32 No.106420954

>>106420920
>$250 million per researcher
you can build your own private DL datacenter with that kind of money

Anonymous
08/29/25(Fri)06:59:19 No.106420966

Anonymous 08/29/25(Fri)06:59:19 No.106420966

>>106420935
Dumb furry decided to train his image model on AMD, that explains why it sucks. Now he knows why not even chinks want those shitty cards.

Anonymous
08/29/25(Fri)07:00:19 No.106420971

Anonymous 08/29/25(Fri)07:00:19 No.106420971

>>106420966
he didn't, he tried using it for one of his recent test runs and found that it is broken and amd just shrugged

Anonymous
08/29/25(Fri)07:02:09 No.106420986

Anonymous 08/29/25(Fri)07:02:09 No.106420986

>>106420906
>OpenAI Says It's Scanning User's ChatGPT Conversations and Reporting Content to the Police
local wins again

Anonymous
08/29/25(Fri)07:02:39 No.106420989

Anonymous 08/29/25(Fri)07:02:39 No.106420989

File: 1742174692578864.jpg (11 KB, 225x225)

11 KB JPG

>>106420935
meanwhile china/deepseek was delusional enough to think that huawei was going to work for training

Anonymous
08/29/25(Fri)07:03:41 No.106420995

Anonymous 08/29/25(Fri)07:03:41 No.106420995

>>106420920
elon is the only guy trying to make anime real

Anonymous
08/29/25(Fri)07:04:01 No.106420996

Anonymous 08/29/25(Fri)07:04:01 No.106420996

>>106420986
This shouldn't be a surprise to anyone. On top of that, I remember some pedo being arrested last year from OpenAI detecting and snitching.

Anonymous
08/29/25(Fri)07:06:01 No.106421009

Anonymous 08/29/25(Fri)07:06:01 No.106421009

>>106420995
this, ani was the biggest step forward in this regard since character.ai and the character card standard that local stole from them
in general, the entire open source sector has been surprisingly useless in making this come true. it's all lazy solutions like ST which should've died two years ago in favor of something better

Anonymous
08/29/25(Fri)07:10:32 No.106421032

Anonymous 08/29/25(Fri)07:10:32 No.106421032

>>106421009
>it's all lazy solutions like ST which should've died two years ago in favor of something better
I still can't believe how bloated it it. Most of the functions it does can be done in a simple 64kb html file.

Anonymous
08/29/25(Fri)07:11:01 No.106421036

Anonymous 08/29/25(Fri)07:11:01 No.106421036

File: Base Image.png (1.3 MB, 1200x3996)

1.3 MB PNG

Graph-R1: Unleashing LLM Reasoning with NP-Hard Graph Problems
https://arxiv.org/abs/2508.20373
>Reasoning Large Language Models (RLLMs) have recently achieved remarkable progress on complex reasoning tasks, largely enabled by their long chain-of-thought (Long CoT) capabilities. However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplored. In this work, we introduce NP-hard (NPH) graph problems as a novel synthetic training corpus, as they inherently require deep reasoning, extensive exploration, and reflective strategies, which are core characteristics of Long CoT reasoning. Building on this insight, we develop a two-stage post-training framework: (i) Long CoT Supervised Fine-Tuning (SFT) on rejection-sampled NPH graph instances, which substantially enhances reasoning depth, and (ii) Reinforcement Learning (RL) with a fine-grained reward design, which sharpens reasoning efficiency. Our flagship model, Graph-R1-7B, demonstrates strong generalization across mathematics, coding, STEM, and logic, and surpasses QwQ-32B on NPH graph problems in both accuracy and reasoning efficiency. These results position NPH graph problems as an effective and scalable resource for advancing Long CoT reasoning in LLMs, opening a new frontier for LLM post-training.
https://github.com/Graph-Reasoner/Graph-R1
https://huggingface.co/datasets/HKUST-DSAIL/Graph-R1-RFT-COT-30K
Quzzing your miku with traveling salesman problems

Anonymous
08/29/25(Fri)07:18:28 No.106421079

Anonymous 08/29/25(Fri)07:18:28 No.106421079

>>106421009
open source LLMs suffer from the same issue as open source hardware: it's too hard to just take someone else's work and build on top of it, there's no synergy like in software.
So we are stuck eating corpo table scraps and making cope-tunes.

Anonymous
08/29/25(Fri)07:23:49 No.106421107

Anonymous 08/29/25(Fri)07:23:49 No.106421107

>>106421009
I like "vibe-code your own" as a rite of passage of sorts for UI stuff.

Anonymous
08/29/25(Fri)07:35:08 No.106421172

Anonymous 08/29/25(Fri)07:35:08 No.106421172

https://github.com/stepfun-ai/Step-Audio2
https://huggingface.co/stepfun-ai/Step-Audio-2-mini

Anonymous
08/29/25(Fri)07:39:47 No.106421198

Anonymous 08/29/25(Fri)07:39:47 No.106421198

>>106421172
>mini
>8b
Is this slow fuck at least good at transcribing? Can it clone voices?

Anonymous
08/29/25(Fri)07:47:57 No.106421259

Anonymous 08/29/25(Fri)07:47:57 No.106421259

Carrier has arrived

Anonymous
08/29/25(Fri)07:49:17 No.106421268

Anonymous 08/29/25(Fri)07:49:17 No.106421268

File: danger_danger.png (181 KB, 784x695)

181 KB PNG

>https://arxiv.org/pdf/2406.20094
I like this types of disclaimers.
>Watch out. This may make models much more usable. Wink wink...

Anonymous
08/29/25(Fri)07:49:34 No.106421270

Anonymous 08/29/25(Fri)07:49:34 No.106421270

Does anyone know what rocinante is a tune of (is it Mistral?)?

I'm curious if there is something better out there for self-hosted coom stuff, or if that's still the one to go for.

Bonus question: what's it called when you're essentially promoting the AI to tell you a story and you guide it between responses? Does it even have a specific name (like RP does)

Anonymous
08/29/25(Fri)07:51:12 No.106421273

Anonymous 08/29/25(Fri)07:51:12 No.106421273

>>106421270
>Does anyone know what rocinante is a tune of (is it Mistral?)?
Mistral-nemo-12b-base
>I'm curious if there is something better out there for self-hosted coom stuff, or if that's still the one to go for.
If you're poor, no.
>hat's it called when you're essentially promoting the AI to tell you a story and you guide it between responses? Does it even have a specific name (like RP does)
Sounds like story writing.

Anonymous
08/29/25(Fri)07:51:18 No.106421274

Anonymous 08/29/25(Fri)07:51:18 No.106421274

>>106421270
>what's it called when you're essentially promoting the AI to tell you a story and you guide it between responses?
The cringe name is directormaxxing.

Anonymous
08/29/25(Fri)07:51:26 No.106421277

Anonymous 08/29/25(Fri)07:51:26 No.106421277

>>106421270
it's drummer's model

Anonymous
08/29/25(Fri)07:52:25 No.106421280

Anonymous 08/29/25(Fri)07:52:25 No.106421280

>>106421273
How poor is poor? Is a 4090 and 64gb of DRAM poor?

Anonymous
08/29/25(Fri)07:53:36 No.106421285

Anonymous 08/29/25(Fri)07:53:36 No.106421285

File: Screenshot_20250829-055251.png (114 KB, 1079x579)

114 KB PNG

>>106421273
Ummm ackshually it's instruct, not base

Anonymous
08/29/25(Fri)07:55:07 No.106421292

Anonymous 08/29/25(Fri)07:55:07 No.106421292

>>106421280
Slightly less poor than poor. Dunno. Try glm air or something.
>>106421285
Oh. I thought it was base. Consider me learned.

Anonymous
08/29/25(Fri)08:01:21 No.106421336

Anonymous 08/29/25(Fri)08:01:21 No.106421336

File: 1728272539112596.png (686 KB, 966x543)

686 KB PNG

Can someone explain to me how people can manage 5-7t/s on 100b models, cause I'm only managing 1.2t/s on a 49b
64gb ram, 24gb vram ought to be much faster with that as a measuring post

Anonymous
08/29/25(Fri)08:03:15 No.106421349

Anonymous 08/29/25(Fri)08:03:15 No.106421349

File: Screenshot_20250829-060019.png (330 KB, 1080x1553)

330 KB PNG

>>106421292
So it's my turn to be learned - since it's a GGUF, can I run it as long as it fits on my ram + vram (obviously allowing some space for system overhead)

I.e. I have a combined 68gb, so I could possibly try one of the Q3 ones? Or is that not how it works. Up until now I just figured ggufs had to be < vram.

Anonymous
08/29/25(Fri)08:03:38 No.106421352

Anonymous 08/29/25(Fri)08:03:38 No.106421352

>>106421336
Let's jump a few steps on the 20-questions game.
What models, what backend, show your launch settings.
EVERY FUCKING ONE ONE OF YOU FUCKERS...

Anonymous
08/29/25(Fri)08:04:26 No.106421355

Anonymous 08/29/25(Fri)08:04:26 No.106421355

>>106421349
88GB I am retard. So Q4_K_M fits, assuming you can use pooled rammies like that.

Anonymous
08/29/25(Fri)08:04:32 No.106421357

Anonymous 08/29/25(Fri)08:04:32 No.106421357

>>106421336
you should be able to run 49b much faster. play with the ot flag, try to get all the attention layers on the gpu and enable flash attention. I can get a sold 5+ toks across 32k context on dual 3060s

Anonymous
08/29/25(Fri)08:11:42 No.106421396

Anonymous 08/29/25(Fri)08:11:42 No.106421396

>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.

Anonymous
08/29/25(Fri)08:14:05 No.106421417

Anonymous 08/29/25(Fri)08:14:05 No.106421417

>testing https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b
Might post some logs but not yet - not sure if I'm doing something wrong but it feels like it has lost any roleplaying elements whatsoever and the model behaves like an ordinary, colourless chatbot when compared to others (mistral, gemma).
Sure, without an example context this is a weird post. Never tried the vanilla gpt-oss so can't really compare.

Anonymous
08/29/25(Fri)08:14:14 No.106421420

Anonymous 08/29/25(Fri)08:14:14 No.106421420

File: dbb74bf381f51f81682a664ba(...).png (3.36 MB, 2180x3260)

3.36 MB PNG

My main use cases for AI are: coding, translations and OCR. Occasionally image generation (for example I'd ask to summarize some data in a graph). Can a LLM do all of this at home?

Anonymous
08/29/25(Fri)08:16:09 No.106421434

Anonymous 08/29/25(Fri)08:16:09 No.106421434

>>106421420
OCR for text? Or labelling pictures/graphs?

Anonymous
08/29/25(Fri)08:16:56 No.106421439

Anonymous 08/29/25(Fri)08:16:56 No.106421439

my glm 4.5 air (IQ4_XS) just breaks down and starts outputting near gibberish at around 6500 tokens in. Is this a skill issue on my side or is this expected? Could the reason be that it's just not trained to generate a single extremely long response (instead of multi turn convo with multiple short responses)?

Anonymous
08/29/25(Fri)08:17:27 No.106421442

Anonymous 08/29/25(Fri)08:17:27 No.106421442

Any good tts I can use with open webui or stuff like that?

Anonymous
08/29/25(Fri)08:17:56 No.106421447

Anonymous 08/29/25(Fri)08:17:56 No.106421447

>>106414555
holy shit miku is so hairy down there. and her pussy is so sweaty and wet

Anonymous
08/29/25(Fri)08:18:12 No.106421449

Anonymous 08/29/25(Fri)08:18:12 No.106421449

>>106421336
Offload all layers to GPU and use n-cpu-moe to move some layers back to the CPU.

Anonymous
08/29/25(Fri)08:19:41 No.106421458

Anonymous 08/29/25(Fri)08:19:41 No.106421458

>>106421442
If you want a more thoughtful reply you'd be best off making some sort of attempt yourself first, and coming and posting once you've hit a roadblock.

Bonus points for making a false declaration (i.e. x model is way better than y model) and farming spergs that can't help but be right to get pertinent info.

Anonymous
08/29/25(Fri)08:21:12 No.106421469

Anonymous 08/29/25(Fri)08:21:12 No.106421469

>>106421449
If you can comfortably fit a model in vram, is there any case you wouldn't want to make the layers in GPU? I'm sure this is a retarded question, but the documentation for this shit is non-existent.

Anonymous
08/29/25(Fri)08:22:42 No.106421479

Anonymous 08/29/25(Fri)08:22:42 No.106421479

>>106421434
Text. I wasn't very impressed by the ocr capabilities of chatgpt 4 when scanning a compressed jpg of an excel table but it's pretty good at scanning and translating comic balloons from manga if the font used isn't handwritten or distorted. I do this last thing quite often and I wonder how good LLMs are at this

Anonymous
08/29/25(Fri)08:24:04 No.106421490

Anonymous 08/29/25(Fri)08:24:04 No.106421490

>>106421469
No reason not to, unless you wanted to also run an image gen model or something at the same time.

Anonymous
08/29/25(Fri)08:24:48 No.106421498

Anonymous 08/29/25(Fri)08:24:48 No.106421498

>>106421479
Interesting. I was going to make some thoughtful suggestions for LLMs I've had great success with scanning text (shitty PDFs, handwritten, etc.) but since you're using it for weeb shit I no longer want to help.

Anonymous
08/29/25(Fri)08:26:08 No.106421506

Anonymous 08/29/25(Fri)08:26:08 No.106421506

https://archive.ph/UJIli
>Within days of joining Meta, Shengjia Zhao, co-creator of OpenAI’s ChatGPT, had threatened to quit and return to his former employer, in a blow to Mark Zuckerberg’s multibillion-dollar push to build “personal superintelligence”.
>Zhao went as far as to sign employment paperwork to go back to OpenAI. Shortly afterwards, according to four people familiar with the matter, he was given the title of Meta’s new “chief AI scientist”.

>Adding to the tumult, a handful of new AI staff have already decided to leave after brief tenures, according to people familiar with the matter.
>This includes Ethan Knight, a machine-learning scientist who joined the company weeks ago. >Another, Avi Verma, a former OpenAI researcher, went through Meta’s onboarding process but never showed up for his first day, according to a person familiar with the matter.
>In a tweet on X on Wednesday, Rishabh Agarwal, a research scientist who started at Meta in April, announced his departure. He said that while Zuckerberg and Wang’s pitch was “incredibly compelling”, he “felt the pull to take on a different kind of risk”, without giving more detail.
>Meanwhile, Chaya Nayak and Loredana Crisan, generative AI staffers who had worked at Meta for nine and 10 years respectively, are among the more than half a dozen veteran employees to announce they are leaving in recent days. Wired first reported some details of recent exits, including Zhao’s threatened departure.

This is hilarious

Anonymous
08/29/25(Fri)08:27:43 No.106421512

Anonymous 08/29/25(Fri)08:27:43 No.106421512

>>106421469
>If you can comfortably fit a model in vram, is there any case you wouldn't want to make the layers in GPU?
If you can fit everything you don't use n-cpu-moe. You use that when you don't have enough VRAM and when you're using a MoE model. Because there's some priority for what layers are best to move back to the CPU when you don't have enough VRAM and that flag takes care of it.

Anonymous
08/29/25(Fri)08:28:38 No.106421518

Anonymous 08/29/25(Fri)08:28:38 No.106421518

>>106421506
They saw how disorganized the company is and how they all have 0 idea on what to do next.

Entire structure is rotten. It's over for META.

Anonymous
08/29/25(Fri)08:29:23 No.106421524

Anonymous 08/29/25(Fri)08:29:23 No.106421524

>>106421512
Thanks, anon. Is there a resource to get up to speed on all these settings and shit, or just learn by playing?

Anonymous
08/29/25(Fri)08:31:00 No.106421538

Anonymous 08/29/25(Fri)08:31:00 No.106421538

>>106421498
I'd normally ask if you know where you are right now but whatever, this is a LLM thread after all

Anonymous
08/29/25(Fri)08:31:16 No.106421543

Anonymous 08/29/25(Fri)08:31:16 No.106421543

>>106421506
Well, you can't just throw together a bunch of people that are good individually and expect to end up with a functioning team, especially if all those people are only motivated by money and were "disloyal" to their previous company.

Anonymous
08/29/25(Fri)08:32:01 No.106421552

Anonymous 08/29/25(Fri)08:32:01 No.106421552

>>106421506
>>106421518
How horrible is it to work at Meta if even the money can't keep people in? Do they have mandatory Wang Zuccing sessions?

Anonymous
08/29/25(Fri)08:32:16 No.106421556

Anonymous 08/29/25(Fri)08:32:16 No.106421556

>>106421518
John Carmack complained about the same thing when he quit Meta (facebook/oculus vr) and this was a long time ago.
It's a mess of a company with unlimited funds.

Anonymous
08/29/25(Fri)08:34:33 No.106421570

Anonymous 08/29/25(Fri)08:34:33 No.106421570

>>106421543
If it's money, how do you explain quitting within a few days or weeks? They could just take the fat paycheck and put up with it

Anonymous
08/29/25(Fri)08:36:36 No.106421579

Anonymous 08/29/25(Fri)08:36:36 No.106421579

>>106421524
I don't know. I learned it from reading the thread. I guess you could read the --help menu, if the description of the flag is not enough, you can search for the original PR that introduced it or search it in the archive to see if people use it.

Anonymous
08/29/25(Fri)08:38:07 No.106421591

Anonymous 08/29/25(Fri)08:38:07 No.106421591

>>106421336
You are running a 49b dense model. It means loading 49b parameters for every token.
Some run mixture of experts models like R1, which only loads 37b out of 671b, so it's even faster than what you are running, even more so when quantized given that it fits into ram.

Anonymous
08/29/25(Fri)08:47:04 No.106421648

Anonymous 08/29/25(Fri)08:47:04 No.106421648

>>106421556
https://www.gamedeveloper.com/business/john-carmack-departs-meta-and-bids-farewell-to-vr-development
>"The issue is our efficiency. Some will ask why I care how the progress is happening, as long as it is happening? If I am trying to sway others, I would say that an org that has only known inefficiency is ill prepared for the inevitable competition and/or belt tightening, but really, it is the more personal pain of seeing a 5 percent GPU utilization number in production. I am offended by it."
>Elaborating further, Carmack said that as a systems optimized person he cares deeply about efficiency, and that when you "work hard at optimization for most of your life, seeing something that is grossly inefficient hurts your soul"—suggesting that Meta's current performance level reminded him of seeing a "tragically low number on a profiling tool."
>Carmack added that Meta has a "ridiculous amount of people and resources," but constantly squanders the tools and teams at its disposal through acts of "self-sabotage."
>"There is no way to sugar coat this; I think our organization is operating at half the effectiveness that would make me happy. Some may scoff and contend we are doing just fine, but others will laugh and say 'Half? Ha! I’m at quarter efficiency!' It has been a struggle for me. I have a voice at the highest levels here, so it feels like I should be able to move things, but I’m evidently not persuasive enough," he continued.
>"A good fraction of the things I complain about eventually turn my way after a year or two passes and evidence piles up, but I have never been able to kill stupid things before they cause damage, or set a direction and have a team actually stick to it. I think my influence at the margins has been positive, but it has never been a prime mover."

>5 percent GPU utilization number in production

Jesus, I didn't know Meta was THAT inefficient. So, out of their compute power of 600k H100s they are utilizing just 30k.

Anonymous
08/29/25(Fri)08:51:01 No.106421679

Anonymous 08/29/25(Fri)08:51:01 No.106421679

>>106421648
>compute power of 600k H100s they are utilizing just 30k
I'm not an expert in this but why don't they offer cloud compute?

Anonymous
08/29/25(Fri)08:52:54 No.106421698

Anonymous 08/29/25(Fri)08:52:54 No.106421698

>>106421543
Mark's strategy is like a kid's idea that you could get the captains from different football teams and they would make the best football team ever

Anonymous
08/29/25(Fri)08:53:39 No.106421702

Anonymous 08/29/25(Fri)08:53:39 No.106421702

>>106421679
because of the inefficient use they need all 600k to get the computational power of only 30k.

Anonymous
08/29/25(Fri)08:53:45 No.106421703

Anonymous 08/29/25(Fri)08:53:45 No.106421703

>>106421648
>>106421679
To my knowledge utilization at huge scale is very bad regardless of company but 5% is definitely a disaster

Anonymous
08/29/25(Fri)08:55:25 No.106421713

Anonymous 08/29/25(Fri)08:55:25 No.106421713

>>106421648
This lines up with my personal experience in "high-performance computing".
People run shitty software in production all the time because for their personal, short-term goals it's better to just scale up the amount of compute than to improve the software.
I have witnessed millions of CPU hours being used with software that spends 20% of its runtime clearing caches.

Anonymous
08/29/25(Fri)08:55:46 No.106421716

Anonymous 08/29/25(Fri)08:55:46 No.106421716

>>106421648
>5 percent GPU utilization number in production
he was speaking figuratively bro referencing his game dev experience....

Anonymous
08/29/25(Fri)09:11:24 No.106421815

Anonymous 08/29/25(Fri)09:11:24 No.106421815

>>106419282
Model Card """Description"""

>Steam v1 has got the juice

>Characters are as vivid as the original GLM-Air, though prose is much more enticing.

>Damn okay this model is actually pretty good. I don't have enough vram to test it on longer chats to 16k, but on 6k chats it's looking good and without deepseek's slop.

>this model has a unique way of speaking. imo it's kept the same "soul" of the writing as Air but with more creativity and willingness to be hor -

>this model is fun! :3

I have to ask, if you're not the spamming beggar himself, are you mentally retarded? What made you look at a fine tune whose description consisted entirely of semi-literate marketing quotes and decide to download it?

Anonymous
08/29/25(Fri)09:15:35 No.106421857

Anonymous 08/29/25(Fri)09:15:35 No.106421857

>>106421815
I don't look at descriptions. I just download. If you see comments about other fine tunes such as by sao or other familiar names of old, at least one of those were by me. I plan on trying the other GLM tune recently made as well.

Anonymous
08/29/25(Fri)09:18:57 No.106421887

Anonymous 08/29/25(Fri)09:18:57 No.106421887

>>106421439
It sounds like you might be hitting context window max and/or memory issues.
if you're talking about one response then 6500 tokens is a little long for one response. I normally keep response lengths at around 2000. you can always "continue" the output with a second response.
Remember response length is part of the context. your context window needs to be probably around 20k to be comfortable. I've run glm air at 35k context with no issues, think it has max 131k context.

Anonymous
08/29/25(Fri)09:22:57 No.106421921

Anonymous 08/29/25(Fri)09:22:57 No.106421921

>>106421716
i wouldnt be so sure looking at metas model release rate compared to the insane amount of gpus

Anonymous
08/29/25(Fri)09:23:10 No.106421924

Anonymous 08/29/25(Fri)09:23:10 No.106421924

>>106418813
How big is the chance that they QUCK it up?

Anonymous
08/29/25(Fri)09:26:18 No.106421948

Anonymous 08/29/25(Fri)09:26:18 No.106421948

>>106421924
14.28%

Anonymous
08/29/25(Fri)09:26:36 No.106421952

Anonymous 08/29/25(Fri)09:26:36 No.106421952

File: 00004-1378487878 (3).png (1.57 MB, 1024x1024)

1.57 MB PNG

>>106421268
> nudge nudge wink wink
> don't do anything I wouldn't do kid

Anonymous
08/29/25(Fri)09:26:45 No.106421955

Anonymous 08/29/25(Fri)09:26:45 No.106421955

>>106421420
Gemma 3n (not regular 3) and Qwen are decent for translation. DeepSeek is probably the best, if you can run that.
NONE of the open weight LLM are good enough for coding. Even the SOTA shit can be painful and make you go into long debugging sessions you wouldn't need if you had written the code yourself and memorized what happens and where.
And only proprietary models really work well after 8k tokens here.
>Occasionally image generation (for example I'd ask to summarize some data in a graph).
"generation"? I don't get it, you're talking about summarizing data in a graph so you mean understanding images, not generating them, right? or you mean making a new, smaller graph from those graphs?
Even a local LLM like qwen coder will be good enough to write a python script that generates a graph, but then you too should be able to write that shit.
As for OCR, it's pretty decent but not 100% reliable. I use Qwen2.5-VL for that.

Anonymous
08/29/25(Fri)09:31:18 No.106421996

Anonymous 08/29/25(Fri)09:31:18 No.106421996

File: d.jpg (84 KB, 965x417)

84 KB JPG

This is for GPT-ASS. Is this a bug or a feature? Why would they change the tag? I mean okay I understand the logic somehow but still. No other model does this afaik.
Seems like a lot of work to implement reasoning.

Anonymous
08/29/25(Fri)09:32:58 No.106422010

Anonymous 08/29/25(Fri)09:32:58 No.106422010

>>106421648
I don't get this comment.
Aren't development servers like this scaled around max load? Buliding a new model is much more resource intensive than running one. So they have a server setup around building new models, than also can run inference when they're not building. Ergo, it's sitting idle if you don't rent the capacity to others.
As for rest: Anytime an exiting exec mouths off like this on the way out the door, they're sending a message to someone. Take it with a grain of salt; it serves his purposes or he wouldn't be doing the interview.

Anonymous
08/29/25(Fri)09:37:37 No.106422048

Anonymous 08/29/25(Fri)09:37:37 No.106422048

>>106422038
>>106422038
>>106422038

Anonymous
08/29/25(Fri)09:39:02 No.106422060

Anonymous 08/29/25(Fri)09:39:02 No.106422060

>>106421996
gpt oss is a clown model. OpenAI released it only to show that they still "care about open source". No sane person should use it as better alternatives exist.

Anonymous
08/29/25(Fri)09:41:06 No.106422076

Anonymous 08/29/25(Fri)09:41:06 No.106422076

>>106422060
Well, this is what I've been thinking but thought it would've been fun to implement it for my client but so far it seems like a lot of work.
Qwen3 is reasoning model too but it's more simpler to handle in this sense.

Anonymous
08/29/25(Fri)09:44:52 No.106422107

Anonymous 08/29/25(Fri)09:44:52 No.106422107

>>106418326
>>106418433
They argue if you don't use chat templating then it goes full schizo and then refuses. >>106418036
Have you been remotely paying attention to anything said ITT?

Anonymous
08/29/25(Fri)09:46:30 No.106422118

Anonymous 08/29/25(Fri)09:46:30 No.106422118

>>106419809
+1 wuan on your account Chang

Anonymous
08/29/25(Fri)09:51:31 No.106422160

Anonymous 08/29/25(Fri)09:51:31 No.106422160

File: 3ec8871ad3f87de846547000d(...).jpg (54 KB, 501x700)

54 KB JPG

Hey, so, what's up, it's ya boy, listen, real talk:

Let's just say that, hypothetically, not me of course, but someone discovered a way to derive all known and unknown mathematical structure via a single axiom applied to a single symbol.

How famous are we talking here? Would this person be able to remain anonymous?

This hypothetical person who I am not is definitely not excited to be Einstein+Hawking+Turing level famous in a single lifetime.

Real talk, lads, 555-come-on-now.

Anonymous
08/29/25(Fri)09:51:41 No.106422162

Anonymous 08/29/25(Fri)09:51:41 No.106422162

>>106420231
Wouldn't attention still be the bottleneck since each token has to attend to every other token in context?

Anonymous
08/29/25(Fri)10:01:46 No.106422238

Anonymous 08/29/25(Fri)10:01:46 No.106422238

>>106421268
Did they just reinvent "you're an expert roleplayer" with more roles?

Anonymous
08/29/25(Fri)10:04:10 No.106422256

Anonymous 08/29/25(Fri)10:04:10 No.106422256

>>106422238
>Did they just
>2406

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.