>>106577958
>>106577913
okay i think i got it. the llama console says it's offloading 48 layers to gpu, so i started --n-cpu-moe at 48 and lowered it as far as it would go while still launching without crashing, which ended up being 33
ill probably do some llama-bench runs tomorrow so i can see what the performance difference actually is
-ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on \
--no-mmap
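if you want to automate that trial-and-error, a rough sketch of the search: walk --n-cpu-moe down from the safe value until a launch fails, and keep the last value that worked. `try_launch` here is a stand-in you'd replace with a real short llama-server/llama-cli invocation (model path and everything inside it are hypothetical); the stand-in just pretends 33 is the limit so the loop is runnable on its own.

```shell
#!/bin/sh
# stand-in for a real launch attempt, e.g. something like:
#   llama-cli -m ./model.gguf -ngl 99 --n-cpu-moe "$1" -n 1 >/dev/null 2>&1
# (hypothetical command; here we just pretend values below 33 OOM)
try_launch() {
  [ "$1" -ge 33 ]
}

find_min_ncmoe() {
  # $1 = starting (known-safe) --n-cpu-moe value, e.g. the total layer count
  n=$1
  best=$1
  while [ "$n" -ge 0 ]; do
    if try_launch "$n"; then
      best=$n            # launched fine, remember it and push lower
      n=$((n - 1))
    else
      break              # lower values put even more on gpu, so stop at first failure
    fi
  done
  echo "$best"
}
```

with the stand-in above, `find_min_ncmoe 48` walks down and reports 33, same as doing it by hand.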
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: ROCm0 model buffer size = 17562.93 MiB
load_tensors: ROCm_Host model buffer size = 34892.00 MiB
load_tensors: CPU model buffer size = 254.38 MiB
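for the bench runs, something along these lines should work (model path is a placeholder, and -p/-n are just typical prompt/generation sizes; check your build's llama-bench --help, since flag support like --n-cpu-moe varies by version):

```shell
# hypothetical model path; -p/-n = prompt/generation token counts to benchmark
llama-bench -m ./model.gguf \
  -ngl 99 \
  -t 48 \
  -p 512 -n 128

# if your llama-bench build supports --n-cpu-moe, sweeping a few values
# in separate runs (e.g. 33 vs 40 vs 48) shows the actual perf difference
```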