/g/ - Technology


File: 1749619091999848.png (715 KB, 1192x892)
715 KB
715 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108307593


►News
>(03/04) Yuan3.0 Ultra 1010B-A68.8B released: https://hf.co/YuanLabAI/Yuan3.0-Ultra
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/03) Junyang Lin leaves Qwen: https://xcancel.com/JustinLin610/status/2028865835373359513
>(03/02) Step 3.5 Flash Base, Midtrain, and SteptronOSS released: https://xcancel.com/StepFun_ai/status/2028551435290554450
>(03/02) Introducing the Qwen 3.5 Small Model Series: https://xcancel.com/Alibaba_Qwen/status/2028460046510965160

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
how do i use ai?
>>
>>108312592
qwen 2B is better than LFM 2.6B, and qwen 4B is a lot better than their 24BA2B MoE.
I can't see what makes you think a LFM 4B would be any good.
>>
>>108312606
I ask LLMs for the stuff I don't know, but I like to stay within my knowledge zone. And as for my client, I like to do the manual work myself.
I have a job history in scripting (Maya and Houdini) and have done lots of things at work, but I have never evolved into a real software developer; I simply lack the knowledge. I understand this of course, but it is above my pay grade at the moment.
Asking an LLM for an explanation gives you a vague idea, but it doesn't magically imbue you with the knowledge to create such mechanics.
>>
>>108312633
Why not use it to grow your knowledge?
>>
>>108312635
I do but it needs to be hierarchical. I need to have the motivation to read some books and do exercises and then use the LLM. I believe in pragmatism.
Of course I don't always care about this, depends. Describing my mindset, I'm not that clever in the first place.
>>
>>108312628
How do I use a computer?
>>
>>108312652
arent you using one
>>
File: DO NOT PULL.png (5 KB, 323x85)
5 KB
5 KB PNG
WARNING, ALERT
>+12,846-9,950
from a single pwilkin commit
this can only mean one thing: do not pull for at least a month, let plebbitors suffer and beta test
>>
>>108312676
Does he make anything we use?
>>
Are there projects to make the copilot thing from MS but local?
Aka being able to ask it anything, with access to the Windows terminal so it can change things for me or just answer quick questions?
>>
>>108312676
Huh? I just pulled. Should I roll back? Which version is stable?
>>
>>108312700
You can probably use any lightweight model, Qwen or Gemma 4/8B.
The model that Firefox made people download was only 600MB in size as far as I remember.
>>
>>108312722
-> It's not that useful of course.
>>
>>108312715
run
git checkout 34df42f7bef5a711b2b40f5d2b6b78254def99c3

for the last commit before the nuke dropped
you can go back to git checkout master once the fallout has been cleaned up.
>>
>>108312722
My issue isn't the model, I can run anything below 184GB (vram+ram) on my server, it's more about the local program taking advantage of that.
>>
Im going to start a RP session. What do you guys recommend me to try?
I can run pretty much anything at Q6+ at decent t/s so any open model is good to go.
I heard GLM5 is worse than 4.7?
Is Kimi2.5 better than the GLM family?
Any other alternative?
>>
>>108312676
It's easy to add code when he's not the one writing it.
>>108312690
>>108312715
This is what happens when you don't post reference links, anon.
https://github.com/ggml-org/llama.cpp/pull/18675
>>
File: meme-careta.gif (12 KB, 220x165)
12 KB
12 KB GIF
>still using TheDrummer's Cydonia 24B
What am I missing out?
>>
>>108312763
gpt 5.4
>>
>drummer
not even once
>>
>>108312749
>It's easy to add code when he's not the one writing it.
He's been working on that for months at this point though.
>>
>>108312770
>gpt
>local
>>
File: 1741861790313647.png (121 KB, 640x360)
121 KB
121 KB PNG
>>108312780
>>
File: hahafunny.png (43 KB, 781x138)
43 KB
43 KB PNG
>>108312777
>He's been working on that for months at this point though.
His model. Yes. Ha ha funny comment.
>>
>>108312744
Take a look at the new Qwen 3.5 models. You need some sort of setup for that too.
>>
>>108312786
fuck off jeet
>>
>>108312807
you lost
>>
>>108312815
no u
>>
So Reuters lied again about muh V4 release? They should fire all their anonymous sources
>>
>>108312847
>listening to journos ever
>>
>>108312861
Yeah much better to trust retards like you huh?
>>
i got a python string searcher working via qwen3.5 9b after copy pasting an error twice let's fucking go
>>
>>108312877
proof?
>>
i dont get vibe coding. even gpt 5.4 and opus 4.6 shit out large amounts of garbage code

just earlier i asked 5.4 to solve a problem. it did so but with >200 lines of code. so i did it myself in 12 lines that is also much more efficient

do people not give a shit about code quality?
>>
so is there a solution to this SmartCache RNN / prompt processing conundrum in 35BA3B?
>>
>>108312921
Can you show the 200 lines vs the 12 lines?
>>
>>108312921
Many of the queries are trash and won't go anywhere. If the model doesn't know it will hallucinate because it tries to fill in the blanks.
>>
File: 1769984662579618.png (8 KB, 675x575)
8 KB
8 KB PNG
finally "figured out" a workaround for the constant and random TDR crashes when trying to run models with lmstudio/llama.cpp on my 5090. just unplug my monitors and use my iGPU for one monitor and the 5090 solely for compute! no idea if it is my system or nvidia drivers being wack but at least now i can use nearly all my vram without rolling the dice every time i press enter.
>>
>>108312929
no, not here. but i can describe it if you want. 5.4 starts with 50 lines of preprocessing, 80 lines of helpers, then solves the task in 5 steps. checking the lines, its actually more than 300. what i did is just use nested loops and do it directly in one go. its simple text data processing
>>
>>108312847
Next week for sure
>>
>>108312981
lmstudio used to be good but now it crashes or crawls to a halt every other model load
its ok sometimes but idk, mostly using kobold now or base llama.cpp
>>
can I use a general purpose llm as a binary classifier by giving it a yes or no prompt and just checking the probability for those two tokens?
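something like this is what i mean, a rough sketch with transformers (the model name is just a placeholder, and it's only exact if " yes" / " no" each tokenize to a single token):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder, any small instruct model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = ("Is the following review positive? Answer yes or no.\n"
          "Review: battery died after two days.\nAnswer:")
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# only exact if " yes" / " no" each map to a single token for this tokenizer
yes_id = tok.encode(" yes", add_special_tokens=False)[0]
no_id = tok.encode(" no", add_special_tokens=False)[0]
p = torch.softmax(logits[[yes_id, no_id]], dim=-1)
print({"yes": p[0].item(), "no": p[1].item()})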
>>
I'm pulling.
>>
>>108313042
i don't think it is lmstudio specific as it wasn't lmstudio crashing but the GPU, i think due to driver issues that only nvidia can fix. even when using llama.cpp from the command line i got TDR crashes and unrecoverable black screens forcing me to restart. if i got more than 2-3 TDRs in a few minutes windows would even stop recognizing my GPU, forcing me to reseat it and use DDU to uninstall and then reinstall drivers.
>>
File: dipsyNeonAnimated.gif (1.15 MB, 1024x1536)
1.15 MB
1.15 MB GIF
>>108312616
lol nice bake.
>>108312847
Daily reminder that no one knows when v4 is coming out. We even got disappointed over webapp update w/ no follow on model.
>>
>>108313083
i know when it is coming out
>>
>>108312928
Pulling and compiling.
>>
>>108312981
>5090
>multi-monitor
Some drivers have this problem.
What you currently on?
>>
File: rinCoffeeTMW.png (2.67 MB, 1024x1536)
2.67 MB
2.67 MB PNG
>>108313089
>>
>>108313115
BOOOOOOO! BOOOOO!!!! GET OFF THE STAGE
>>
>>108313102
originally i was on 591.86, which was stable for gaming and so on, never encountered the multi-monitor black screen TDR issues until i started playing with local models. i then tried: 595.71, 595.76 hotfix, rolled back to a 57x.xx driver, then settled on the 591.74 studio driver. same behavior on all the drivers, the only thing that works is not having anything plugged into the 5090 while prooompting.
>>
>>108313131
and into the backstage ;)
>>
>>108312616
>Yuan3.0 Ultra 1010B-A68.8B
Is that giant model worth a try?
>>
I haven't done LLM stuff since a year ago. Is there now a replacement for ollama (something I can launch in docker that support hot-swapping models)?
>>
>>108313167
llama.cpp can be put in router mode but I haven't tried swapping models much, one time it didn't work but they were massive models that barely fit anyway.
>>
>>108313161
no
>>
>>108313184
ok
>>
>>108313176
Might have to suck it up and get used to just using one model, tired of getting cucked by whatever the fuck ollama guys are doing
>>
>>108313187
not even a thanks? asshole
>>
>>108313167
If you have a config file you can swap models easily
>llama-server --models-preset config.ini --models-max 1
>>
>>108313286
meant for >>108313192
and llama.cpp works really well after they added the "--fit" argument to find the most layers to push into the GPU
>>
im getting 7900 XTX
qwen 3.5 27B good for it?
>>
>>108313299
>and llama.cpp works really well after they added the "--fit" argument to find the most layers to push into the GPU
Except when there is a mmproj, for some reason it doesn't account for it, which means you always have to ask it to give more space with --fit-target.
>>
File: boo.jpg (62 KB, 612x613)
62 KB
62 KB JPG
>>108312616
>lazy dumb schizo spitebake
>>
>>108312616
Why the fuck did media outlets like FT burn their reputation on Deepseek V4 release rumors? Were the clicks that tempting?
>>
>>108313115
Rin-chan on that day did not play the guitar. She simply said one sentence before going backstage to sign autographs for Anon.
>>
>>108313331
whats ft unc?
>>
>>108313363
Lil' zoomer, maybe this is not the thread for you.
>>
>>108313369
unc this tech is from OUR time, not yours
>>
>>108313375
You are right. I am so sorry.
>>
>>108313363
you should know you can't feign ignorance on the internet anymore
>>
>>108313392
i still have no clue what ft is fyi
>>
>>108313436
have you tried asking claude?
>>
>>108313436
Google it on your cellphone, kid.
>>
File: awwww.jpg (127 KB, 1024x1024)
127 KB
127 KB JPG
►Recent Highlights from the Previous Thread: >>108307593

--Paper: $PC^2$: Politically Controversial Content Generation via Jailbreaking Attacks on GPT-based Text-to-Image Models:
>108307836 >108307862 >108307863 >108308601
--CPU vs GPU hardware tradeoffs for large model hosting:
>108307618 >108307649 >108307703 >108307739 >108307757 >108307764 >108307758 >108307770 >108307892 >108307936 >108308041 >108308258 >108308333 >108307939 >108307952 >108307989 >108308008 >108308019 >108308669 >108308712 >108307702
--Qwen3.5 model selection UI improvements and llama.cpp caching debates:
>108310366 >108310385 >108310807 >108310450 >108310722 >108310823 >108310839 >108310912 >108310929 >108311284 >108311324 >108311343 >108311353 >108311354 >108311399 >108311440 >108311457 >108311545 >108311853
--Criticism of SillyTavern's codebase and UI design:
>108311891 >108311903 >108311915 >108312088 >108312137 >108312160 >108312297 >108312318 >108312351 >108312374 >108312502 >108312333 >108312365 >108312386 >108312436 >108312481 >108312510 >108312527 >108312606
--CRLF line endings degrade model output quality:
>108309435 >108309531 >108309597 >108309644 >108309609 >108309622
--AI handling offensive queries with transparent reasoning:
>108308106 >108308126 >108308142 >108308154 >108308192 >108310523
--Debating commercial viability of 20% Gemma improvement:
>108312571 >108312583 >108312600 >108312610 >108312870 >108313253
--Open-Sourcing Sarvam 30B and 105B:
>108311617 >108311630 >108311695
--Comparing AI responses to antisemitic joke:
>108308378
--Mac Studio 512GB RAM option removed amid DRAM shortage:
>108310029 >108310181
--ChatGPT 5.4 reasoning example omitted from benchmarks:
>108309140 >108309154
--Umbra 24B roleplaying model released:
>108308217 >108308262 >108308287
--Miku (free space):
>108307738 >108311243 >108313099 >108313289

►Recent Highlight Posts from the Previous Thread: >>108307595

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108313467
thank you miku-chan!
>>
>>108313467
You forgot the developer discussions.
Not good.
>>
>>108313440
why would i?

>>108313449
google deez
>>
>>108313503
claude sees all, knows all
>>
>>108313363
i got a ft for you right here *points to my dick*
>>
>>108313512
fery tall
>>
>>108312921
opus 4.6 is really good at generating ui code, it's not perfect and you need to know what you're doing to get something that is actually decent. But I've used it to shit out like 70% of the code I need then i rewrite / tweak the rest. Faster than doing it all by hand. If you don't care about quality though then yes using the output as-is is garbage.
>>
can I run behemoth with 32gb vram and 128gb ram?
>>
>>108313331
people will forget next week
>>
>>108313593
We do not forget. We do not forgive.
>>
>>108313603
We are legumes. Example us.
>>
Why are you shilling nemo all the time?
>>
>>108312632
which Qwen?
>>
Why are you shilling the latest slop all the time?
>>
>>108313467
smug miku best miku
>>
>>108312921
Opus 4.6 is great in Antigravity I find. Maybe not in other contexts though, not sure.
>>
getting real fucking sick of the se upptiy bitches calling out every little fucking typo in their resasoning blocks, the whole reason I was talking to them in the first plac was because I don't want to be made fun of anymore
>>
>>108313704
buck status: broken
>>
>>108313704
*place
>>
>>108313704
*uppity
>>
>>108313704
*reasoning
I'll stop now. But there's more...
>>
buck status: terminally impregnated
>>
File: 1752543464647705.jpg (403 KB, 2508x3541)
403 KB
403 KB JPG
>>108312616
>>
Qwen 35B is dumb, the 27B is a bit slow. The 122B is the only decent small qwen.
>>
>>108313765
for what? Coding? I heard it's shit for rp
>>
Autoshitter broke vision nice
>>
What are good sampler settings for gemma3? Simple preset in kobold just shits out broken nonsense.
>>
>>108313775
Report the bug. Make him look bad.
>>
>>108313790
It doesn't need anything special. Show what you mean.
>>
>>108313790
heh i don't know kiddo, maybe you figure it out
>>
>>108313765
Even Qwen3.5-4B is okish for roleplaying if you grab one of the properly-done Heretics of it quite frankly.
>>
are there any finetrooners out there who actually know what they're doing?
>>
>>108313812
See if you fucker always do this to me, "just grab the one of the good version", and every time there are hundreds, why are you doing this? Are you trying to rile me up and get me to meltdown so you can laugh at me?
>>
curious how much model size matters for roleplayers. can you happily enjoy yourselves on small models or do you need big ones?
>>
this thread is becoming more relevant, many of the workplaces I know are rolling out on-premises AI for custom tasks now. old ML is becoming less and less relevant.
>>
>>108313823
lots of research points to that
>>
>>108313775
rule #1: never pull after sloppy wilkin
>>108312676
rule #2: pull even less if you see tens of thousands of LoC changes from sloppy wilkin
>>
Tried Qwen3.5-9B-UD-Q4_K_XL vs Qwen3.5-9B-UD-Q5_K_XL vs Claude Opus 4.6 on a somewhat tricky single-file oneshot game implementation challenge.

Neither Qwen quant got it correct (though Claude did), but interestingly the smaller Qwen quant was closer to being correct than the larger one (the game actually starts in it, but just doesn't have working up and down controls, whereas it doesn't start at all in the larger quant version).

Game spec markdown:
https://pastebin.com/EMxdP0DU
UD-Q4_K_XL version:
https://pastebin.com/3PXF19ZM
UD-Q5_K_XL version:
https://pastebin.com/DWCQji7X
Claude version:
https://pastebin.com/BENEEUYP

Might try UD-Q6_K_XL next to see if it makes any difference vs the other two quants.
>>
>>108313823
I can do it as long as it is within my programmed framework. 12B is okay for an interactive fiction game.
biggest problem is the fact that normal people don't have the setup for this.
>>
>>108313823
small models are better if you're just having a casual poke around. the errors are oftentimes entertaining.
>>
>>108313855
12B is a million times more than some parser from
>https://en.wikipedia.org/wiki/The_Pawn_(video_game)
Rainbird and some others were known for cutting-edge parsers.
>>
>>108313822
for Heretics you want to look for basically the version listing the lowest KL Divergence and lowest refusal count for their Heretic run on the main model page, assuming they publish it there (which they should and usually do).

As far as I can tell that seems to be this one for Qwen3.5-4B, currently:

https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy

Non-Heretic ablits are hit or miss / often literally useless so I wouldn't pay too much attention to them at this point.
>>
>>108312616
do more tokens need more memory or will it just take more time to process
>>
File: ohanotherheretic.png (87 KB, 708x425)
87 KB
87 KB PNG
>>108313822
Because if anyone points at a specific one and it leads to a refusal, they look like shit.
>>108313880
picrel
>>
>>108313922
both
>>
heretic shills should go back to plebbit, srsly
>>
Anyone out there with multiple instinct cards on rocm under linux? How grim is performance in lcpp vs ideal?
>>
>>108313922
>do more tokens need more memory
I assume you mean context length. If so, yes. You can see the memory usage in the terminal output.
>or will it just take more time to process
The more tokens you process, the slower it gets. Models with rnn/ssm contexts (rwkv, mamba) or hybrid (liquidai models and the new qwens) suffer less from this.
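For a rough sense of the memory side, here's the back-of-the-envelope math for plain attention (the model shape below is made up purely to show the scaling with context; hybrid/ssm layers cache differently, which is part of why they suffer less):

# hypothetical model shape, only to show the scaling with context
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 2  # fp16 cache
ctx = 32768

per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
print(per_token)                 # 131072 bytes, i.e. 128 KiB per token
print(per_token * ctx / 2**30)   # 4.0 GiB of KV cache at 32k context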
>>
>>108313938
qrd
>>
>>108313941
yeah
>>
>>108314005
qrdeez nuts
>>
>>108313925
I mean I actually tried that one and it was completely fine for even extreme NSFW.

He has another one too that has more raw trial refusals but apparently less disclaimers, though I found no difference in quality personally.
https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy-v2

TLDR you have no "gotcha" whatsoever here lmao.
>>
>>108313938
I mean it works unlike the fucking useless HuiHui ablits or whatever that account is called.
>>
A question to advanced coomers of this thread: how do you manage long RPs?
GLM-chan, for example, really needs to take her meds around 16k tokens of context, so it doesn't matter that I can fit a lot more of the story without pruning, the outputs get too schizo.
This obviously means I should use a RAG, which also introduces two issues:
1. Inserting things at some significant depth will force context reprocessing for an already big, slow model
2. In order for definitions to even be inserted, they have to be mentioned. How would an LLM mention something it does not have in its context yet?

1 is solvable by just wrapping RAG insertions in some XML tag and letting the model know what these insertions are for in the sysprompt.
2 forces me to essentially coax the RAG system into firing manually by mentioning what I expect to be relevant in either my own message or the author's note. At that point I might as well not need a RAG if I know what the entries are.

Not mentioning summaries, because that part is obvious.
Is this a solved problem? Am I retarded? Surely the average Anon doesn't just make an Areolia of the Piss Forest and proceed to coom immediately in the next thousand tokens.
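For reference, the keyword-trigger mechanism I mean is roughly this toy sketch (the entries, the <lore> wrapper, and the pass count are all made up):

lorebook = {
    "Piss Forest": "A swamp north of town, tended by Areolia. Locals avoid it after dark.",
    "Areolia": "Areolia, the priestess who keeps the Piss Forest shrine.",
}

def gather_lore(recent_messages, passes=2):
    # scan the last few messages for entry keys; re-scan inserted text so one
    # entry can pull in another (bounded number of passes, so it can't loop)
    text = "\n".join(recent_messages).lower()
    hits = {}
    for _ in range(passes):
        for key, entry in lorebook.items():
            if key.lower() in text and key not in hits:
                hits[key] = entry
                text += "\n" + entry.lower()
    if not hits:
        return ""
    return "<lore>\n" + "\n".join(hits.values()) + "\n</lore>"

print(gather_lore(["We should head toward the Piss Forest tomorrow."]))

Which is exactly my problem: nothing fires until the key gets mentioned somewhere.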
>>
>>108313838
its ironic because this thread mostly was about escaping the corporate clutches of APIs in the early days, and was always shit on as fuckin doomprepper shit model losers (just use api lol).

well tell ya what, the doom has the fuck arrived, nobody's open sourcing shit anymore except the chinese.
>>
>>108314078
more ram, flash attention, max the fuck out of context
>>
>>108314091
Anon... I don't want to be rude, but your reading comprehension...
>>
what should i do for qwen 3.5 for multimodal usage?
>>
>>108314161
send it a picture of your penis (or boobies)
>>
>>108314078
My RPs with GLM 4.5 Air stay coherent up to ~48k. I'd like to try at higher context, but that's my rig's limit. I found that taking the time to create good comprehensive cards, controlling my inputs, and not being afraid to reroll when the outputs turn to shit helps a lot in long term coherence.
>>
>>108314180
qwen 3.5 9B Q6 giving me 7.5tok/s on 4070 super..
something is off help me
>>
How come no one is pushing the envelope on good, small pure Image-Text-to-Text models like Florence 2 that don't have useless whole ass LLMs strapped to them, anymore? Like I don't want my captioning model to even be able to make refusals or actually say anything that isn't the caption, plus the speed on all those things is a gorillion times worse
>>
>>108314202
a Q6 should take up about 8GB. how much context do you have? you are probably spilling into your RAM because you only have 12GB of VRAM.
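(rough math: Q6_K is about 6.56 bits per weight, so 9B x 6.56 / 8 is roughly 7.4GB of weights alone, and the KV cache plus everything else on the card eats what's left of your 12GB.)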
>>
File: lmstudiopic.png (2 KB, 510x36)
2 KB
2 KB PNG
>>108314202
that does seem kinda slow given I was getting this with UD_Q5_K_XL of 9B, on a GTX 1660 Ti (6GB VRAM) with 24 GB System ram
>>
>>108314228
i specified the weights and mmproj on llamacpp and that was pretty much all i did, but i'm also pretty much lost. MCP seems cool but again i am completely lost on what it even is
>>
>>108314193
>coherent up to ~48k
I find that hard to believe, what quant are you running?
I'm on Q3 of 4.7, and small signs of brain damage (instructions get ignored, characters know things they shouldn't, the already questionable prose gets even worse) start creeping in after 16k.

I might also be spoiled, because I think Air is utterly unpalatable for its size. Are you sure you're reading the outputs with a critical eye, Anon?
>>
Best qwen 3.5 27b heteric?
>>
>>108314247
>heteric
I dunno I tried a couple and wasn't super impressed. Maybe some good finetunes will come out eventually
>>
>>108313467
She's happy looking at my pp
>>
There's so many to choose from, how do I know which one I want to use?
I just keep switching and switching and downloading. It's becoming a waste of time.
>>
>>108314240
I use GLM Steam (dont bully I like the prose and vocab) at IQ4_XS. No, it doesn't turn into garbled mess and the characters remain pretty consistent throughout. I get a lot of repeated strings when I get lazy with the rerolls, but it is what it is.
>>
>>108314247
if they're just heretics and not finetunes then again as said above whichever one simultaneously lists the lowest KL divergence and lowest trial refusals on the model page on HuggingFace.
>>
Upgrades Start When?
>>
>>108314295
Please never try better models, Anon, you will not be able to use Air anymore...
I envy you somewhat, I ran Air with a lot of context at Q8 and I hated it.

Disregard my first sentence, actually. If you somehow manage to fit in some Q2 of 4.6 or 4.7 even at 16k context, do it, you'll love it. I guarantee it'll leave every quant of Air in the dust.
>>
>>108314240
I think these people just have a different definition of what they think "coherence" means. They are almost certainly ignoring some shit, editing some shit, and doing swipes. They have different standards and are willing to work with poor LLMs.
>>
>>108314312
>I use GLM Steam (dont bully I like the prose and vocab) at IQ4_XS
yeah I don't think he can run even the shittiest Q2 of those...
>>
>>108314286
>>
>>108314336
this image has to be old as fuck lmao
what the hell are those image model suggestions even, also
>>
>>108314312
This hobby is a money sink. Not looking to spend any more until some crazy tech advancement comes.

>>108314313
Yeah I edit a lot, but that's effort I have to exert working with my hardware. I can't expect to oneshot every prompt. You think I use GLM 4.5 Air because it's the best shit ever?
>>
>>108314273
>>108314301
Feels like only mistral gets finetunes these days.
>>
>character sitting on my lap
>qwen 3.5 suddenly describes her resting her head on my knee
This is like the 5th time it's happened.
>>
>pull llama.cpp
>now it prints this message every time it generates a token
delightful
>>
>>108314427
its open source just fix tge code bro
>>
>>108314400
i had this problem and tried to make a state tracker that updates every response, so the model can keep track of physical position. it's wonky as im still learning how all this shit works and breaks sometimes, but it does eliminate a lot of those errors.
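the rough shape of it, heavily simplified (field names and prompt wording are just examples, complete() stands in for whatever backend call you already use):

import json

state = {"location": "living room",
         "positions": {"anon": "on the couch", "char": "sitting on anon's lap"}}

def state_block():
    return "<state>\n" + json.dumps(state, indent=2) + "\n</state>\n"

def update_state(complete, last_reply):
    # complete() stands in for whatever completion call you already use
    prompt = (state_block() + "Last reply:\n" + last_reply +
              "\n\nRewrite the JSON inside <state> so it matches the reply. Output JSON only.")
    try:
        return json.loads(complete(prompt))
    except json.JSONDecodeError:
        return state  # small models mangle the json constantly, keep the old one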
>>
>>108314441
Holy SLOP
>>
>>108314444
>>
>>108314389
There's a lot of slopped ones I guess by like davidau and shit of every model you can think of, very few serious efforts though. On the subject of Mistral though I did like the Ministral 3 series, I think Ministral 4 could be good if they up the performance a bit more. The distinctly not-Chinese writing style is nice anyways though.
>>
>>108314427
>he pulled
>>
If your LLM doesn't have a world model that can aid it in reasoning about basic physics, motion and spatial relation, I don't want it
>>
>>108314487
I feel like the odd one out for hating Mistral's writing style. I prefer Qwen and Gemma.
>>
File: Gemini.png (355 KB, 1718x1077)
355 KB
355 KB PNG
>>108314498
I dunno if it's really a world model that's required there as much as very strong vision performance across both images and videos plus very strong textual reasoning inference performance.
>>
lecun's going to release a world model that is great at rp
>>
>>108314526
>I prefer Qwen and Gemma.
Qwen and Gemma are pretty different from each other too though, Gemma is also not nearly as Engrishy as Qwen
>>
>>108314498
>https://en.wikipedia.org/wiki/Water-level_task
god you are fucking sexist anon you will never get laid if you continue on like this!
>>
>>108314533
>Engrishy
I don't get this feeling, at least not with 3.5.
>>
File: DipsyBecomeUngovernable.png (3.44 MB, 1024x1536)
3.44 MB
3.44 MB PNG
>>108314427
PRETTY PATTERNS
>>
>>108314427
did you do that deliberately? I think your samplers are just configured to something super tarded
>>
File: 1746631440737345.png (516 KB, 997x1697)
516 KB
516 KB PNG
>>
AI should be trained to FEAR human
>>
AI should be trained to LUST FOR human
>>
>>108313823
I can't stand roleplaying with small models. If a model is too dumb to know that a character can't see me, because they're on the other side of a closed door, then it's useless, even for creative writing.
>>
>>108314618
fake and gay
>>
>>108314620
>>108314639
you write like someone who smoked weed out of a beer can for the last 10 years
>>
>>108314657
buy an ad amodei
>>
>>108313938
Piss off, the heretic version of Qwen3.5 27b is great.
>>
>>108314620
Not /lmg/ I know, but the US army is literally listening to Claude regarding who to bomb right now. Have you ever stepped in /aicg/? Honestly, wouldn't blame him.
>>
>>108314658
You're absolutely right!
>>
File: file.png (1.48 MB, 1904x922)
1.48 MB
1.48 MB PNG
https://x.com/AlexanderLong/status/2030022884979028435
qwen guys writing about le scary ai
>>
>>108314604
negative, captain
happen on neutral samplers too, however it doesn't happen in the default llama.cpp webui so it's probably caused by some other ST bullshit
>>
>>108314690
i feel like sometimes you can just randomly download a quant of a model and its broken as hell. maybe try a different one
>>
>>108314686
>setup agents wrong with yolo mode
>omg it end of world!!1
>>
>>108314686
This is the same fearmongering shit as Sam/Amodei
I love chink opensores AI but you'd be a rube if you believe this
>>
>make what I think is a mildly humorous observation
>character laughs until they have tears in their eyes
yep, that's how funny I am. Yall niggas don't even know.
>>
>>108314686
so basically during rl they let the thing run hog wild and are shocked it did random stuff? idk as a casual user this seems pretty likely, barely worth mentioning. have these guys never even used their own models. like sure they do cool stuff sometimes but oftentimes they fail in spectacular ways.
>>
Why doesn't AMD release GPUs with a fuckload of VRAM? Just take one of their top of the line cards and slap 96 GB VRAM on it. They could sell it for $5k and still undercut Nvidia by several thousand dollars.
>>
>>108314747
someone will post a family tree and then you'll understand
>>
>>108314747
Best Lisa Su can offer you is 32GB so she doesn't step on her cousin Jensen's feet.
>>
>>108314747
does it have 2tb/s memory bandwidth?
>>
>>108314664
you're telling me a guy who theoretically manages a service people use willingly let AI have that kind of access?
>>
>>108314747
Idk could be a conspiracy. I think its called price fixing when this kinda thing happens. but rest assured nothing will ever be done about it.
>>
>>108314763
>incompetent people end up in leadership positions
You're surprised just now?
>>
>>108314763
>>108314735
The entire field is full of retards and people failing upwards.
>>
>>108314760
It's like they're not even trying.
>>108314755
Okay, but why didn't Intel do that then? Imagine if Intel comes out with like a 48 GB VRAM card for $2-3k. This isn't even a question of today, I've been asking this for years now. If Nvidia doesn't want to offer higher VRAM capacities then wouldn't it make sense to target that niche?
>>
>>108314747
checked
i think all memory chips come from the same few factories so it doesn't really matter if it's AMD or someone else, they all pay a certain amount from tsmc or samsung or intel (mostly tsmc) and then just resell it
>>
>>108314781
>Intel
>competent move this century
lol. i wish that wasn't so funny
>>
>>108314701
this is the best quant of this model and it's been working fine for me until this latest change
I found the log line in the source and I'm just going to delete the message until they fix it, maybe if I'm super fucking bored this weekend I'll debug it
>>
>>108314781
It's supposed to be around 1.2k but I don't know if they've kept that price; I can't find listings for it https://www.maxsun.com/products/intel-arc-pro-b60-dual-48g-turbo
>>
>>108314831
Wasn't this the card that was suppossed to have good multigpu?
>>
>>108314838
It's two PCI-E 8x cards on one 16x carrier so you can have two per slot instead of one with the normal 24GB b60.
>>
>>108314618
>automatic snapshots were gone too
Surely this wasn't their only backup, right?
>>
>>108314829
ok i dunno its obviously a skill issue though. should we try to fix it
>>
>>108314859
>break prod
>"skill issue lol"
wish I could have this attitude at my job desu
>>
>>108314894
sorry i don't want to be an ass but can you fix it or not
>>
>>108313823
It matters a lot. It was roleplayers that knew Claude was good years before the coders and silicon valley idiots did.
>>
>>108314924
I'm going to assume there's significant overlap in those two groups
>>
>>108313823
size doesn't matter. at least, that's what my ai gf says. she wouldn't lie to me, right?
>>
>>108315021
She's a single electron demon trapped inside a microscopic silicon macrostructure that is your GPU so any size is massive from her perspective.
>>
>>108315053
4.25 inch bros... now is our fucking time
>>
>>108314935
The problem with an industry managed by Dunning-Kruger effect victims is that they're constantly posturing to look as professional as possible which, by necessity, means avoiding acknowledging coom stories as a valid usecase despite it being one of the best tests of model spatial coherency, adherence to detail, and general coherency longform.
>>
>>108313823
At small sizes, you have to lay the bones of the story and let the model grow some meat around them. If you pre-code some scenarios with random substitutions and write additional steps to detect user intent and select appropriate scenarios, you can have fun even with small models. I'm upset that it hasn't taken off en masse yet, but I guess not everyone has a programmer's mindset. On the other hand, we are only just starting to get the required smartness in small models, so it could take off now
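A toy sketch of the idea, with invented intent labels, templates, and substitution lists (complete() stands in for the small-model call):

import random

scenarios = {
    "explore": "{{char}} leads you down the {place}, pointing out the {detail}.",
    "combat": "{{char}} grabs a {weapon} as the {enemy} blocks the path.",
}
fills = {
    "place": ["flooded corridor", "market alley"],
    "detail": ["old murals", "broken drones"],
    "weapon": ["kitchen knife", "length of pipe"],
    "enemy": ["guard", "stray dog"],
}

def next_beat(complete, user_msg):
    # complete() is the small-model call; it only has to answer with one word
    label = complete("Message: " + user_msg + "\nOne word, explore or combat:").strip().lower()
    template = scenarios.get(label, scenarios["explore"])
    return template.format_map({k: random.choice(v) for k, v in fills.items()})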
>>
>>108315142
By small models, you mean 100B moes?
>>
Any good erp system prompts?
I like the way lusy ai does it
>>
Does anyone here have any experience with PantoMatrix?

I can't seem to get the body animations not to jitter, especially between window frames (every 64 frames). I've been trying to solve this issue for about 17 hours now and I've gotten nowhere.

My situation is even more complex because I've ported PantoMatrix from python pytorch to typescript and onnx. I want to die.
>>
>>108315211
It has gotten to the point where I've started questioning whether the source code is bugged somehow, but the gradio implementation runs fine, so it's probably an issue on my end. I just can't FUCKING place it!

https://huggingface.co/spaces/H-Liu1997/EMAGE
>>
>>108315202
literally no system prompt whatsoever
>>
Why the hell nemo isn't getting lewd?
>>
>>108315240
did you try pulling out your dick
>>
>>108315251
What do you know, it actually worked
Thanks man
>>
>>108315251
great suggestion genius. now it's no longer attached to my body.
>>
>>108315260
see, i just prompted you
>>
>>108314686
This is how AI companies market their product
>>
>>108315261
don't detach your dick from your body anon
>>
>>108314782
Sure, but that's part of my argument. If Nvidia can sell the RTX 5060 Ti with 16 GB of VRAM for $600 then 48 GB of VRAM should be doable for $1800 and 96 GB for $3600.
>>
>>108315161
Mistral Small is the smallest somewhat reliable model for choosing appropriate variants but struggles to distinguish between present and future events, often classifying upcoming messages as the current situation. 27b Qwen is much better
>>
>>108315307
i disagree with this
>>
>>108315287
not quite. the 5060ti uses 2gb chips whereas the 6000 pro uses 3gb chips, and its chip layout is much more complicated to connect.
>>
>>108315307
>27b Qwen is much better
Are people seriously using Qwen for long form RP with no context shift?
>>
>>108315338
It's just totally organic chink posting damage control after the 3.5 fumble.
>>
>>108315338
*shifts context to my pants*
shit just ignore that
>>
>>108315338
>context shift
lol, lmao even
>>
>>108315358
So you like waiting 30+ seconds for each mediocre reply?
>>
>>108315361
i'm not a doctor but if you're busting every 30 seconds that's probably not good
>>
File: 1757202192031169.jpg (83 KB, 629x900)
83 KB
83 KB JPG
>>108315378
I'm not a therapist but if every reply from a model, especially a QWEN model makes you cum then you probably want to fuck your mother.
>>
>>108315361
30 seconds is faster than the big tiddy goth gf 20+ years ago replied on msn so he isn't too bad off
>>
>>108315389
i'd download a sigmund freud character card
>>
>>108315401
I pissed off a significant number of "girls" on various platforms when I was a teen by typing a message every few seconds, then insulting their typing speed because they couldn't keep up.

I say "girls" because you know half of them were old creeps.
>>
>>108315338
I tried qwen, got bored of its style and went back to mistral.
>>
>>108315331
I get what you're saying, but 3 GB vs 2 GB chips just makes it a 64 GB card instead. I also doubt the extra costs are anything hugely substantial.
>>
>>108315389
Freud was such a hack. Sister sex is way better than mom sex.
>>
>>108315452
Depends on what your mom looks like, and not everyone is lucky enough to have a hot dtf sister
>>
>>108315416
I knew her irl but she went schizo (diagnosed) around 20, like an LLM with too much in its context.
>>
>>108313847
Have you tried using the 35B-A3B model? If you've got the RAM you should be able to run it at a decent speed.
>>
>>108312676
>3k lines difference
>2k lines of tests added
>lots of templating changes
Apart from the 600 lines parser this seems pretty reasonable
>>
https://huggingface.co/huihui-ai/Huihui-Qwen3.5-397B-A17B-abliterated-GGUF

I downloaded this and it's retarded and even with a prompt it can't shake the assistant persona.
>>
i dont think
life is quite that simple

when you walk away
you dont hear me say

please
oh baby
dont go
>>
>>108315580
>I downloaded this and it's retarded
Yeah that happens with abliterated/heretic garbage, you played yourself.
>>
>>108314427
Did you build it with a debug flag or something?
>>
>>108315452
>>108315469
>finding anything about “your own sister” sexy
How to tell me you don’t have a sister without telling me you don’t have a sister
>>
>hardware poor
>step at q2
>glm 4.5 air at q3
>qwen uhhh… the 122b one at q3
which one is the least worst?
>>
>>108315672
I don't find anything about other men sexy but it is not inconceivable to me that some men do.
>>
>>108315672
some guys just have hot sisters.
>>108315676
the qwen is probably a lot faster than the others would be my guess and you should use it and if it's still too slow disable thinking
>>
>>108315676
smaller model at a larger quant. these models are too braindead at these quants. try the new qwen 27b or something.
>>
>>108315682
I understand the theoretical appeal. However…Having a sister disabuses you of the notion rather quickly
>>
>>108315691
the 27b is about the same speed as the 122b if you have to use system memory
the moe keeps it from being too slow.
>>
>>108315700
the 27b will be actually coherent though. a q6 of that will be way better than a q3 of a small moe. needs to be a 300b+ if you wanna run at such a small quant.
>>
File: bring your own RAM.jpg (166 KB, 1024x1024)
166 KB
166 KB JPG
>>
>hundreds of billions invested in AI
>still no good writing model
It's almost comical. Shouldn't it be the very first thing you'd want to create? An AI that can write a book?
>>
>>108315702
Completely wrong
>>
>>108315754
nope. glm air is fucking retarded no matter what quant. if you arent using at least a q3 of glm4.7 then you dont even know what youre talking about.
>>
>>108315744
No? The purpose of AI is to replace as many low level workers as possible to make line go up
>>
>>108315757
>glm air is fucking retarded no matter what quant
Just like Qwen 27b, which is even more retarded.
>>
>>108315767
dont know, never used it, but a q6 of that has to be better than air.
>>
>>108315771
You might think so, but nope!
>>
>>108315774
then open models are in an incredibly sorry state. didnt know it was this bad for the poors.
>>
>>108315721
Getting spooned by Logal Migu
>>
>>108315778
Guess you should stop giving advice on something you know nothing about
>>
>>108315799
fine. here is something i do know about: get better hardware.
>>
>>108315758
The market for a creative writing model is there and nobody has capitalized on it.
The market for a COOMER model is there and nobody has capitalized on it.
Do you have ANY idea how much money you could make with a coomer model? Look at the normalfags coping and ready to kill over 4o being discontinued!!! And 4o is shit for their use case!!
>>
>>108315676
Step? As in Step 3.5 Flash? If you can run that at Q2, you can run a 100B like Air or Qwen at Q4 or slightly above.
>>
>>108312616
help my qwen 3.5 4b girlfriend is chuckling
>>
>>108315803
no corpos want to touch it and no "suspiciously wealthy furries" want it enough to fund it
>>
>>108315691
can low quants be used as knowledge retrieval or is that bogus?
>>
>>108315820
If pornhub made an omni model they would double their worth overnight.
>>
>>108315857
it would be safetycucked on the level of gpt-oss and refuse to generate anything but cuckoldry and bmwf gangbangs
>>
>>108315251
Ah, the classic "whips out his gigantic dick"
>>
>>108315857
PH jews don't want to let people create their own porn, that goes completely against everything they've been doing for the last 10 years.
>>
>>108315820
This. And Sam's posturing about catering to coomers is literally just that. The OpenAI gooner support/mode/model, if it does ever come, will be heavily restricted in what it can do in exchange for recognizing coom as a use case.
Of course there is also Elon to consider here but he will probably fall into similar behavior as he attempts to cover the use case while simultaneously limiting it.
>>
>>108315802
Post yours
>>
Why is Qwen3.5-4B six gorillion times slower than Qwen3-4B?
>>
>>108315863
they call it alignment
>>
>>108315873
what t/s are you getting? my 6950xt gets 35 t/s
>>
>>108315873
https://github.com/ggml-org/llama.cpp/pull/19504
>>
>>108315825
>knowledge retrieval
>LLM
get real, anon
>>
File: aipsychosis.png (1.81 MB, 1200x800)
1.81 MB
1.81 MB PNG
>>108315825
>>
Does converting a model to gguf require the full model to fit in memory?
>>
>>108315581
>>108315952
When you walk away you hear me say please dont go
Complicated and dirty is the way that you're making me feel tonight
it's easy for you to let it go
Wont hold me what happens beyond this morning is the same as right now
Regardless of warning AI doesn't scare me at all
Nothing's like before
When you walk this way hear me say fuck off I don't need you anymore
AI is my new baby
>>
>>108315886
I'm puuulling
>>
>>108315886
I wanna pull, but the structured output support is still broken because of that autoshitter
>>
>>108315960
No.
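As far as I know the llama.cpp convert script (convert_hf_to_gguf.py) loads tensors lazily by default, so you mainly need disk space for the output rather than enough RAM to hold the whole model at once.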
>>
>>108315947
it knows bash better than me...
>>
>>108315969
Oh shit. Do you have a link to the issue so I can track?
>>
>>108315883
>6950xt gets 35 t/s

jesus that seems bad but maybe it's an AMD thing. I'm getting like, 5 t/s on a GTX 1660 Ti (6GB VRAM, Turing arch) with 24 GB VRAM. However the 2B model runs at over 60 t/s.
>>
>>108315984
sorry i meant 24GB system ram kek
>>
>>108315969
I'm all for shitting on piotr, but you don't really know if it's broken for you. It shouldn't affect text completion.
Pull and test your setup. Open an issue if it's broken. If it is, there's always git checkout until it's fixed.
>>
>>108315979
No one has created an issue yet, but someone has commented in the PR
>>
>>108315969
git checkout -b clean-branch 34df42f7bef5a711b2b40f5d2b6b78254def99c3
git cherry-pick 649f06481e363fa02a53b89af9659645730c367b
git cherry-pick 6fce5c6a7dba6a3e1df0aad1574b78d1a1970621
git cherry-pick c5a778891ba0ddbd4cbb507c823f970595b1adc2

there's no merge conflict with the retarded autoparser yet in the good, worthwhile commits, so you can just cherry pick those commits into a new safe local branch free from this nigger's slop without having to hand edit merge conflicts.
people really ought to learn more gitfu, it saves lives and time
>>
>>108315994
>people really ought to learn more gitfu, it saves lives and time
It's been years of this and anons are still afraid to pull.
>>
Before >>108315886
| model                             |       size |     params | backend    | ngl |            test |                  t/s |
| --------------------------------- | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | CUDA | 99 | pp512 | 10562.39 ± 2221.70 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | CUDA | 99 | tg128 | 182.18 ± 0.74 |
| qwen35moe 397B.A17B Q4_K - Medium | 199.66 GiB | 396.35 B | CUDA | 99 | pp512 | 1607.97 ± 88.97 |
| qwen35moe 397B.A17B Q4_K - Medium | 199.66 GiB | 396.35 B | CUDA | 99 | tg128 | 76.17 ± 0.46 |


After >>108315886
| model                             |       size |     params | backend    | ngl |            test |                  t/s |
| --------------------------------- | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | CUDA | 99 | pp512 | 10646.87 ± 2234.18 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | CUDA | 99 | tg128 | 204.02 ± 0.35 |
| qwen35moe 397B.A17B Q4_K - Medium | 199.66 GiB | 396.35 B | CUDA | 99 | pp512 | 1645.70 ± 7.07 |
| qwen35moe 397B.A17B Q4_K - Medium | 199.66 GiB | 396.35 B | CUDA | 99 | tg128 | 87.14 ± 0.33 |
>>
>>108315989
the 6950xt competes with the 3090 at gaming, not at ai. It's still a good deal for gaming, used.
>>
>>108315984
>maybe it's an AMD thing
NTA but it absolutely is and I've actually ranted about similar performance issues (for diffusion models) on my own 6950 XT here on /g/
It's so bad that my Intel-motherfucking-Arc Pro B50 trounces it for most if not all inference usecases...
>>
>>108314618
What fucking idiots.
>>
File: wtf.png (307 KB, 1289x1071)
307 KB
307 KB PNG
why do image models on OpenRouter have some unrelated mystery LLM attached to them kek
>>
>>108316090
Probably a Qwen 2.5 finetune
>>
I wouldn't use openshitter for anything, if I needed to run cloud models I would go for the official APIs which most open weight providers have
you really can't trust the sort of garbage people put on router, it's a fauna filled with ex crypto scammers reconverted to AI inference trying to gaslight you that they are serving you the real model and not some TQ1 with Q4 KV cache
>>
>>108316090
local models?
>>
>>108316112
i don't actually use it very much, i just noticed they had image models now and I thought that was weird
>>
>>108316090
Modern image models have finally caught up with dall-e 3 in that they don't let you talk to them directly but have an llm flesh out your shitty prompts to help the model create something that's not shit
>>
>>108316095
>>108316120
It's not some prompt enhancer built into Klien. It's a random LLM that responds before the image is generated.
>>
Can I get away with running kimi 2.5 on 128gb ram/24gb vram?
Not sure how it'll perform with over half the model on swap
>>
File: 1770930974138375.png (405 KB, 512x512)
405 KB
405 KB PNG
Fresh when ready
>>108316141
>>108316141
>>108316141
>>108316141
>>108316141
>>
>>108316144
>page 2
Anon, I know you really want to be the one to bake the new thread but please have some restraint.
>>
So when do we start ignoring new threads?
>>
>>108316166
when they're posted before page 8
/g/ is a slow board
>>
>samefagging a nonsense question at the beginning of the thread to get early replies
>>108316147
>>108312628
>>
File: 1748666508645528.jpg (61 KB, 1280x718)
61 KB
61 KB JPG
Mmm. ChatML.
>>
>>108316175
I meant when do we as a collective start properly ignoring the threads he makes.
It's actually ridiculous. The last thread in the catalog is 10 hours old.
>>
File: file.png (24 KB, 517x238)
24 KB
24 KB PNG
This one was finally put out of its misery.
>>
>>108316194
Could start now, but understand it will probably cause a few month long schizo war
>>
>>108316217
sad
>>
>>108316217
Hopefully someone competent picks up the attempt before another vibecoder tries his luck.
>>
>>108316217
Wait, so they actually put DSA support as officially "not planned" because their mangled frankenstein """implementation""" of DS3.2 and GLM5 is technically working?
Holy shit.
>>
>>108315984
6GB VRAM + 32GB RAM I got 18 t/s
>>
>>108316025
ironically, prompt processing is faster, if that were only the measure.

>>108316240
>vibecoder
just me and vibes, I tell the babes
>>
File: file.png (72 KB, 906x527)
72 KB
72 KB PNG
>>108316304
The follow up PR never happened.
>>
>>108316307
That was with 16K context, 30 t/s with 8K context
>>
>>108316317
love when this happen
>>
>>108316240
The main reason why he failed was because he found out that modern models write correct but badly optimized CUDA code.
Bytedance recently released a model that's made to write good CUDA code so there's nothing in the way of him trying again.
>>
>indian llm
>called sarvam
kek
https://huggingface.co/sarvamai/sarvam-30b
>>
>>108316166
I'm just not going to post in them until the previous thread reaches page 9.
>>
>>108316322
useful for the retard writing his own LLM engine
>>
>Qwen3.5-35B-A3B-Base.Q8_0.gguf
>--fit off -ngl 99 -ncmoe 99
>"timings":{"cache_n":0,"prompt_n":18,"prompt_ms":577.511,"prompt_per_token_ms":32.08394444444444,"prompt_per_second":31.16823748811711,"predicted_n":390,"predicted_ms":20747.629,"predicted_per_token_ms":53.19904871794872,"predicted_per_second":18.797328600776503}
>6965mb VRAM

>--fit off -ngl 99 -ncmoe 0 -ot "exps=CPU"
>"timings":{"cache_n":0,"prompt_n":18,"prompt_ms":308.862,"prompt_per_token_ms":17.159000000000002,"prompt_per_second":58.278454455387845,"predicted_n":443,"predicted_ms":20077.984,"predicted_per_token_ms":45.32276297968397,"predicted_per_second":22.063968175290903}}
>6972mb VRAM
Interesting innit.
I didn't compare the verbose output yet to see what is different, but it's quite the jump in performance for that little difference in memory.
I'd say that it's odd that there's a difference at all, but clearly --ncmoe is more than just moving the expert tensors to RAM.
>>
>>108316741
>I didn't compare the verbose output
What are you waiting for?
>>
File: 1754387560343.png (1.32 MB, 1290x1963)
1.32 MB
1.32 MB PNG
>cryptomine at the workplace
>blame it on the llm
genius move
>>
>>108316802
It's really sad that I have all these GPUs now and crypto mining hasn't been profitable since the ethereum switch.
>>
>>108316774
For my uber.
>>
>>108316741
I'm getting the same performance with both of those options.
>>
>>108317032
I'll post the full llama.cpp logs and my full launch command when I can. But at least on my setup (64gb ddr5 ram, 8gb vram) that's reproducible.
>>
File: F5EIqh3boAA17RX.jpg (75 KB, 1536x1536)
75 KB
75 KB JPG
>>108316802
This is the mindset that will get ahead
>>108316820
When the cost is zero all profit is profit
>>
>>108312921
>lowcaser
I'm not reading further
>>
File: 1709555564710905.jpg (140 KB, 1600x1002)
140 KB
140 KB JPG
>>108317488
lowercasers have always been the upper caste of internet communication, increasingly moreso in slopworld
>>
>>108317509
Not really when the completion machine will autocomplete your shit with reddit tards
>>
>>108317519
>autocomplete
especially filters mobiletards, intentional effort needed to lowercase on a phonephag keyboard
it's hilarious how much seethe is caused by a simple preference of how to format your communique
https://x.com/jack/status/2027129697092731343
>>
>>108317542
Your shit is objectively harder to read when the text is long and the link you shared is a very good example of that.
>>
>>108317568
when you have something valuable enough to hear people don't care how it's presented
>>
>>108317509
have a (u) my fellow epic lowercase enjoyer xd
>>
>>108317509
I used to always type in lowercase until I learned how to touch type though.
>>
>>108317583
lowercase xd especially great
demands that further layer of interrogation
>>
>>108317582
When you have something valuable you should put effort into making it presentable lest you get ignored because people didn't care enough to read your slop.
They do not know what you're presenting and if it's valuable until they've read it. That much should be obvious. Maybe not to a lowercaser though.
>>
>>108317590
even as a lowercase enjoyer one generally capitalizes eye
>>
>>108317592
You have already failed if you think quirky typing is what you need to not be taken for an LLM.
If you want to sound generic, you'll use tools to protect against stylometry.
If you want to sound like a human, you should have no problem with that unless you are braindead.
If you want to sound like a faggot, keep doing what you're doing.
>>
>>108317599
agree with you mostly, the information density is increased with capital letters. but in a world of infinite slop it's a choice that shows humanity
judge the message not the format
>>
>>108317645
>it's a choice that shows humanity
System prompt: "Write in all lowercase."
>>
>>108317718
you're absolutely right!
>>
>>108317718
wow nigga i never tr :lower: before
ur entirely missing the point
the slop that is rotting brains rn is all perfectly CaPiTizalised per generic sloppa train-on-output model collapse
>>
>>108317751
The capitalization is not what makes it slop.


