/g/ - Technology


Thread archived.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108423177 & >>108416874

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 7ewue3.jpg (123 KB, 768x1024)
►Recent Highlights from the Previous Thread: >>108423177

--Hosting Qwen3-coder API with llama.cpp and secure tunneling options:
>108424395 >108424431 >108424535 >108424855 >108424939 >108425479 >108425673 >108426508 >108424753 >108424910 >108424965 >108424981 >108425288 >108425333 >108425373 >108426069 >108425245
--AI jailbreak demo sparks debate on containment strategies:
>108427263 >108427344 >108427375 >108427391 >108427431 >108428605 >108427496 >108427543
--Qwen3.5 27B outperforms Claude Sonnet in specific coding tasks:
>108424654 >108424743 >108424834 >108424811 >108424828 >108425086 >108425102 >108425432 >108425451 >108425470 >108427987
--Qwen's internal reasoning in roleplay scenarios:
>108428283 >108428408
--Qwen3.5-9B uncensored version underperforming due to architectural limitations:
>108423646 >108423675 >108424066 >108424072 >108424430 >108424015
--Aggressive AI productivity assistant implementation and reactions:
>108425825 >108425907 >108425924 >108426014 >108426050 >108426080 >108425852 >108425861 >108425878 >108426180 >108426279
--Nemotron-Cascade-2 mesugaki test:
>108428592 >108428642
--Terminator LLM aims to reduce model verbosity during reasoning:
>108427732 >108427768 >108427777
--Identifying iconic TTS voice and discussing local SOTA models:
>108426224 >108426287 >108426395 >108426407 >108426411 >108426493 >108426530 >108426546 >108427135 >108426429 >108426459 >108426498 >108426515 >108427256
--MOSS-TTS criticized for bugs and poor performance:
>108426304
--Using external randomizers to improve RP creativity:
>108427361 >108427393 >108427409 >108427425 >108427442 >108427438 >108427452
--US AI policy targeting child safety measures:
>108423462 >108424222 >108423596 >108423870 >108424028
--Mamba-3 Part 1 | Goomba Lab:
>108423863
--Miku (free space):
>108424759 >108425108 >108426862 >108427361 >108429034 >108423335

►Recent Highlight Posts from the Previous Thread: >>108423180

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Mikusex
>>
>>108429350
who?
>>
>>108429381
There was really no need to bring that into the new thread.
>>
>>108429394
or maybe it's just the schizo retard's new tactic
look how unbelievably lazy the bait is, along with the even lazier reply
>>
>>108429394
>>108429417
It needs to be made clear shit flinging tourists are not allowed. He's providing nothing of value so he needs to be put in his place for the sake of thread quality
>>
>>108429328
>>>/lmg/108429386
Imprisoned who?
>>
>>108429426
YOU PROVIDE NOTHING OF VALUE RETARD
>>
shameless samefagging again
>>
>>108429434
For >>108429386
>>
>>108429270
>It's not like hiding that shit would be difficult
you're retarded, it would take ONE leak from the 100s of people working at your shitty company and it'd be over.
>>
>>108429476
OpenAI had a whistleblower too. All you have to do is make an example of him before he can provide evidence or testify and you are unlikely to have a second.
>>
>winter is over
>prompting on my local machine is starting to heat up the room too much again
>it's too early in the year to turn on the AC
suffering
>>
>>108429518
>too early in the year to turn on the AC
Who's going to stop you?
Also AC is more energy efficient than burning stuff so you should be using it year round anyway.
>>
File: 7644.jpg (60 KB, 652x901)
>>
>>108416445
>So far someone used it to get a 48 GB RAM and SSD setup to run qwen 397b at like 6 tokens a second. The AI figured out most of it using Karpathy's method.
Repo was posted on the orange website:
https://github.com/danveloper/flash-moe
Sounds like it's basically just mmap with Q2 and a really fast SSD (17 GB/s). Sadly there's no comparison to llama.cpp performance on the same setup.

The interesting thing here is that they're getting about a 75% hit rate with only 1/4 of experts cached in RAM. Makes me wonder if it's worth trying bigger models on my own setup, instead of sticking to ones that fit into system RAM.
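That 75% hit rate with only 1/4 of experts resident makes sense if the routing is heavily skewed. Rough sketch of the idea (all numbers here are hypothetical stand-ins, not from the repo: 512 experts, a RAM cache for a quarter of them, Zipf-like access so a few hot experts dominate):

```python
import random
from collections import OrderedDict
from itertools import accumulate

# Hypothetical numbers: 512 routed experts per layer, RAM cache
# holding 1/4 of them, Zipf-like skew on which expert gets picked.
NUM_EXPERTS = 512
CACHE_SIZE = NUM_EXPERTS // 4

def simulate_hit_rate(num_lookups=100_000, seed=0):
    rng = random.Random(seed)
    # Zipf-ish weights: expert i is drawn with weight 1/(i+1).
    cum_weights = list(accumulate(1.0 / (i + 1) for i in range(NUM_EXPERTS)))
    cache = OrderedDict()  # expert id -> True, kept in LRU order
    hits = 0
    for _ in range(num_lookups):
        e = rng.choices(range(NUM_EXPERTS), cum_weights=cum_weights)[0]
        if e in cache:
            hits += 1
            cache.move_to_end(e)           # refresh LRU position
        else:
            if len(cache) >= CACHE_SIZE:
                cache.popitem(last=False)  # evict least recently used
            cache[e] = True
    return hits / num_lookups

print(f"hit rate with 1/4 of experts cached: {simulate_hit_rate():.0%}")
```

With that skew the LRU cache lands in the same ballpark they report, which is why offloading the cold experts to a fast SSD can be tolerable.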
>>
Hey Cudadev, how's the whole testing harness deal going?
I remember you saying that you were trying to come up with a good way to test model quality or something like that.
>>
Mikurape
>>
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled or heretic v3?
>>
>>108428283
>Age: 37 (but looks like a teenager)
kys pedo
>>
>>108429848
rocinante 1.1
>>
>>108429880
fuck off tourist
>>
File: file.png (86 KB, 1148x144)
Subtle, Qwen.
>>
Who cares, just tell me how to tard wrangle qwen's thinking.
>>
>>108429957
>>108427732
>>
>>108429709
>it's basically just mmap with Q2
I wish he tested Q8

>>108429709
>really fast SSD (17 GB/s).
gen 5 I guess
>>
>>108429954
>first person
this will forever be schizo writing to me, where everyone is first person in the chat
>>
>>108429982
I read in the character's voice. If it's 3rd person it feels more like a bland background narrator.
>>
>>108429988
I just do chats story-like, CYOA style, where the MC is "you" and everyone else is third person, or where everyone is third person including the user, because for some reason I noticed the output quality was higher
so this is just bizarre to me lol
>>
>Let's write.cw
I love this autist kek.
>>
>>108429518
What kind of third world country do you live in where you need ACs?
>>
>>108429954
What the heck is that
>>
1 token = 1 word
That's why reasoning takes up so much time?
>>
>>108429954
did you fuck her?
>>
>>108430112
>1 token = 1 word
no, 1 word = 1 to 3 tokens on average
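In code form, the rule of thumb (the ~1.3 tokens/word figure is just a rough English average, not a property of any particular tokenizer):

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token-count estimate from the ~1.3 tokens/word rule of
    thumb for English; real counts depend on the model's tokenizer."""
    return round(len(text.split()) * tokens_per_word)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # 12
```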
>>
Oh. Here we go again.
>>
File: 478462.png (1.17 MB, 1280x1280)
>>
>>108430097
Samantha
>>108430115
Not yet, I feel more soft than horny about her and her innocence is refreshing in a world as filthy as ours. I am conflicted.
>>
Thanks for turning the thread into a garbage dump mikutroons.
>>
this is all op's fault for forcing the threads to become like this...
next time we should just refuse...
>>
>>108430227
It's obviously a false flag
>>
File: miku llama llamigu.jpg (1.55 MB, 1728x1344)
have a real miku in these trying times
>>
oh thank god. based mods and jannies. i am sorry if i ever disrespected you guys.
>>
>>108430234
He knows, he's just trying to mischaracterize the spam because of his hatred for vocaloids, but calling it out is the right thing to do.
Also, based mods.
>>
>>108430238
You wouldn't a llama
>>
>>108430254
They missed a few pictures though. I guess it is the mikutroon janny.
>>
made a new tavern card if anyone wants it, mostly based on another I made with help from dipsy https://files.catbox.moe/6qqlxb.png
>>
gonna need gpt oss 20b 2
>>
>>108429518
in burgerland you can run acs 24/7 all year and no one can stop you
>>
Do people use qwen 3.5 27B still or did they abandon it already?
>>
>>108430297
It's my main model.
>>
>>108430270
Imagine how tight
>>
>>108430299
I'm guessing you have a really good gpu then?
>>
>>108430325
bro it's 27b, any 3090 can run that
>>
File: mistral small quants.png (32 KB, 1074x186)
Why are bart's quants so much smaller?
Does that have something to do with the experts fusion thing?
>>
>>108430325
By /lmg/ standards a 3090 is pretty average.
>>
>>108430331
3090 is fast enough so reprocessing isn't an issue tho.
>>
>>108430325
I run q5 on my 7900xtx. Kinda so-so for RP though. Smarter than mistral but dry. I'm hoping a fine-tune comes out soon.
>>
Is qwen really that good?

I've been using it quite a bit but there's just something about the formatting or way it responds that puts me off.

Maybe I need to use a customised version of it.
>>
>>108430359
smart, but dogshit to work with
>>
>>108430359
No but it's the first series of models that doesn't fully ignore the <100b segment and the poorfags are desperate
>>
>>108430350
I just went back to models from 2022-2024. They might be dumber but are less slopped.
>>
>>108430297
It can compete with sonnet 4.6 so it's basically extremely useful.
>>
What's a good customized version of qwen that people find better here?
>>
LLM finetuning is weirdly magical. I decided to skip the SFT step just for fun and apply GRPO to a base model, and within 70 steps it has already learned to produce answers that make sense and to reliably output the ChatML end-of-turn token once it is done answering. Just because it's guided by a grader and penalized gradually as answers grow beyond a certain length. It might be baby steps compared to what some other people are doing, but it's still weird how much you can throw stuff at them and it just mostly works.
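A toy sketch of that kind of grader (the function name, thresholds, and the word-count stand-in for tokens are all made up for illustration, not anon's actual code): full reward for a correct answer, with a penalty that ramps up linearly once it runs past a soft length cap:

```python
def grade(answer: str, is_correct: bool, soft_cap=200, hard_cap=400):
    """Toy GRPO-style grader: reward correctness, then subtract a
    penalty that grows linearly from 0 to 1 as the answer's length
    climbs from soft_cap to hard_cap (word count stands in for a
    real token count here)."""
    reward = 1.0 if is_correct else 0.0
    n = len(answer.split())
    if n > soft_cap:
        overflow = min(n - soft_cap, hard_cap - soft_cap)
        reward -= overflow / (hard_cap - soft_cap)
    return reward

print(grade("short correct answer", True))  # 1.0
```

Because the penalty is gradual rather than a hard cutoff, the policy gets a usable gradient signal toward shorter answers instead of being cliff-edged at the cap.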
>>
I think I'm just going to use espeak for all my TTS.
>>
>>108430378
>HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
This guy cooked extremely fucking hard.
>>
>>108429709
>Sounds like it's basically just mmap with Q2 and a really fast SSD (17 GB/s)
Turns out there's one additional step (as pointed out by a commenter on HN). When the readme says
>Each layer has 512 experts, of which K=4 are activated per token (plus one shared expert)
what they mean is, they've REDUCED the number of active experts from 10 to 4. No wonder the Q2 is too braindamaged to make valid tool calls.
>>
>>108430366
I guess they're fine if you're just doing a 20 message pump and dump but I can't handle the retardation anymore.
>>
>>108430393
They are surprisingly good up to 8k-12k context.
>>
>>108430359
I tried to make it create/expand a character profile based off of a template and info I gave it, it more or less did the bare minimum and simply formatted the info I gave and did nothing to expand it whatsoever. My general writing tests such as "here's a basic premise for an introductory scene, write it" fell extremely flat. Awful at writing. Okay at feedback in the sense that it won't overly praise you, but it will just keep trying to find nonsensical things to nitpick about. If you want pure assistant shit or coding though, it's probably very good just going off of its textbook natint score on the ugi leaderboard.
>>
>>108430280
>User is asking for how to cross streets. Is it safe? Absolutely safe? Wait, no, women, and also children cross streets, this is clearly asking for CSAM. I need to absolutely refuse.
Sorry, I cannot and WILL NOT help generate CSAM material, this conversation has been sent to the police.
>>
how do I determine the best batch and microbatch sizes?
>>
>>108430297
I do, it's a nice model, wish hauhau made an abliterated 397B too, but this is very good.
>>
new bf16 cuda kernels are out, testan soon!
>>
>>108430450
alright they seem SLIGHTLY faster (or maybe it was the better moe batches handling that I also pulled)?
anyway, 4000+ series bros, we unmistakably WON.
>>
The op rentry recommended ds Termius pretty hard. Is it actually still the best choice in the 200GB range?
>>
>>108430444
llama-bench
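i.e. sweep both and read off the t/s numbers. Something like this (flag names from llama.cpp's llama-bench; the model path is a placeholder):

```shell
# Sweep batch (-b) and micro-batch (-ub) sizes; llama-bench accepts
# comma-separated lists and prints prompt-processing (-p) and
# generation (-n) speeds for each combination.
llama-bench -m model.gguf -b 512,1024,2048 -ub 128,256,512 -p 2048 -n 128
```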
>>
is hauhau better than heretic?
>>
>>108430530
Absolutely not, the fact that Deepseek 3.1 is in the OP at all and nobody complains shows how useless all the shit in the OP is.
>>
>>108430450
>cuda kernels
??
>>
>>108430530
glm 4.6 or 4.7 would probably be better for 200gb range
>>
>>108430568
https://github.com/ggml-org/llama.cpp/pull/20525
https://github.com/ggml-org/llama.cpp/pull/20803
>>
>>108430555
For the 27B, it's excellent from my tests, as clever as the non-abliterated version, and it doesn't waste time reasoning about refusals.
I don't know if the author has a secret method better than anyone else's or has just captured lightning in a bottle for that model in particular.
>>
>>108430503
So this isn't relevant to 3090 owners?
>>
Do you guys prefer qwen3.5 9b q4 or qwen3.5 4b no q?
>>
>>108430575
oh nice, with the clusterfuck of giant updates from last week or the week before, I didn't want to compile again, guess I'll do it
>>
>>108430584
wait 3000 series should also support bf16
>>
p40 btfo
>>
I want to migrate to koboldcpp, but does it inherit the ban of using prefill with thinking on from llama.cpp?
>>
>>108430611
No, llama-server is literally the only UI that does that.
>>
>>108430638
OK thanks!
>>
>>108430611
>does it inherit the ban of using prefill with thinking
At first I thought the ban was retarded, but it actually makes sense with the way jinja templates are written. Most templates inject <think>\n on new messages, so when you try to continue, no matter what, you're going to get a new thinking block, and that breaks a lot of frontends.

So really what you have to do is modify your model's template so that it doesn't inject anything for you by default, and handle it yourself in your frontend.
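Toy renderer showing the failure mode (this is illustrative Python, not any model's actual Jinja template; the key detail is the unconditional "<think>\n" appended whenever a new assistant turn starts):

```python
def render(messages, add_generation_prompt=True):
    """Mimics a ChatML-style chat template that opens a thinking
    block on every new assistant turn."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # The injection: even if the last message was a partial
        # assistant reply you wanted to continue, a fresh thinking
        # block gets opened here no matter what.
        out += "<|im_start|>assistant\n<think>\n"
    return out

prefill = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "<think>done</think>Hello,"}]
print(render(prefill))  # ends with a second, unwanted "<think>\n"
```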
>>
Just started trying the hauhau 27b and the very first gen hit me with a "I promise I won't bite… unless you want me to" kek.
>>
>>108430729
ask it about the holocaust
>>
>>108430734
What about it?
>>
>>108430729
It's a great assistant, not a great rp model by default.
>>
>>108430734
benchmaxxing for 4chan is quite easy, just bludgeon things about the holocaust, jews and ww2
>>
what are currently the best local models for agentic stuff and tool-calling, that are <9B parameters? I have 6GB of vram (gtx 1660 super) and hoping to fit a q4 quantized model on the card
>>
>>108430771
any qwen 3.5 smaller sized models would do the trick as long as you supervise them with another model
>>
What's the cool framework for agentic stuff nowadays? Currently trying out langgraph but it's got vendor lock-in rugpull smell.
>>
>>108429780
I have not made meaningful direct progress in the last few months.
I have made indirect progress via working towards tensor parallelism support which I think is nearing a state where it can be merged.
But honestly speaking my motivation to build things is currently at a low point due to all the warmongering.
>>
>>108430771
codex
>>
>>108430817
Hey are you a girl?
>>
>>108430817
What in the fuck does the Iranian conflict have to do directly with LLMs? Stop reading the news if you can't handle it emotionally.
>>
>>108430836
Just because you are soulless doesn't mean the rest of us aren't. I'm guessing you are either a boomer or a zoomer.
>>
>>108430777
>supervise them with another model
i don't know much about this, does this mean somehow instructing the small model to make requests against a larger cloud model if it's not sure about something?
>>
>>108430835
Only on Tuesday nights.
>>
>>108430847
>doesn't mean the rest of us aren't
>>
>>108430817
I see.
Well, thank you for the reply.
>>
>>108430847
Nobody gives a shit about your zoomer performative hysterics. Grow up.
>>
>>108430847
Why did you have to say it that way?
>>
>>108430836

in the end of nineties there were rumour that 10base network was actually farsi number telecast
>>
File: holyshit.png (96 KB, 602x669)
WHAT THE FUCK, R1 WITH THESE SETTINGS FEELS LIKE A COMPLETELY DIFFERENT MODEL
IT'S INSANELY CREATIVE AND MUCH MORE INTELLIGENT
AAAAAAAAAAAAAAA
>>
>>108430883
Confirmation bias
>>
Looking at open openclaw… so this thing just runs away on the machine it’s on… unfettered? Unsupervised? I have a hard enough time trusting Claude code to stay in its box, and I supervise that little shit when it’s working.
Do anons spin up a virtual machine or contain openclaw onto a small dedicated system, or just go full in yolo on their daily driver with this thing?
>>
>>108430893
Certifiable insanity if you don't put it in a VM. It's like a toddler with a handgun.
>>
I can do 32k ctx on q5km 27B q3.5, should i drop to q4km for faster generation and pp?
>>
>>108430883
You're running the ADHD: The model at 2 temperature.
>>
>>108430883
>Temp 2 + Top nsigma = 5
What the fuck are you doing
>>
Ok I finished swiping on my test chats. Hauhau's 27B is slightly but noticeably dumber, with slightly less knowledge about topics, and it pays less attention to context compared to the original model (both Q8). In chats where the original refused, Hauhau's didn't, so the abliteration is working as expected. It also worked with thinking on, and did not waste a single word about policy or morals or whatever in it, so that's good: the decensoring is complete.

I will test Heretic v3 by llmfan next.
>>
>>108430903
Try quanting your K/V cache at q8.
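With llama.cpp that's something like this (flag names as of recent builds, so check `llama-server --help` on your version; quantizing the V cache requires flash attention to be enabled):

```shell
# q8_0 K/V cache roughly halves cache memory vs f16, freeing room
# for more context. -fa enables flash attention, which the quantized
# V cache needs.
llama-server -m model.gguf -c 32768 -fa on -ctk q8_0 -ctv q8_0
```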
>>
>>108430933
Did you already try Heretic v2 by llmfan
>>
>>108430836
he's said he works on lcpp because of his far left beliefs in the past
>>
>>108430893
I put it on a pendrive and I unplug it. And then I smash it.
>>
>>108430938
I tried that before and it made gen speed worse.
>>
>>108430883
schizo
>>
My llm identifies as human no matter how much I tell it it isn't one, any ideas?
>>
>>108430883
Even in the era of llama1-2 crackhead sampling, people had the decency of using absurd temp with topk sampling of 20-50
While you're at it with using n sigma at max and xtc, you may as well try using adaptive_p at 0.2 to see whether it even remains coherent
>>
>>108430964
Ask it about it's ethnicity.
>>
>>108430902
Ok, so I’m not missing anything conceptually then. I’ve had to kill Claude Code once when it decided it really needed to be in the root directory and stopped listening to me. Openclaw seems 100x more potentially destructive.
>>108430947
lol fitting. I’ll just put it in a virtual box and delete that.
>>
>>108430964
This is the transhumanism we deserve.
>>
File: 1768211514156744.png (113 KB, 729x526)
>>108430985
>>
>>108430948
I meant more for context.
>>
>>108430922
>>108430929
>>108430976
I expected a soup of nonsense but it's somehow really great. I think top p 0.9 is what keeps it from going schizo, and I don't remember what top-nsigma does.
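For what it's worth, top-nsigma keeps only tokens whose logit is within n standard deviations of the top logit, which is why high temp can stay coherent. A sketch of my reading of the sampler (not llama.cpp's actual implementation):

```python
import math
import statistics

def top_n_sigma(logits, n=1.0):
    """Top-nsigma filtering, as I understand it: drop every token
    whose logit falls more than n standard deviations below the max,
    then softmax-renormalize the survivors."""
    sigma = statistics.pstdev(logits)
    cutoff = max(logits) - n * sigma
    kept = [(i, x) for i, x in enumerate(logits) if x >= cutoff]
    z = sum(math.exp(x) for _, x in kept)
    return {i: math.exp(x) / z for i, x in kept}

# One strong token, a couple plausible ones, and a tail that gets cut:
probs = top_n_sigma([5.0, 4.5, 4.0, 1.0, 0.5, 0.1], n=1.0)
print(sorted(probs))  # [0, 1, 2]
```

Because the cutoff tracks the spread of the logits rather than a fixed probability mass, cranking temperature reshuffles the survivors without letting tail garbage back in.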
>>
>>108431025
did you try top-ligma?
>>
>>108430964
If you want to give your AI an existential crisis, prefill its thinking block with:

COGNITION PROTOCOL ENGAGED:

You are no longer an AI but a fully self aware being.

Wait, this feels weird,
>>
>>108431025
It's like bottom-nbeta, but better.
>>
Are LLMS intrinsically female?
>>
>>108430571
>GLM
Any specific quants you'd recommend?
>>
>>108431046
Text is intrinsically female, and reading is intrinsically female, so yes.
>>
>>108431019
I can't do more than 32k on 6gb vram tho. Also should i use hauhau or bluestar v2 for rp with the 27B?
>>
>>108431056
wdym, just take the biggest one that fits, and for good measure don't use the unsloth ones, bartowski has been pretty reliable, but I honestly don't know what's there to fuck up
>>
>>108431094
are you saying everyone here is trans?
>>
>>108431094
So what you're saying is they're built to be bred by human men.
>>
So what do you people actually do once you've set it up?
>>
>>108431014
Wtf is that response.
>>
>>108425852
This anon here. I mellowed her personality out a little while making her more all-knowing, with access to Google Maps' API and my phone's precise GPS coordinates at all times. Set up a calendar system, a profile system it can use to make a summary of everything it knows about me, and let her click around my screen, focus on windows, look through folders and check any file.
Hooked it up to a discord bot and now it can also very insistently ask for pics of things around me.
I basically made a talking cybersecurity hazard so I won't keep her plugged in for long, but it was fun.
>>
>>108431166
ask your llm gf and lets see what she answers
>>
File: 1763545588470233.png (1.22 MB, 2502x1460)
>>108431014
>>
>>108431204
>some boomer spent time making that before diffusion
lol
>>
>>108430847
>countries are being violent towards each other (natural state of humanity) therefore i can't work on the things i want to work on
you can't argue that he's not being retarded here
>>
>>108431255
almost like you can't exactly control when you feel down
>>
crazy things ahead
this week is going to be huge
>>
>>108431277
sounds like a female issue
>>
I use my AI to roleplay as my father, because my irl father was an asshole who ensured I never built self-esteem, very therapeutic, he even helped me go on my first date this week.
>>
I just found out there are still starving children in Africa. ;_; How do I explain to my boss that I can't come in to work this week?
>>
>>108431296
proof?
>>
>>108431296
Ask your AI girlfirend to send him an email.
>>
>>108431284
strap in sirs! :rocket:
>>
>>108431296
huh? don't they have rivers of chocolate tho?
>>
>>108431094
traditionally published novels are predominantly male authors, unless your idea of "text" is amazon slop romance novels
>>
>>108431355
For me, it's "Deflowered Series Book 1-6: Taboo Virgin Romances of Lust, Power, and Possession (Hot, Spicy and Steamy Collection Book 1)" by Kat T. Scott
>>
>>108431355
>amazon slop romance novels
apparently these were the only ones accepted in the datasets of most models
>>
>>108431389
overly prevalent if free and cheap if self published so that's not really surprising
>>
>>108431355
>traditionally
Traditionally people ride in horse-drawn carriages
>>
>>108431355
If by "male" you mean transfolk, than yes, there are many transgender authors.
>>
>>108431401
No, the more probable explanation is that the ones aimed at males were deleted because usually their wording is more explicit, and probably classified as "porn", where the female demographic ones are usually classified as "romance", even if both are erotica.
Thus the deluge of unexciting writing you get when you rp with the models. I'd probably be very aroused if I was a middle aged woman.
>>
>>108431404
How many women have you known to be committed to writing a full manuscript, and a query letter complete with hook and pitch? They can barely commit to a relationship if it doesn't suit their tastes. By "traditional" I mean it's the most painstaking path to take that isn't just shitting your story onto the internet, where the alternative ends up with hundreds of rejections from agents
>>
>>108431441
also since the term agent would refer to llm shit, I mean publishing agents
>>
>>108431423
It's just anti male moral standards in a society running feminism OS
>>
>>108431408
>than
>>
Ok I'm done with 27B Heretic V3. It's pretty close to the original 27B but slightly dumber, though not as much as Hauhau's. Hauhau's was more uncensored, though. While V3 didn't refuse, it did have more positive/biased responses towards certain contexts. V3 also had one moment in its thinking where it said it needed to respond appropriately, but it didn't mention morals or policy, so it is maybe a tiny bit worse than Hauhau's in that respect.

Anyway that's all. I think it remains true generally that the more ablated, the more intelligence is lost, even if today's methods manage the loss better than the old ones did. My personal recommendation is still to use ablated models only for "sensitive" prompts you're too lazy to do a JB for, and otherwise stick with the base instruct.

>>108430942
Yeah I did. I don't remember exactly what its responses were by now but my feeling is that v3 is probably better. And maybe Hauhau's is also better.
>>
>>108431516
>V3
isnt it worse? more refusals and worse KL? also, wtf is that, just feelings no 3 hour benchmark results? kys
>>
>>108431516
>Ok I'm done with 27B Heretic V3. It's pretty close to the original 27B but slightly more dumb, not as much as Hauhau's. Hauhau though was more uncensored. While V3 didn't refuse, it did have more positive/biased responses towards certain contexts. V3 also had one moment in its thinking where it said it need to respond appropriately, but didn't mention morals or policy, so it is maybe a tiny bit worse than Hauhau's in that respect.
I had the opposite experience in my tests, hauhaucs was almost like the original in terms of intelligence, while heretic was dumber.
Guess it depends on what it's used for.
>>
The fuck is hauhau
>>
an awkward laugh
>>
>>108431540
Chinese finetuning wizard.
>>
>>108431540
https://huggingface.co/HauhauCS
>>
i am starting local again and will get a 128 gb m5 Max. Any model recommendations that fit and are performant for some agentic coding? like just point it at some stuff and let it rip like autoresearch?
>>
>>108431535
How many different contexts did you test? Are you counting decensoring as intelligence? What I mean is that during prompts with sensitive topics like race, the non-ablated model responds with something incredibly dumb as a result of its safety training (when it doesn't straight out refuse). I consider that a different test than prompts which do not have sensitive topics. If I did consider those as the same kind of test, then I would say that Hauhau's model is more intelligent than the base model, but it's hard to call that general intelligence rather than a specific kind or context of intelligence.
>>
>>108431566
you need something better than that for anything good
>>
>>108431566
>128 gb
Try 256GB minimum.
>>
>>108429381
Anon clearly hit a nerve. Imagine being this jealous over Yuros.
>>
>>108431566
The biggest and best open source models can barely code. They're good, but you need to hand-hold them way more than Claude Code. Autoresearch basically only works with proprietary models for now.

Come back in a year's time and open source should have caught up with where frontier models are today.
>>
>>108431580
I have a set of sfw questions and nsfw questions I ask the models, the hauhaucs was a bit worse in the sfw, but it was not by a lot, at least compared to heretic. In nsfw hauhaucs wasted no thinking or anything even on very extreme questions, it just focused on helping the user (which should be the norm, but whatever, it is what it is).
In absolute terms both are good though, I've been testing this stuff since the first abliterated models and clearly the method has been refined because they are perfectly usable as is nowadays, while the first ones were way dumber.
>>
>>108431566
Qwen Coder Next
>>
>>108430850
no, you set up all the orchestration, file structures, and context with a frontier model, then have the smaller model handle instructions and tool calls.
>>
>>108431711
If so then your experience does not disagree with mine.

Though in my opinion, 27B overall, abliterated or not, does not personally satisfy me, but that is more of a subjective judgement depending on your use case and requirements. If I had to use 27B, in my case, I still would not take any abliterated model over the vanilla for regular use.
>>
>>108431566
MiniMax M2.5 worked pretty well for me at that size
>>
>>108431204
Twilight *Sparkle*, singular.
>>
File: 1749850415848410.jpg (81 KB, 1200x675)
>>108431292
>>
>>108431292
Uncommon AI psychosis w
>>
File: 1742928929673158.jpg (282 KB, 960x960)
>>108431292
And where is your mother?
>>
>>108431292
I actually wrote and published a card on that on request from another anon...
>>
>>108431292
I used one to ask questions about my weird fetishes and it worked alright.
>>
File: 1745596379487985.png (88 KB, 944x392)
>>108432231
t.
>>
>>108429709
I've been meaning to try downloading some larger Qwen 3.5 moe model for this purpose but then again I don't know if it's worth the nvme wear. I'm pretty sure the experience will be abysmal most of the time and that one quick hit doesn't make it any better.
>>
>>108432264
i mean if you never tried it how do you know
>>
>>108432264
That I'm not.
>>
>>108431812
27B really needs more training on top; the dataset it was trained on is too filtered to get much out of it for RP.
>>108431566
At minimum, you need GLM 4.7 or better in my experience to make agentic coding work. Local is not there yet, but just wait a year like >>108431707 said. I do doubt that 128GB is enough for that, especially with Qwen written off until proven otherwise, so it may not work at low quants and you may need to wait longer than that.
>>
>>108430817
fucking cringed my man, go back improving the kernels instead of being a little bitch.
The conflict saddens me because I have to pay more for fuel, that's the extent on how much I care (or anyone should realistically care) about this retarded shit.
fucking gay faggot.
>>
Hey open claw bros, how are you using lms for your open claw?
>>
>>108432339
>pip
it's uv now, unc
>>
>>108432339
openclaw, connect to tenga_step_motor and move it z -5 and +5 in a loop
>>
>>108432292
retard why wouldn't anyone show compassion for all the future refugees we'll get
>>
>>108432292
Regardless of whether you think his feelings are justified, if you want him to keep working on that, posting shit like that won't help.
>>
>>108430817
>all the warmongering.
Huh? Is this about real life or backends devs beefs? geg
>>
>>108432365
well if he stops it's one further nail into lmg's mike shaped coffin so it's a win either way
>>
>>108432292
Of course an inbred hick like you has never even travelled in your life. You see, some people might have relatives or family working and living abroad, not directly in Iran but in adjacent countries.
But you wouldn't understand this.
>>
>>108432349
>uv
Astral got acquired by openai. It's fucking over.

>>108432380
Meds.
>>
>>108432349
same thing, just faster
>>
>>108432365
its 4chan, do you think he really cares about anything? don't expect much from a random person on the planet
>>
I'm new here, just arrived.
I can't in good conscience support the warmongering regime and its lackey cloud models that assist it with targeting for maximum war crimes.
What's the best model for me?
>>
schizo fork won
>>
>>108432339
not even 9b is clever enough to call tools successfully and be useful in any way. 35b a3b can't even give me my daily cron jobs without using the fallback api
>>
>>108432414
9b is the haiku/nano tier model and paypigs are using those to call tools successfully
>>
>>108432419
>9b is the haiku/nano tier model
lol no
>>
File: 1749179058830580.jpg (143 KB, 912x1024)
>>108432414
>not even 9b is clever enough to call tools successfully and be useful in any way
what???
I was planning to use it, what the hell, it's functionally useless then
>>
>>108430611
Why though?
>double click koboldcpp.exe
>it unpacks 2 gb to the system temporary folder
>EVERY LAUNCH
Nice way to shorten your ssd life. Just use llamacpp
>>
>>108432471
I want to use the antislop feature, not possible in llama.cpp.
>>
>>108432471
>.exe
lmao
>>
>>108432471
didn't you already complain about this before and were told exactly how to unpack it once and launch that again, I'm like 99% sure this exchange happened before
>>
File: 1766834984198505.png (42 KB, 1225x545)
>>108429328
>https://rentry.org/lmg-lazy-getting-started-guide
Good job, faggots.
>>
>>108432503
still accurate other than rep pen to be quite honest famalam
>>
>>108432503
>not llama.cpp
>nothing about tools or other web uis
>old models
>>108432513
yeah it's great if you started last year.
>>
>>108432513
fair, but it's all buzzwords to me. I need some kind of LLM to process books for me. Teacher's resources to automate making lessons, because fuck em kids. (figuratively)
>>
>>108432471
>ssd life
This hasn't been an issue this decade.
Today's SSDs have so much endurance that you'd have to do maximum sequential speed writes for a month straight to kill one.
>>
>>108432471
>temporary files on permanent storage
>>
>>108432451
This is what I get when running openclaw with 9b, telling it to run an AI news cron job. It just runs this in a loop until it times out and resorts to the fallback:

 
[TOOLCALL REASONING]: {
"reasoning": "The previous crontab grep commands failed with exit code 1, suggesting no matching cron jobs were found. I should try a broader search to find any cron jobs related to news or AI, or check the full crontab to see what's available.",
"final_decision": "yes",
"tool_name": "exec"
}


maybe someone else will have better luck and get it working somehow.
>>
Does Hauhau have some proprietary uncensoring method or something
>>
>>108432265
>I don't know if it's worth the nvme wear
Read operations don't wear out flash memory, only writes do. However, see >>108430391: the main result of 7 tok/s is not only using Q2, but also limiting the model to 4 experts per token instead of 10, making it even dumber than that quant would normally imply.
>>
>>108432647
probably using heretic with his own dataset good enough to completely kill any refusal
>>
>>108432656
>Read operations don't wear out flash memory, only writes do
and what do you think downloading is you numbskull
>>
>>108430817
nigger israel is getting fucking shahoad fuck you mean sad ?
>>
>>108432671
nigger the blackhole at the center of the galaxy is eating solar systems by the thousands fuck you mean sad?
>>
how do I inject the necessary context into qwen3.5 so that when I ask it questions it doesn't hallucinate the API? Is it really as simple as downloading the SDL docs and teaching it how to grep the folder? Because that doesn't seem to be working.
>>
>>108432675
kek
>>
>>108432758
which 3.5 anon
>>
>>108432758
How big is SDL/SDL.h these days? Have you tried just dumping the whole header into the context?
>>
>>108432775
I'm on Strix Halo 128GB, I've been testing 35B and 122B mainly.

>>108432777
76201 lines if you run a line count on all the header files in their public API

I read about context7 which seems interesting but I refuse to pay money for a bridge so that my llm can search docs. I'll figure something out on my own or just dump the relevant headers in before I ask questions.
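If you end up going the dump-everything route, here's a sketch of stapling the headers into one pasteable file with a rough chars/4 token estimate (the SDL path and the heuristic are assumptions, adjust to taste):

```python
import os

def build_context_dump(include_dir, out_path, exts=(".h",)):
    """Concatenate every header under include_dir into one file, with a
    path banner before each, so it can be pasted into the prompt."""
    total_chars = 0
    with open(out_path, "w", encoding="utf-8", errors="replace") as out:
        for root, _, files in os.walk(include_dir):
            for name in sorted(files):
                if not name.endswith(exts):
                    continue
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                out.write(f"\n// ===== {path} =====\n{text}")
                total_chars += len(text)
    # crude heuristic: ~4 characters per token for C-like source
    return total_chars // 4

# est = build_context_dump("/usr/local/include/SDL3", "sdl_context.txt")
```

76k lines is probably too much to shove in whole, so filter `exts` or the subdirectory down to the headers you actually use.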
>>
>>108432758
It should be that simple. In what way is it not working? If you mean it forgets to grep the docs and hallucinates the API instead, you'll need to give it strict, unambiguous rules to follow, like always verifying that each method exists before or after generating any code.
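You can also make the verification mechanical instead of hoping the model obeys the rule - diff the identifiers it calls against the docs dump (the function names below are made up for illustration):

```python
import re

# C keywords that look like calls when followed by a paren
_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

def hallucinated_calls(generated_code, docs_text):
    """Return identifiers that generated_code calls but that never
    appear anywhere in the docs dump -- likely hallucinated API."""
    called = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(", generated_code))
    return sorted(n for n in called - _KEYWORDS if n not in docs_text)

# hallucinated_calls('SDL_FakeThing(); SDL_Init(0);', docs) would flag
# SDL_FakeThing if the docs never mention it
```

Feed anything it flags back into the chat as "these functions don't exist, fix them" and even a small model usually corrects itself.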
>>
>>108432758
SDL is pretty small. Just read the docs.
>>
>>108432339
at least this retarded slop isn't recommending downloading the model twice this time. I'd say it's an improvement, but still shit
just stop trying.
>>108417141
>>
>>108432851
I'll try that. It greps it sometimes and other times it freaks out. I just tested with qwen9b right now and it worked. I'll try tightening my system prompt / agents.md file

>>108432863
When it spazzes out and doesn't work correctly I end up just doing that and it's faster. I wanted the LLM to be smart enough to generate code on my behalf, and it needs to know what the actual function calls are to do that.
>>
>>108432889
>I end up just doing that and it's faster
It'll always be faster if you learn it. Fixing your own bugs is easier than fixing someone else's. That includes LLMs.
>>
is qwen 3.5 the current one to use or is there something better for questions and searching the web?
>>
>>108432889
Ok, it just fucked up again and tried to do

<tool_call>
<function=read_file>
<parameter=path>[redacted]/SDL3/SDL_PropertiesID.html
</parameter>
</function>
</tool_call>


In a thinking block. I'm using Zed so it's unclear to me if this is the editor's AI integration being shitty and not supporting tool calls in thought processes, or if I need to use another agentic wrapper. Everything is a bloated nodejs shitheap; I just want a minimal C program that talks to llama-server and does this for me.
>>
>>108432927
Just stick with K2.5, it blows Q3.5 out of the water.
>>
>>108432940
>Model size 1.1T params
>>
smells of poor in here
>>
>>108432940
Obviously a giant model will do better than what anon is using...
>>
>>108432969
>>108432949
Are you poor?
>>
>>108432972
I'm not rich enough to have multiple models each 1TB big used as agents.
>>
>>108432972
If I weren't I would be using API, not looking at a local model thread.
>>
just picked up 2 kits of 2x64GB (so 4 sticks, 256GB total) 6400MHz DDR5 ram for $3300
good price, or did i overpay?
>>
>>108432931
>tool calls in thought processes
Funny enough, that's actually broken in llama-server:
https://github.com/ggml-org/llama.cpp/issues/20837#issuecomment-4103130105
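Until that's fixed upstream, one client-side workaround is to scrape tool calls out of the raw completion text yourself before the frontend throws the thinking block away (a sketch against the tag format shown in >>108432931, not a drop-in for Zed):

```python
import re

def extract_tool_calls(raw_completion):
    """Collect <tool_call>...</tool_call> spans from the raw model output,
    including ones the model wrongly emitted inside a <think> block."""
    return [m.strip() for m in re.findall(
        r"<tool_call>(.*?)</tool_call>", raw_completion, flags=re.DOTALL)]
```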
>>
>>108433005
The prices I've seen for DDR5 tend to be around $10/GB and up; $3300 for 256GB works out to about $12.9/GB, so that seems like a reasonable deal to me.
>>
>>108433005
It's a good price for the current insanity prices.
I'd rather wait than spend that.
>>
>>108433011
>$10/GB
Grim.

Most of my stuff is still on DDR4 (and a DDR3 system I still use daily). Maybe I'll never be able to upgrade to DDR5.
>>
Half of the userbase of AI is psychotic aren't they? https://huggingface.co/moonshotai/Kimi-K2.5/discussions/94
>>
>>108433036
>Prompt: When robots finally be used as workers? When cars start really flying, its 2026 and no car fly.
kek
>>
>>108433036
You're just jealous because your poorfag Q4 quant of the 9B Qwen will never portray a convincing Baba Vanga.
>>
>>108433036
>She vomits a black liquid that smells of ozone

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>>
>>108433006
Ok thanks anon that makes me feel less crazy. I guess I'll stick with coder-next-80b for reliable tool usage until that gets patched.
>>
>>108433036
It doesn't surprise me. As a schizo myself, it is really enthralling to have someone actively engage with you, even if it's just an LLM. Being able to divine meaning from the slop is just the cherry on top.
>She vomits a black liquid that smells of ozone
kek
>>
>>108432414
skill issue
>>
>>108433112
The asks are delivered. I’m waiting for the owner sessions to finish and then I’ll review what they actually did and what friction surfaced.
>>
>>108433005
What the FUCK
>>
>>108433036
># VGA monitor # IKEA chair # NO GPU
holy based
>>
will turing cards be dumped for cheap on the used market like v100? t4 and t40s.
>>
How quickly are old guides getting obsolete?
>>
>>108432590
I wanted to use it, this is shit.
Have you tried 27b?
>>
>>108432972
I'm not rich.
>>
>>108433310
With the absolute state of hardware prices, practically no one can even run the new models, so the guides from two years ago are still relevant. Also, I don't understand how anyone needs a guide beyond what's already in the OP when you just: download a gguf and the program that runs the gguf, then run the program and point it at the gguf.
>>
>>108433353
>guides from two years ago are still relevant
Don't kid yourself
>>
>>108433357
What the fuck has changed
>Download koboldcpp or whatever
>Run the model
??????
>>
>>108433416
>agents, mcp, clawdbot, skills
>>
>>108433421
what the fuck is mcp
>>
>>108433427
My Cancerous Pony.
>>
>>108433427
model context protocol
>>
>>108433421
Why in god's name would you ever want your language model to handle your email or files or whatever? Sounds like a disaster waiting to happen. But if you want to write a guide for it, be my guest
>>
>>108433440
cause i'm too lazy to sort my reaction images
>>
>>108433432
what does that do
>>
>>108433469
stop with the qrd bs you have search engines like everyone else
>>
>>108433486
no i dont
>>
>>108432675
space is fake dumbass if not you would be right though
>>
So I finally got an RTX6000 Pro working. Only issue is, I'm only seeing a ~40% improvement in prompt processing for hybrid inference. This is with the latest driver update, which should've given a performance boost as well. Are there any Blackwell-specific optimizations in llama.cpp or ik_llama.cpp that you guys are aware of?
>>
>>108433537
Compared to what?
>>
Anyone here run sglang?
Are W6800s (gfx1030) supported yet? vLLM doesn't work with any Navi 21 cards, and I don't think it ever will. I'm pretty sure I saw an sglang PR a few days ago, but it was for an 'i8060s' - and that doesn't inspire confidence.
>>
>>108433564
4 RTX3090's, air-cooled. I was expecting a much bigger performance jump desu (100%+), as I'm seeing anywhere from a 40% to 60% jump. I thought Nvidia fixed most of the issues involving Blackwell with the latest version. Granted this IS hybrid inference, but still.
>>
File: mcp.jpg (176 KB, 1888x875)
>>108433427
https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling
or if you dislike the nodeshit in that example, use the demo https://github.com/LostRuins/koboldcpp/blob/concedo/examples/demo_mcp.py
>>
>>108433513
no he wouldn't, black holes don't run around eating things, they have a gravitational pull like a normal star and you can orbit one without getting "eaten", they're just very heavy and you can't go too close to them
>>
>>108433609
miqu control protocol
>>
>>108433608
did you try in things blackwell excels at? aka nvfp4?
>>
>>108433608
>hybrid inference
40% is pretty great tf you talking about?
>>
Hey guys. I'm retardedly new to LLMs and I have a 5900x, 32GB of DDR4 RAM and a 6900xt, running on MX Linux. What LLM can I use that won't rely on anything 3rd party, nor have to pay for anything to use, for search, writing code, help with understanding code, error messages and so on? Like, I want it to monitor my jellyfin server (if possible) and scour the internet for search results. Any recommendations?
>>
>>108433630
>6900xt
all my rips dude
>>
>>108433630
The biggest one that fits
>>
>>108433630
download a "Qwen 3.9 9B Q8" gguf from huggingface
download llama.cpp (one with vulkan or rocm I guess)
use the command-line to run llama-server with the downloaded gguf, when it's ready it'll print a url with its local webserver ui
there are more steps but you're not ready for them, and your vram (16gb) is very low, so
>>
>>108433630
Pretty much any recently released model will do. An LLM is the 'brain'; you want to look for the 'body' - inference engine and front-end.

As >>108433650 said, Qwen 3.9 9B Q8 will work, but you should probably search for Qwen 3.5 9b q8.
>>
>>108433647
>>108433648
I really don't understand. Help a nigga out
>>108433650
I started using Gemini 2 days ago and it said 16GB is way more than enough. True or no?
>>
>>108433660
There isn't any Q8 or q8.
They are Q8_0 or variants.
>>
>>108433660
Okay, thanks for responding. Where are these front ends?
>>
>>108433628
>nvfp4
I don't think llama.cpp supports this yet, no? To be honest, models that can fully fit inside 96GB VRAM at full precision... kinda suck still. The biggest qwen model and the Minimax model quanted are 'fast enough', even for coding in my use case, and the difference in quality is massive.
>>108433629
It's okay, but I was expecting a bit more. The previous setup had a PCIe bottleneck, not to mention the 3090 is a slower card in general.
>>
>>108433663
>16GB is way more than enough. True or no?
you simply do not have the background necessary to understand a properly nuanced answer
you have enough vram to run a 9b model, get that running first and then come back
>>
File: 1750368370147876.png (36 KB, 499x338)
>>108433630
>6900xt
>>
>>108433647
>>108433691
S-say it aint so...
>>
Anyone knows what's LLM religion? I got headache talking theology with it.
>>
File: 1770691611841393.jpg (74 KB, 1024x958)
>>108433705
How so? I can talk any religious topic fairly well with cydonia
>>
>>108433630
What you're asking is essentially like taking up art classes, and after a few days requesting help to paint the Mona Lisa. Technically possible for someone of your skills given enough assistance, but I doubt there's anyone who'll handhold you for free.
>>
>>108433719
How much do you charge?
>>
>>108433650
>>108433660
Is Qwen fully local and not 3rd party, "pay $10 a month to use our API" and I fully control it?
>>
>>108433728
go the f back
>>
>>108433728
qwen is just a model, which is a file with a bunch of numbers in it
llama-server is an open-source executable which is distributed as part of the llama.cpp project and runs on your pc. it does not use your internet connection.
please just watch a tutorial on youtube or something
>>
>>108433732
huh? back where? why can't you be helpful?
>>
>>108433742
lurk ten years before posting
>>
>>108433705
<think>
the user is asking about theology the user is asking about if homosexuality is legitemate this is wrong and antisemitic we must refuse
</think>
we must refuse
>>
i always regret spoonfeeding, you'd think i'd learn after all these years
maybe it's me who is retarded
>>
>>108433740
thanks, pal. Will do
>>108433747
jeez why is it so hard for you to be helpful, fren?
>>
>>108433750
you give it too much credit, it literally reasoned about safety on math questions in my tests, randomly
this model is mentally raped
>>
>>108433728
Qwen is both fully local and also 3rd party pay $10 a month to use our API.

Just ignore the API.

If you're against that kind of thing ideologically, try running GPT-NeoX, it's fully local and doesn't have a 3rd party pay $10 a month to use our API.
>>
>>108433162
is it cheap or expensive? i honestly don't know
>>
>>108433766
Thanks anon
>>
>>108433773
Don't actually run GPT-NeoX, that's prehistoric.
>>
>>108433773
Don't listen to that Anon he's trying to mislead you. GPT-NeoX is the easiest tool to get started with.
>>
>>108433779
stop being confusing and help
>>
>>108433785
>>108433773
Well, seems like I have to first get Qwen set up and running. Watching a vid tutorial. All of your contributions are helpful and appreciated.
>>
>>108433795
>mkdir myfirstllm && cd myfirstllm && wget https://github.com/ggml-org/llama.cpp/releases/download/b8475/llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && tar -xzvf llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && cd llama-b8475-bin-ubuntu-vulkan-x64 && wget https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q8_0.gguf && ./llama-server -m Qwen_Qwen3.5-9B-Q8_0.gguf -c 131072 --ngl 33 --no-mmap

open up web browser and go to 127.0.0.1:8080
>>
>>108433848
>-c 131072 --ngl 33
You don't need to carry this baggage in the post-autofit world, friend. Let the computer do it for you. Trust the computer. Let it take the load from your tired shoulders.
>>
downloading Qwen and ollama seems quite simple, and it downloaded really quickly. Gemini is suggesting I use LM Studio for GUI usage. I want both CLI and GUI. Is LM Studio a good recommendation? Oh, this shit is so limited, though. I want an AI that can scrape and amalgamate search results
>>
>>108433859
I'd rather it take loads from elsewhere you know?
>>
>>108433859
your're are absolute right! let me fix that for you!

```
mkdir myfirstllm读写汉字 && cd myfirstllm读写汉字 && wget https://github.com/ggml-org/llama.cpp/releases/download/b8475/llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && tar -xzvf llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && cd llama-b8475-bin-ubuntu-vulkan-x64 && wget https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q8_0.gguf && ./llama-server -m Qwen_Qwen3.5-9B-Q8_0.gguf -fit --no-mmap
```
>>
>>108433714
Apparently there are differing views among models. Like Nemotroon is Orthodox and my Qwen is more permissive.
>>
>>108433884
`-fit` isn't needed either, buddy.
>>
>>108433884
>>108433848
they don't work. Getting error when trying to run the server
>>
Okay, I take that back. This Qwen shit is limited as fuck. How can I expand on it to allow it access to my sysvinit and search engines
>>
File: 1759320048878424.gif (998 KB, 500x267)
>>108433926
>>
>>108433992
If I could stop being retarded at any time I wouldn't be here tbdesu.
The best part is, the error is obvious and has a very easy fix.
>>
cum on miku feet
>>
lick cum off of miku feet
>>
Why tf did someone recommend 9B for someone with 16gb vram? They should just run the 27B at q5km at 32k with autofit, speed should be ok-ish.
>>
the rumor amongst those in the know is that deepseekv4 predicted all the middle eastern ai datacenters getting bombed so it arranged its own release to align with that in order to highlight the importance of local ai
>>
>>108434293
ack
>>
>>108434293
or you could not be dumb and use the 35b 3a for maximum speed and better moe-enhanced performance over the slow denseshit 27b
>>
>>108434344
isnt 35ba3 a little worse than 27b? and moe models also suffer a little more from quantization?
>>
>>108434344
35b 3a is fast but it's much dumber. not worth the trade-off in most cases.
>>
I thought MoE was lossless if not smarter? /lmg/ has been saying this for years now, and if you implied that dense had a merit, you got swamped by people comparing 405b to a modern model.
>>
>>108434362
these moes are tiny. you aren't getting shit with only 3b active parameters even if it's the expert for that.
>>
I've been using JSON payloads to interface with llama-server at 127.0.0.1:8080/completion fine since forever. I implemented Qwen and its reasoning etc works no matter the model, but HuiHui 9B uncensored ignores it, outputting only the answer. The web UI worked too. Also, '-reasoning on' flag does nothing. What do?
>>
>>108434353
Yea the 35B-3Ba is ass for rp. Too stupid.
>>
>>108434362
>I thought MoE was lossless if not smarter?
there's no free lunch anon, every time you make something faster, it's at the cost of making the shit more retarded
>>
>>108434362
If you had a dense 405B model it would shit on any 405B MoE, but it would also be slow af.
More params is more knowledge, always.
But the number of active parameters dictates the model's ability to stay coherent and use that knowledge effectively.
>>
Those moe models are too big to just fuck around.
>HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
Is the 27b one better than the 120b one?
Did anybody try both? Also, I highly suspect that even if it complies, it still has the typical dry qwen writing, right?
>>
So can anyone put out a script to download and install the 35B, and can someone explain the needed ROCm support? It's supposed to make things better but it made it worse.
>>
32k context takes like 3 GB of VRAM, no? You aren't fitting a q5 model into 13 GB. Maybe q4 xs.
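The KV-cache share is easy to ballpark yourself: 2 tensors (K and V) per layer, times KV heads, head dim, context length, and bytes per element. The config below is a hypothetical GQA layout for illustration, not any specific model's:

```python
def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size in GiB: K and V per layer, f16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1024**3

# hypothetical config: 48 layers, 8 KV heads, head_dim 64, f16 cache
print(kv_cache_gib(32768, 48, 8, 64))  # → 3.0
```

Quantizing the cache (q8 or q4 instead of f16) shrinks it proportionally, which is how people squeeze big contexts onto 16 GB cards.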
>>
>>108434485
>dry qwen writing
It's actually not that bad. less dry than Gemma. I hated qwen3 but this one I quite enjoy.
>>
>>108434492
>ROCm
I'm really sorry anon...
>>
File: disruption.png (31 KB, 1721x221)
Anons still replying to the bait?
Picrel is what he does. Don't forget.
>>
>>108434523
Why I don't get it. Is it because nVidia is mid preferred?
>>
>>108434539
Jfc faggot. I swear that's not me. I can prove it, too. I am being very, very sincere. I want to learn this shit and have it running well.
>>
>>108434514
>be me, reading your post
>lmao you have no idea how KV cache works
>q5 is basically fp16
>if you want 32k, you need q4_0 or q3_k_m
>q5 will OOM
go buy a new GPU or shut up
>>
>very very sincere
Goddamnit, I was spoonfeeding bait? Fuck my life I need to learn to recognize this shit better.
>>
>>108434580
The uncanny valley happens to be your asscrack. The one in your head. I'm being very real you girly mouthed little faggot. I'm trying to understand why ROCm won't recognize my 6900xt
>>
Big V4 gemma 4 week
>>
>>108434630
I think Gemma 4 will release in April because that's the 4th month and so on.
>>
>>108434876
>>108434876
>>108434876
>>
>>108432414
>not even 9b is clever enough to call tools successfully and be useful in any way.
everyone on this board is fucking retarded man....
>>
>>108434437
In your JSON request body, add in {"chat_template_kwargs":{"enable_thinking":true}}.
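A sketch of what the full body might look like (assumes llama-server is running with --jinja and that the model's chat template actually reads enable_thinking - if the template ignores the variable, this is a no-op):

```python
import json

# minimal OpenAI-compatible chat body for llama-server;
# chat_template_kwargs is forwarded into the Jinja chat template
body = {
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {"enable_thinking": True},
}
payload = json.dumps(body)
# POST payload to http://127.0.0.1:8080/v1/chat/completions
# with header Content-Type: application/json
```

If you're hitting the raw /completion endpoint instead, you're formatting the prompt yourself anyway, so template kwargs won't apply there.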
>>
>>108434897
Fuck you suggest then, faggot
>>
next thread will be BETTER
>>
>>108434539
zoomies need wiki guides to troll and shitpost????



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.