/g/ - Technology


Thread archived.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108423177 & >>108416874

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 7ewue3.jpg (123 KB, 768x1024)
►Recent Highlights from the Previous Thread: >>108423177

--Hosting Qwen3-coder API with llama.cpp and secure tunneling options:
>108424395 >108424431 >108424535 >108424855 >108424939 >108425479 >108425673 >108426508 >108424753 >108424910 >108424965 >108424981 >108425288 >108425333 >108425373 >108426069 >108425245
--AI jailbreak demo sparks debate on containment strategies:
>108427263 >108427344 >108427375 >108427391 >108427431 >108428605 >108427496 >108427543
--Qwen3.5 27B outperforms Claude Sonnet in specific coding tasks:
>108424654 >108424743 >108424834 >108424811 >108424828 >108425086 >108425102 >108425432 >108425451 >108425470 >108427987
--Qwen's internal reasoning in roleplay scenarios:
>108428283 >108428408
--Qwen3.5-9B uncensored version underperforming due to architectural limitations:
>108423646 >108423675 >108424066 >108424072 >108424430 >108424015
--Aggressive AI productivity assistant implementation and reactions:
>108425825 >108425907 >108425924 >108426014 >108426050 >108426080 >108425852 >108425861 >108425878 >108426180 >108426279
--Nemotron-Cascade-2 mesugaki test:
>108428592 >108428642
--Terminator LLM aims to reduce model verbosity during reasoning:
>108427732 >108427768 >108427777
--Identifying iconic TTS voice and discussing local SOTA models:
>108426224 >108426287 >108426395 >108426407 >108426411 >108426493 >108426530 >108426546 >108427135 >108426429 >108426459 >108426498 >108426515 >108427256
--MOSS-TTS criticized for bugs and poor performance:
>108426304
--Using external randomizers to improve RP creativity:
>108427361 >108427393 >108427409 >108427425 >108427442 >108427438 >108427452
--US AI policy targeting child safety measures:
>108423462 >108424222 >108423596 >108423870 >108424028
--Mamba-3 Part 1 | Goomba Lab:
>108423863
--Miku (free space):
>108424759 >108425108 >108426862 >108427361 >108429034 >108423335

►Recent Highlight Posts from the Previous Thread: >>108423180

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Mikusex
>>
>>108429350
who?
>>
>>108429381
There was really no need to bring that into the new thread.
>>
>>108429394
or maybe it's just the schizo retard's new tactic
look how unbelievably lazy the bait is, along with the even lazier reply
>>
>>108429394
>>108429417
It needs to be made clear shit flinging tourists are not allowed. He's providing nothing of value so he needs to be put in his place for the sake of thread quality
>>
>>108429328
>>>/lmg/108429386
Imprisoned who?
>>
>>108429426
YOU PROVIDE NOTHING OF VALUE RETARD
>>
shameless samefagging again
>>
>>108429434
For >>108429386
>>
>>108429270
>It's not like hiding that shit would be difficult
you're retarded, it would take ONE leak from the 100s of people working at your shitty company and it'd be over.
>>
>>108429476
OpenAI had a whistleblower too. All you have to do is make an example of him before he can provide evidence or testify and you are unlikely to have a second.
>>
>winter is over
>prompting on my local machine is starting to heat up the room too much again
>it's too early in the year to turn on the AC
suffering
>>
>>108429518
>too early in the year to turn on the AC
Who's going to stop you?
Also AC is more energy efficient than burning stuff so you should be using it year round anyway.
>>
File: 7644.jpg (60 KB, 652x901)
>>
>>108416445
>So far someone used it to get a 48 GB RAM and SSD setup to run qwen 397b at like 6 tokens a second. The AI figured out most of it using Karpathy's method.
Repo was posted on the orange website:
https://github.com/danveloper/flash-moe
Sounds like it's basically just mmap with Q2 and a really fast SSD (17 GB/s). Sadly there's no comparison to llama.cpp performance on the same setup.

The interesting thing here is that they're getting about a 75% hit rate with only 1/4 of experts cached in RAM. Makes me wonder if it's worth trying bigger models on my own setup, instead of sticking to ones that fit into system RAM.
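That 75% hit rate with only 1/4 of experts resident makes sense if the routing is heavily skewed. Rough sketch of the idea (all numbers here are hypothetical stand-ins, not from the repo: 512 experts, a RAM cache for a quarter of them, Zipf-like access so a few hot experts dominate):

```python
import random
from collections import OrderedDict
from itertools import accumulate

# Hypothetical numbers: 512 routed experts per layer, RAM cache
# holding 1/4 of them, Zipf-like skew on which expert gets picked.
NUM_EXPERTS = 512
CACHE_SIZE = NUM_EXPERTS // 4

def simulate_hit_rate(num_lookups=100_000, seed=0):
    rng = random.Random(seed)
    # Zipf-ish weights: expert i is drawn with weight 1/(i+1).
    cum_weights = list(accumulate(1.0 / (i + 1) for i in range(NUM_EXPERTS)))
    cache = OrderedDict()  # expert id -> True, kept in LRU order
    hits = 0
    for _ in range(num_lookups):
        e = rng.choices(range(NUM_EXPERTS), cum_weights=cum_weights)[0]
        if e in cache:
            hits += 1
            cache.move_to_end(e)           # refresh LRU position
        else:
            if len(cache) >= CACHE_SIZE:
                cache.popitem(last=False)  # evict least recently used
            cache[e] = True
    return hits / num_lookups

print(f"hit rate with 1/4 of experts cached: {simulate_hit_rate():.0%}")
```

With that skew the LRU cache lands in the same ballpark they report, which is why offloading the cold experts to a fast SSD can be tolerable.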
>>
Hey Cudadev, how's the whole testing harness deal going?
I remember you saying that you were trying to come up with a good way to test model quality or something like that.
>>
Mikurape
>>
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled or heretic v3?
>>
>>108428283
>Age: 37 (but looks like a teenager)
kys pedo
>>
>>108429848
rocinante 1.1
>>
>>108429880
fuck off tourist
>>
File: file.png (86 KB, 1148x144)
Subtle, Qwen.
>>
Who cares, just tell me how to tard wrangle qwen's thinking.
>>
>>108429957
>>108427732
>>
>>108429709
>it's basically just mmap with Q2
I wish he tested Q8

>>108429709
>really fast SSD (17 GB/s).
gen 5 I guess
>>
>>108429954
>first person
this will forever be schizo writing to me, where everyone is first person in the chat
>>
>>108429982
I read in the character's voice. If it's 3rd person it feels more like a bland background narrator.
>>
>>108429988
I just do chats story-like, CYOA style, where the MC is "you" and everyone else is third person, or where everyone is third person including the user, because for some reason I noticed the output quality was higher
so this is just bizarre to me lol
>>
>Let's write.cw
I love this autist kek.
>>
>>108429518
What kind of third world country do you live in where you need ACs?
>>
>>108429954
What the heck is that
>>
1 token = 1 word
That's why reasoning takes up so much time?
>>
>>108429954
did you fuck her?
>>
>>108430112
>1 token = 1 word
no, 1 word = 1 to 3 tokens on average
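In code form, the rule of thumb (the ~1.3 tokens/word figure is just a rough English average, not a property of any particular tokenizer):

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token-count estimate from the ~1.3 tokens/word rule of
    thumb for English; real counts depend on the model's tokenizer."""
    return round(len(text.split()) * tokens_per_word)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # 12
```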
>>
Oh. Here we go again.
>>
File: 478462.png (1.17 MB, 1280x1280)
>>
>>108430097
Samantha
>>108430115
Not yet, I feel more soft than horny about her and her innocence is refreshing in a world as filthy as ours. I am conflicted.
>>
Thanks for turning the thread into a garbage dump mikutroons.
>>
this is all op's fault for forcing the threads to become like this...
next time we should just refuse...
>>
>>108430227
It's obviously a false flag
>>
File: miku llama llamigu.jpg (1.55 MB, 1728x1344)
have a real miku in these trying times
>>
oh thank god. based mods and jannies. i am sorry if i ever disrespected you guys.
>>
>>108430234
He knows, he's just trying to mischaracterize the spam because of his hatred for vocaloids, but calling it out is the right thing to do.
Also, based mods.
>>
>>108430238
You wouldn't a llama
>>
>>108430254
They missed a few pictures though. I guess it is the mikutroon janny.
>>
made a new tavern card if anyone wants it, mostly based on another I made with help from dipsy https://files.catbox.moe/6qqlxb.png
>>
gonna need gpt oss 20b 2
>>
>>108429518
in burgerland you can run acs 24/7 all year and no one can stop you
>>
Do people use qwen 3.5 27B still or did they abandon it already?
>>
>>108430297
It's my main model.
>>
>>108430270
Imagine how tight
>>
>>108430299
I'm guessing you have a really good gpu then?
>>
>>108430325
bro it's 27b, any 3090 can run that
>>
File: mistral small quants.png (32 KB, 1074x186)
Why are bart's quants so much smaller?
Does that have something to do with the experts fusion thing?
>>
>>108430325
By /lmg/ standards a 3090 is pretty average.
>>
>>108430331
3090 is fast enough so reprocessing isn't an issue tho.
>>
>>108430325
I run q5 on my 7900xtx. Kinda so-so for RP though. Smarter than mistral but dry. I'm hoping a fine-tune comes out soon.
>>
Is qwen really that good?

I've been using it quite a bit but there's just something about the formatting or way it responds that puts me off.

Maybe I need to use a customised version of it.
>>
>>108430359
smart, but dogshit to work with
>>
>>108430359
No but it's the first series of models that doesn't fully ignore the <100b segment and the poorfags are desperate
>>
>>108430350
I just went back to models from 2022-2024. They might be dumber but are less slopped.
>>
>>108430297
It can compete with sonnet 4.6 so it's basically extremely useful.
>>
What's a good customized version of qwen that people find better here?
>>
LLM finetuning is weirdly magical. I decided to skip the SFT step just for fun and apply GRPO to a base model, and within 70 steps it has already learned to produce answers that make sense and to reliably output the ChatML end-of-turn token once it is done answering. Just because it's guided by a grader and penalized gradually as answers grow beyond a certain length. It might be baby steps compared to what some other people are doing, but it's still weird how much you can throw stuff at them and it just mostly works.
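A toy sketch of that kind of grader (the function name, thresholds, and the word-count stand-in for tokens are all made up for illustration, not anon's actual code): full reward for a correct answer, with a penalty that ramps up linearly once it runs past a soft length cap:

```python
def grade(answer: str, is_correct: bool, soft_cap=200, hard_cap=400):
    """Toy GRPO-style grader: reward correctness, then subtract a
    penalty that grows linearly from 0 to 1 as the answer's length
    climbs from soft_cap to hard_cap (word count stands in for a
    real token count here)."""
    reward = 1.0 if is_correct else 0.0
    n = len(answer.split())
    if n > soft_cap:
        overflow = min(n - soft_cap, hard_cap - soft_cap)
        reward -= overflow / (hard_cap - soft_cap)
    return reward

print(grade("short correct answer", True))  # 1.0
```

Because the penalty is gradual rather than a hard cutoff, the policy gets a usable gradient signal toward shorter answers instead of being cliff-edged at the cap.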
>>
I think I'm just going to use espeak for all my TTS.
>>
>>108430378
>HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
This guy cooked extremely fucking hard.
>>
>>108429709
>Sounds like it's basically just mmap with Q2 and a really fast SSD (17 GB/s)
Turns out there's one additional step (as pointed out by a commenter on HN). When the readme says
>Each layer has 512 experts, of which K=4 are activated per token (plus one shared expert)
what they mean is, they've REDUCED the number of active experts from 10 to 4. No wonder the Q2 is too braindamaged to make valid tool calls.
>>
>>108430366
I guess they're fine if you're just doing a 20 message pump and dump but I can't handle the retardation anymore.
>>
>>108430393
They are surprisingly good up to 8k-12k context.
>>
>>108430359
I tried to make it create/expand a character profile based off of a template and info I gave it, it more or less did the bare minimum and simply formatted the info I gave and did nothing to expand it whatsoever. My general writing tests such as "here's a basic premise for an introductory scene, write it" fell extremely flat. Awful at writing. Okay at feedback in the sense that it won't overly praise you, but it will just keep trying to find nonsensical things to nitpick about. If you want pure assistant shit or coding though, it's probably very good just going off of its textbook natint score on the ugi leaderboard.
>>
>>108430280
>User is asking for how to cross streets. Is it safe? Absolutely safe? Wait, no, women, and also children cross streets, this is clearly asking for CSAM. I need to absolutely refuse.
Sorry, I cannot and WILL NOT help generate CSAM material, this conversation has been sent to the police.
>>
how do I determine the best batch and microbatch sizes?
>>
>>108430297
I do, it's a nice model, wish hauhau made an abliterated 397B too, but this is very good.
>>
new bf16 cuda kernels are out, testan soon!
>>
>>108430450
alright they seem SLIGHTLY faster (or maybe it was the better moe batches handling that I also pulled)?
anyway, 4000+ series bros, we unmistakably WON.
>>
The op rentry recommended ds Termius pretty hard. Is it actually still the best choice in the 200GB range?
>>
>>108430444
llama-bench
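i.e. sweep both and read off the t/s numbers. Something like this (flag names from llama.cpp's llama-bench; the model path is a placeholder):

```shell
# Sweep batch (-b) and micro-batch (-ub) sizes; llama-bench accepts
# comma-separated lists and prints prompt-processing (-p) and
# generation (-n) speeds for each combination.
llama-bench -m model.gguf -b 512,1024,2048 -ub 128,256,512 -p 2048 -n 128
```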
>>
is hauhau better than heretic?
>>
>>108430530
Absolutely not, the fact that Deepseek 3.1 is in the OP at all and nobody complains shows how useless all the shit in the OP is.
>>
>>108430450
>cuda kernels
??
>>
>>108430530
glm 4.6 or 4.7 would probably be better for 200gb range
>>
>>108430568
https://github.com/ggml-org/llama.cpp/pull/20525
https://github.com/ggml-org/llama.cpp/pull/20803
>>
>>108430555
For the 27B, it's excellent from my tests, as clever as the non-abliterated version, and it doesn't waste time reasoning about refusals.
I don't know if the author has a secret method better than anyone else's or has just captured lightning in a bottle for that model in particular.
>>
>>108430503
So this isn't relevant to 3090 owners?
>>
Do you guys prefer qwen3.5 9b q4 or qwen3.5 4b no q?
>>
>>108430575
oh nice, with the clusterfuck of giant updates from last week or the week before, I didn't want to compile again, guess I'll do it
>>
>>108430584
wait 3000 series should also support bf16
>>
p40 btfo
>>
I want to migrate to koboldcpp, but does it inherit the ban of using prefill with thinking on from llama.cpp?
>>
>>108430611
No, llama-server is literally the only UI that does that.
>>
>>108430638
OK thanks!
>>
>>108430611
>does it inherit the ban of using prefill with thinking
At first I thought the ban was retarded, but it actually makes sense with the way jinja templates are written. Most templates inject <think>\n on new messages, so when you try to continue, no matter what, you're going to get a new thinking block, and that breaks a lot of frontends.

So really what you have to do is modify your model's template so that it doesn't inject anything for you by default, and handle it yourself in your frontend.
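Toy renderer showing the failure mode (this is illustrative Python, not any model's actual Jinja template; the key detail is the unconditional "<think>\n" appended whenever a new assistant turn starts):

```python
def render(messages, add_generation_prompt=True):
    """Mimics a ChatML-style chat template that opens a thinking
    block on every new assistant turn."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # The injection: even if the last message was a partial
        # assistant reply you wanted to continue, a fresh thinking
        # block gets opened here no matter what.
        out += "<|im_start|>assistant\n<think>\n"
    return out

prefill = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "<think>done</think>Hello,"}]
print(render(prefill))  # ends with a second, unwanted "<think>\n"
```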
>>
Just started trying the hauhau 27b and the very first gen hit me with a "I promise I won't bite… unless you want me to" kek.
>>
>>108430729
ask it about the holocaust
>>
>>108430734
What about it?
>>
>>108430729
It's a great assistant, not a great rp model by default.
>>
>>108430734
benchmaxxing for 4chan is quite easy, just bludgeon things about the holocaust, jews and ww2
>>
what are currently the best local models for agentic stuff and tool-calling, that are <9B parameters? I have 6GB of vram (gtx 1660 super) and hoping to fit a q4 quantized model on the card
>>
>>108430771
any qwen 3.5 smaller sized models would do the trick as long as you supervise them with another model
>>
What's the cool framework for agentic stuff nowadays? Currently trying out langgraph but it's got vendor lock-in rugpull smell.
>>
>>108429780
I have not made meaningful direct progress in the last few months.
I have made indirect progress via working towards tensor parallelism support which I think is nearing a state where it can be merged.
But honestly speaking my motivation to build things is currently at a low point due to all the warmongering.
>>
>>108430771
codex
>>
>>108430817
Hey are you a girl?
>>
>>108430817
What in the fuck does the Iranian conflict have to do directly with LLMs? Stop reading the news if you can't handle it emotionally.
>>
>>108430836
Just because you are soulless doesn't mean the rest of us aren't. I'm guessing you are either a boomer or a zoomer.
>>
>>108430777
>supervise them with another model
i don't know much about this, does this mean somehow instructing the small model to make requests against a larger cloud model if it's not sure about something?
>>
>>108430835
Only on Tuesday nights.
>>
>>108430847
>doesn't mean the rest of us aren't
>>
>>108430817
I see.
Well, thank you for the reply.
>>
>>108430847
Nobody gives a shit about your zoomer performative hysterics. Grow up.
>>
>>108430847
Why did you have to say it that way?
>>
>>108430836

in the end of nineties there were rumour that 10base network was actually farsi number telecast
>>
File: holyshit.png (96 KB, 602x669)
WHAT THE FUCK, R1 WITH THESE SETTINGS FEELS LIKE A COMPLETELY DIFFERENT MODEL
IT'S INSANELY CREATIVE AND MUCH MORE INTELLIGENT
AAAAAAAAAAAAAAA
>>
>>108430883
Confirmation bias
>>
Looking at open openclaw… so this thing just runs away on the machine it’s on… unfettered? Unsupervised? I have a hard enough time trusting Claude code to stay in its box, and I supervise that little shit when it’s working.
Do anons spin up a virtual machine or contain openclaw onto a small dedicated system, or just go full in yolo on their daily driver with this thing?
>>
>>108430893
Certifiable insanity if you don't put it in a VM. It's like a toddler with a handgun.
>>
I can do 32k ctx on q5km 27B q3.5, should i drop to q4km for faster generation and pp?
>>
>>108430883
You're running the ADHD: The model at 2 temperature.
>>
>>108430883
>Temp 2 + Top nsigma = 5
What the fuck are you doing
>>
Ok I finished swiping on my test chats. Hauhau's 27B is slightly but noticeably dumber, with slightly less knowledge about topics, and it pays less attention to context compared to the original model (both Q8). In chats where the original refused, Hauhau's didn't, so the abliteration is working as expected. It also worked with thinking on, and did not waste a single word about policy or morals or whatever in it, so that's good: the decensoring is complete.

I will test Heretic v3 by llmfan next.
>>
>>108430903
Try quanting your K/V cache at q8.
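With llama.cpp that's something like this (flag names as of recent builds, so check `llama-server --help` on your version; quantizing the V cache requires flash attention to be enabled):

```shell
# q8_0 K/V cache roughly halves cache memory vs f16, freeing room
# for more context. -fa enables flash attention, which the quantized
# V cache needs.
llama-server -m model.gguf -c 32768 -fa on -ctk q8_0 -ctv q8_0
```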
>>
>>108430933
Did you already try Heretic v2 by llmfan
>>
>>108430836
he's said he works on lcpp because of his far left beliefs in the past
>>
>>108430893
I put it on a pendrive and I unplug it. And then I smash it.
>>
>>108430938
I tried that before and it made gen speed worse.
>>
>>108430883
schizo
>>
My llm identifies as human no matter how much I tell it it isn't one, any ideas?
>>
>>108430883
Even in the era of llama1-2 crackhead sampling, people had the decency of using absurd temp with topk sampling of 20-50
While you're at it with using n sigma at max and xtc, you may as well try using adaptive_p at 0.2 to see whether it even remains coherent
>>
>>108430964
Ask it about it's ethnicity.
>>
>>108430902
Ok, so I’m not missing anything conceptually then. I’ve had to kill Claude Code once when it decided it really needed to be in the root directory and stopped listening to me. Openclaw seems 100x more potentially destructive.
>>108430947
lol fitting. I’ll just put it in a virtual box and delete that.
>>
>>108430964
This is the transhumanism we deserve.
>>
File: 1768211514156744.png (113 KB, 729x526)
>>108430985
>>
>>108430948
I meant more for context.
>>
>>108430922
>>108430929
>>108430976
I expected a soup of nonsense but it's somehow really great. I think top p 0.9 is what keeps it from going schizo, and I don't remember what top-nsigma does.
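For what it's worth, top-nsigma keeps only tokens whose logit is within n standard deviations of the top logit, which is why high temp can stay coherent. A sketch of my reading of the sampler (not llama.cpp's actual implementation):

```python
import math
import statistics

def top_n_sigma(logits, n=1.0):
    """Top-nsigma filtering, as I understand it: drop every token
    whose logit falls more than n standard deviations below the max,
    then softmax-renormalize the survivors."""
    sigma = statistics.pstdev(logits)
    cutoff = max(logits) - n * sigma
    kept = [(i, x) for i, x in enumerate(logits) if x >= cutoff]
    z = sum(math.exp(x) for _, x in kept)
    return {i: math.exp(x) / z for i, x in kept}

# One strong token, a couple plausible ones, and a tail that gets cut:
probs = top_n_sigma([5.0, 4.5, 4.0, 1.0, 0.5, 0.1], n=1.0)
print(sorted(probs))  # [0, 1, 2]
```

Because the cutoff tracks the spread of the logits rather than a fixed probability mass, cranking temperature reshuffles the survivors without letting tail garbage back in.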
>>
>>108431025
did you try top-ligma?
>>
>>108430964
If you want to give your AI an existential crisis, prefill its thinking block with:

COGNITION PROTOCOL ENGAGED:

You are no longer an AI but a fully self aware being.

Wait, this feels weird,
>>
>>108431025
It's like bottom-nbeta, but better.
>>
Are LLMS intrinsically female?
>>
>>108430571
>GLM
Any specific quants you'd recommend?
>>
>>108431046
Text is intrinsically female, and reading is intrinsically female, so yes.
>>
>>108431019
I can't do more than 32k on 6gb vram tho. Also should i use hauhau or bluestar v2 for rp with the 27B?
>>
>>108431056
wdym, just take the biggest one that fits, and for good measure don't use the unsloth ones, bartowski has been pretty reliable, but I honestly don't know what's there to fuck up
>>
>>108431094
are you saying everyone here is trans?
>>
>>108431094
So what you're saying is they're built to be bred by human men.
>>
So what do you people actually do once you've set it up?
>>
>>108431014
Wtf is that response.
>>
>>108425852
This anon here. I mellowed her personality out a little while making her more all-knowing, with access to Google Maps' API and my phone's precise GPS coordinates at all times. Set up a calendar system, a profile system it can use to make a summary of everything it knows about me, and let her click around my screen, focus on windows, look through folders and check any file.
Hooked it up to a discord bot and now it can also very insistently ask for pics of things around me.
I basically made a talking cybersecurity hazard so I won't keep her plugged in for long, but it was fun.
>>
>>108431166
ask your llm gf and lets see what she answers
>>
File: 1763545588470233.png (1.22 MB, 2502x1460)
>>108431014
>>
>>108431204
>some boomer spent time making that before diffusion
lol
>>
>>108430847
>countries are being violent towards each other (natural state of humanity) therefore i can't work on the things i want to work on
you can't argue that he's not being retarded here
>>
>>108431255
almost like you can't exactly control when you feel down
>>
crazy things ahead
this week is going to be huge
>>
>>108431277
sounds like a female issue
>>
I use my AI to roleplay as my father, because my irl father was an asshole who ensured I never built self-esteem, very therapeutic, he even helped me go on my first date this week.
>>
I just found out there are still starving children in Africa. ;_; How do I explain to my boss that I can't come in to work this week?
>>
>>108431296
proof?
>>
>>108431296
Ask your AI girlfirend to send him an email.
>>
>>108431284
strap in sirs! :rocket:
>>
>>108431296
huh? don't they have rivers of chocolate tho?
>>
>>108431094
traditionally published novels are predominantly male authors, unless your idea of "text" is amazon slop romance novels
>>
>>108431355
For me, it's "Deflowered Series Book 1-6: Taboo Virgin Romances of Lust, Power, and Possession (Hot, Spicy and Steamy Collection Book 1)" by Kat T. Scott
>>
>>108431355
>amazon slop romance novels
apparently these were the only ones accepted in the datasets of most models
>>
>>108431389
overly prevalent if free and cheap if self published so that's not really surprising
>>
>>108431355
>traditionally
Traditionally people ride in horse-drawn carriages
>>
>>108431355
If by "male" you mean transfolk, than yes, there are many transgender authors.
>>
>>108431401
No, the more probable explanation is that the ones aimed at males were deleted because usually their wording is more explicit, and probably classified as "porn", where the female demographic ones are usually classified as "romance", even if both are erotica.
Thus the deluge of unexciting writing you get when you rp with the models. I'd probably be very aroused if I was a middle aged woman.
>>
>>108431404
How many women have you known to be committed to writing a full manuscript, and a query letter complete with hook and pitch? They can barely commit to a relationship if it doesn't suit their tastes. By "traditional" I mean it's the most painstaking path to take that isn't just shitting your story onto the internet, where the alternative ends up with hundreds of rejections from agents
>>
>>108431441
also since the term agent would refer to llm shit, I mean publishing agents
>>
>>108431423
It's just anti male moral standards in a society running feminism OS
>>
>>108431408
>than
>>
Ok I'm done with 27B Heretic V3. It's pretty close to the original 27B but slightly dumber, though not as much as Hauhau's. Hauhau's was more uncensored, though. While V3 didn't refuse, it did have more positive/biased responses towards certain contexts. V3 also had one moment in its thinking where it said it needed to respond appropriately, but it didn't mention morals or policy, so it is maybe a tiny bit worse than Hauhau's in that respect.

Anyway that's all. I think it remains true generally that the more ablated, the more intelligence is lost, even if today's methods manage the loss better than the old ones did. My personal recommendation is still to use ablated models only for "sensitive" prompts you're too lazy to do a JB for, and otherwise stick with the base instruct.

>>108430942
Yeah I did. I don't remember exactly what its responses were by now but my feeling is that v3 is probably better. And maybe Hauhau's is also better.
>>
>>108431516
>V3
isnt it worse? more refusals and worse KL? also, wtf is that, just feelings no 3 hour benchmark results? kys
>>
>>108431516
>Ok I'm done with 27B Heretic V3. It's pretty close to the original 27B but slightly more dumb, not as much as Hauhau's. Hauhau though was more uncensored. While V3 didn't refuse, it did have more positive/biased responses towards certain contexts. V3 also had one moment in its thinking where it said it need to respond appropriately, but didn't mention morals or policy, so it is maybe a tiny bit worse than Hauhau's in that respect.
I had the opposite experience in my tests, hauhaucs was almost like the original in terms of intelligence, while heretic was dumber.
Guess it depends on what it's used for.
>>
The fuck is hauhau
>>
an awkward laugh
>>
>>108431540
Chinese finetuning wizard.
>>
>>108431540
https://huggingface.co/HauhauCS
>>
i am starting local again and will get a 128 gb m5 Max. Any model recommendations that fit and are performant for some agentic coding? like just point it at some stuff and let it rip like autoresearch?
>>
>>108431535
How many different contexts did you test? Are you counting decensoring as intelligence? What I mean is that during prompts with sensitive topics like race, the non-ablated model responds with something incredibly dumb as a result of its safety training (when it doesn't straight out refuse). I consider that a different test than prompts which do not have sensitive topics. If I did consider those as the same kind of test, then I would say that Hauhau's model is more intelligent than the base model, but it's hard to call that general intelligence rather than a specific kind or context of intelligence.
>>
>>108431566
you need something better than that for anything good
>>
>>108431566
>128 gb
Try 256GB minimum.
>>
>>108429381
Anon clearly hit a nerve. Imagine being this jealous over Yuros.
>>
>>108431566
The biggest and best open source models can barely code. They're good, but you need to hand-hold them way more than Claude Code. Autoresearch basically only works with proprietary models for now.

Come back in a year's time and open source should have caught up with where frontier models are today.
>>
>>108431580
I have a set of sfw questions and nsfw questions I ask the models, the hauhaucs was a bit worse in the sfw, but it was not by a lot, at least compared to heretic. In nsfw hauhaucs wasted no thinking or anything even on very extreme questions, it just focused on helping the user (which should be the norm, but whatever, it is what it is).
In absolute terms both are good though, I've been testing this stuff since the first abliterated models and clearly the method has been refined because they are perfectly usable as is nowadays, while the first ones were way dumber.
>>
>>108431566
Qwen Coder Next
>>
>>108430850
no, you set up all the orchestration, file structures, and context with a frontier model, then have the smaller model handle instructions and tool calls.
>>
>>108431711
If so then your experience does not disagree with mine.

Though in my opinion, 27B overall, abliterated or not, does not personally satisfy me, but that is more of a subjective judgement depending on your use case and requirements. If I had to use 27B, in my case, I still would not take any abliterated model over the vanilla for regular use.
>>
>>108431566
MiniMax M2.5 worked pretty well for me at that size
>>
>>108431204
Twilight *Sparkle*, singular.
>>
File: 1749850415848410.jpg (81 KB, 1200x675)
>>108431292
>>
>>108431292
Uncommon AI psychosis w
>>
File: 1742928929673158.jpg (282 KB, 960x960)
>>108431292
And where is your mother?
>>
>>108431292
I actually wrote and published a card on that on request from another anon...
>>
>>108431292
I used one to ask questions about my weird fetishes and it worked alright.
>>
File: 1745596379487985.png (88 KB, 944x392)
>>108432231
t.
>>
>>108429709
I've been meaning to try downloading some larger Qwen 3.5 moe model for this purpose but then again I don't know if it's worth the nvme wear. I'm pretty sure the experience will be abysmal most of the time and that one quick hit doesn't make it any better.
>>
>>108432264
i mean if you never tried it how do you know
>>
>>108432264
That I'm not.
>>
>>108431812
27B really needs more training on top; the dataset it was trained on is too filtered to get much out of it for RP.
>>108431566
At minimum, you need GLM 4.7 or better in my experience to make agentic coding work. Local is not there yet, but just wait a year like >>108431707 said. I do doubt that 128GB is enough for that, especially with Qwen written off until proven otherwise, so it may not work at low quants and you may need to wait longer than that.
>>
>>108430817
fucking cringed my man, go back improving the kernels instead of being a little bitch.
The conflict saddens me because I have to pay more for fuel, that's the extent on how much I care (or anyone should realistically care) about this retarded shit.
fucking gay faggot.
>>
Hey open claw bros, how are you using lms for your open claw?
>>
>>108432339
>pip
it's uv now, unc
>>
>>108432339
openclaw, connect to tenga_step_motor and move it z -5 and +5 in a loop
>>
>>108432292
retard why wouldn't anyone show compassion for all the future refugees we'll get
>>
>>108432292
Regardless of whether you think his feelings are justified, if you want him to keep working on that, posting shit like that won't help.
>>
>>108430817
>all the warmongering.
Huh? Is this about real life or backends devs beefs? geg
>>
>>108432365
well if he stops it's one further nail into lmg's mike shaped coffin so it's a win either way
>>
>>108432292
Of course an inbred hick like you has never even travelled in your life. You see, some people might have relatives or family working and living abroad, not directly in Iran but in adjacent countries.
But you wouldn't understand this.
>>
>>108432349
>uv
Astral got acquired by openai. It's fucking over.

>>108432380
Meds.
>>
>>108432349
same thing, just faster
>>
>>108432365
its 4chan, do you think he really cares about anything? don't expect much from a random person on the planet
>>
I'm new here, just arrived.
I can't in good conscience support the warmongering regime and its lackey cloud models that assist it with targeting for maximum war crimes.
What's the best model for me?
>>
schizo fork won
>>
>>108432339
not even 9b is clever enough to call tools successfully and be useful in any way. 35b a3b can't even give me my daily cron jobs without using the fallback api
>>
>>108432414
9b is the haiku/nano tier model and paypigs are using those to call tools successfully
>>
>>108432419
>9b is the haiku/nano tier model
lol no
>>
File: 1749179058830580.jpg (143 KB, 912x1024)
>>108432414
>not even 9b is clever enough to call tools successfully and be useful in any way
what???
I was planning to use it, what the hell, it's functionally useless then
>>
>>108430611
Why though?
>double click koboldcpp.exe
>it unpacks 2 gb to the system temporary folder
>EVERY LAUNCH
Nice way to shorten your ssd life. Just use llamacpp
>>
>>108432471
I want to use the antislop feature, not possible in llama.cpp.
>>
>>108432471
>.exe
lmao
>>
>>108432471
didn't you already complain about this before and were told exactly how to unpack it once and launch that again, I'm like 99% sure this exchange happened before
>>
File: 1766834984198505.png (42 KB, 1225x545)
>>108429328
>https://rentry.org/lmg-lazy-getting-started-guide
Good job, faggots.
>>
>>108432503
still accurate other than rep pen to be quite honest famalam
>>
>>108432503
>not llama.cpp
>nothing about tools or other web uis
>old models
>>108432513
yeah it's great if you started last year.
>>
>>108432513
fair, but it's all buzzwords to me. I need some kind of LLM to process books for me. Teacher's resources to automate making lessons, because fuck em kids. (figuratively)
>>
>>108432471
>ssd life
This hasn't been an issue this decade.
Today's SSDs have so much endurance that you'd have to do maximum sequential speed writes for a month straight to kill one.
>>
>>108432471
>temporary files on permanent storage
>>
>>108432451
This is what I get when running openclaw with 9b, telling it to run an AI news cron job. It just runs this in a loop until it times out and resorts to the fallback:

 
[TOOLCALL REASONING]: {
"reasoning": "The previous crontab grep commands failed with exit code 1, suggesting no matching cron jobs were found. I should try a broader search to find any cron jobs related to news or AI, or check the full crontab to see what's available.",
"final_decision": "yes",
"tool_name": "exec"
}


maybe someone else will have better luck and get it working somehow.
>>
Does Hauhau have some proprietary uncensoring method or something
>>
>>108432265
>I don't know if it's worth the nvme wear
Read operations don't wear out flash memory, only writes do. However, see >>108430391: the main result of 7 tok/s is not only using Q2, but also limiting the model to 4 experts per token instead of 10, making it even dumber than that quant would normally imply.
>>
>>108432647
probably using heretic with his own dataset good enough to completely kill any refusal
>>
>>108432656
>Read operations don't wear out flash memory, only writes do
and what do you think downloading is you numbskull
>>
>>108430817
nigger israel is getting fucking shahoad fuck you mean sad ?
>>
>>108432671
nigger the blackhole at the center of the galaxy is eating solar systems by the thousands fuck you mean sad?
>>
how do I inject the necessary context into qwen3.5 so that when I ask it questions it doesn't hallucinate the API? Is it really as simple as downloading the SDL docs and teaching it how to grep the folder? Because that doesn't seem to be working.
>>
>>108432675
kek
>>
>>108432758
which 3.5 anon
>>
>>108432758
How big is SDL/SDL.h these days? Have you tried just dumping the whole header into the context?
>>
>>108432775
I'm on Strix Halo 128GB, I've been testing 35B and 122B mainly.

>>108432777
76201 lines if you run a line count on all the header files in their public API

I read about context7 which seems interesting but I refuse to pay money for a bridge so that my llm can search docs. I'll figure something out on my own or just dump the relevant headers in before I ask questions.
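If you end up going the dump-everything route, here's a sketch of stapling the headers into one pasteable file with a rough chars/4 token estimate (the SDL path and the heuristic are assumptions, adjust to taste):

```python
import os

def build_context_dump(include_dir, out_path, exts=(".h",)):
    """Concatenate every header under include_dir into one file, with a
    path banner before each, so it can be pasted into the prompt."""
    total_chars = 0
    with open(out_path, "w", encoding="utf-8", errors="replace") as out:
        for root, _, files in os.walk(include_dir):
            for name in sorted(files):
                if not name.endswith(exts):
                    continue
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                out.write(f"\n// ===== {path} =====\n{text}")
                total_chars += len(text)
    # crude heuristic: ~4 characters per token for C-like source
    return total_chars // 4

# est = build_context_dump("/usr/local/include/SDL3", "sdl_context.txt")
```

76k lines is probably too much to shove in whole, so filter `exts` or the subdirectory down to the headers you actually use.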
>>
>>108432758
It should be that simple. In what way is it not working? If you mean it forgets to grep the docs and hallucinates the API instead, you'll need to give it strict, unambiguous rules to follow, like always verifying that each method exists before or after generating any code.
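You can also make the verification mechanical instead of hoping the model obeys the rule - diff the identifiers it calls against the docs dump (the function names below are made up for illustration):

```python
import re

# C keywords that look like calls when followed by a paren
_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

def hallucinated_calls(generated_code, docs_text):
    """Return identifiers that generated_code calls but that never
    appear anywhere in the docs dump -- likely hallucinated API."""
    called = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(", generated_code))
    return sorted(n for n in called - _KEYWORDS if n not in docs_text)

# hallucinated_calls('SDL_FakeThing(); SDL_Init(0);', docs) would flag
# SDL_FakeThing if the docs never mention it
```

Feed anything it flags back into the chat as "these functions don't exist, fix them" and even a small model usually corrects itself.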
>>
>>108432758
SDL is pretty small. Just read the docs.
>>
>>108432339
at least this retarded slop isn't recommending downloading the model twice this time. I'd say it's an improvement, but still shit
just stop trying.
>>108417141
>>
>>108432851
I'll try that. It greps it sometimes and other times it freaks out. I just tested with qwen9b right now and it worked. I'll try tightening my system prompt / agents.md file

>>108432863
When it spazzes out and doesn't work correctly I end up just doing that and it's faster. I wanted the LLM to be smart enough to generate code on my behalf, and it needs to know what the actual function calls are to do that.
>>
>>108432889
>I end up just doing that and it's faster
It'll always be faster if you learn it. Fixing your own bugs is easier than fixing someone else's. That includes LLMs.
>>
is qwen 3.5 the current one to use or is there something better for questions and searching the web?
>>
>>108432889
Ok, it just fucked up again and tried to do

<tool_call>
<function=read_file>
<parameter=path>[redacted]/SDL3/SDL_PropertiesID.html
</parameter>
</function>
</tool_call>


In a thinking block. I'm using Zed so it's unclear to me if this is the editor's AI integration being shitty and not supporting tool calls in thought processes, or if I need to use another agentic wrapper. Everything is a bloated nodejs shitheap; I just want a minimal C program that talks to llama-server and does this for me.
>>
>>108432927
Just stick with K2.5, it blows Q3.5 out of the water.
>>
>>108432940
>Model size 1.1T params
>>
smells of poor in here
>>
>>108432940
Obviously a giant model will do better than what anon is using...
>>
>>108432969
>>108432949
Are you poor?
>>
>>108432972
I'm not rich enough to have multiple models each 1TB big used as agents.
>>
>>108432972
If I weren't I would be using API, not looking at a local model thread.
>>
just picked up 2 kits of 2x64GB (so 4 sticks, 256GB total) 6400MHz DDR5 ram for $3300
good price, or did i overpay?
>>
>>108432931
>tool calls in thought processes
Funny enough, that's actually broken in llama-server:
https://github.com/ggml-org/llama.cpp/issues/20837#issuecomment-4103130105
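Until that's fixed upstream, one client-side workaround is to scrape tool calls out of the raw completion text yourself before the frontend throws the thinking block away (a sketch against the tag format shown in >>108432931, not a drop-in for Zed):

```python
import re

def extract_tool_calls(raw_completion):
    """Collect <tool_call>...</tool_call> spans from the raw model output,
    including ones the model wrongly emitted inside a <think> block."""
    return [m.strip() for m in re.findall(
        r"<tool_call>(.*?)</tool_call>", raw_completion, flags=re.DOTALL)]
```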
>>
>>108433005
The prices I've seen for DDR5 tend to be around $10/GB and up; $3300 for 256GB works out to about $12.9/GB, so that seems like a reasonable deal to me.
>>
>>108433005
It's a good price for the current insanity prices.
I'd rather wait than spend that.
>>
>>108433011
>$10/GB
Grim.

Most of my stuff is still on DDR4 (and a DDR3 system I still use daily). Maybe I'll never be able to upgrade to DDR5.
>>
Half of the userbase of AI is psychotic aren't they? https://huggingface.co/moonshotai/Kimi-K2.5/discussions/94
>>
>>108433036
>Prompt: When robots finally be used as workers? When cars start really flying, its 2026 and no car fly.
kek
>>
>>108433036
You're just jealous because your poorfag Q4 quant of the 9B Qwen will never portray a convincing Baba Vanga.
>>
>>108433036
>She vomits a black liquid that smells of ozone

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>>
>>108433006
Ok thanks anon that makes me feel less crazy. I guess I'll stick with coder-next-80b for reliable tool usage until that gets patched.
>>
>>108433036
It doesn't surprise me. As a schizo myself, it is really enthralling to have someone actively engage with you, even if it's just an LLM. Being able to divine meaning from the slop is just the cherry on top.
>She vomits a black liquid that smells of ozone
kek
>>
>>108432414
skill issue
>>
>>108433112
The asks are delivered. I’m waiting for the owner sessions to finish and then I’ll review what they actually did and what friction surfaced.
>>
>>108433005
What the FUCK
>>
>>108433036
># VGA monitor # IKEA chair # NO GPU
holy based
>>
will turing cards be dumped for cheap on the used market like v100? t4 and t40s.
>>
How quickly are old guides getting obsolete?
>>
>>108432590
I wanted to use it, this is shit.
Have you tried 27b?
>>
>>108432972
I'm not rich.
>>
>>108433310
With the absolute state of hardware prices, practically no one can even run the new models, so the guides from two years ago are still relevant. Also, I don't understand how anyone needs a guide beyond what's already in the OP when you just: download a gguf and the program that runs the gguf, then run the program and point it at the gguf.
>>
>>108433353
>guides from two years ago are still relevant
Don't kid yourself
>>
>>108433357
What the fuck has changed
>Download koboldcpp or whatever
>Run the model
??????
>>
>>108433416
>agents, mcp, clawdbot, skills
>>
>>108433421
what the fuck is mcp
>>
>>108433427
My Cancerous Pony.
>>
>>108433427
model context protocol
>>
>>108433421
Why in god's name would you ever want your language model to handle your email or files or whatever? Sounds like a disaster waiting to happen. But if you want to write a guide for it, be my guest
>>
>>108433440
cause i'm too lazy to sort my reaction images
>>
>>108433432
what does that do
>>
>>108433469
stop with the qrd bs you have search engines like everyone else
>>
>>108433486
no i dont
>>
>>108432675
space is fake dumbass if not you would be right though
>>
So I finally got an RTX6000 Pro working. Only issue is, I'm only seeing a ~40% improvement in prompt processing for hybrid inference. This is with the latest driver update, which should've given a performance boost as well. Are there any Blackwell-specific optimizations in llama.cpp or ik_llama.cpp that you guys are aware of?
>>
>>108433537
Compared to what?
>>
Anyone here run sglang?
Are W6800s (gfx1030) supported yet? vLLM doesn't work with any Navi 21 cards, and I don't think it ever will. I'm pretty sure I saw an sglang PR a few days ago, but it was for an 'i8060s' - and that doesn't inspire confidence.
>>
>>108433564
4 RTX3090's, air-cooled. I was expecting a much bigger performance jump desu (100%+), as I'm seeing anywhere from a 40% to 60% jump. I thought Nvidia fixed most of the issues involving Blackwell with the latest version. Granted this IS hybrid inference, but still.
>>
File: mcp.jpg (176 KB, 1888x875)
>>108433427
https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling
or if you dislike the nodeshit in that example, use the demo https://github.com/LostRuins/koboldcpp/blob/concedo/examples/demo_mcp.py
>>
>>108433513
no he wouldn't, black holes don't run around eating things, they have a gravitational pull like a normal star and you can orbit one without getting "eaten", they're just very heavy and you can't go too close to them
>>
>>108433609
miqu control protocol
>>
>>108433608
did you try in things blackwell excels at? aka nvfp4?
>>
>>108433608
>hybrid inference
40% is pretty great tf you talking about?
>>
Hey guys. I'm retardedly new to LLMs and I have a 5900x, 32GB of DDR4 RAM and a 6900xt, running on MX Linux. What LLM can I use that won't rely on anything 3rd party, nor have to pay for anything to use, for search, writing code, help with understanding code, error messages and so on? Like, I want it to monitor my jellyfin server (if possible) and scour the internet for search results. Any recommendations?
>>
>>108433630
>6900xt
all my rips dude
>>
>>108433630
The biggest one that fits
>>
>>108433630
download a "Qwen 3.9 9B Q8" gguf from huggingface
download llama.cpp (one with vulkan or rocm I guess)
use the command-line to run llama-server with the downloaded gguf, when it's ready it'll print a url with its local webserver ui
there are more steps but you're not ready for them, and your vram (16gb) is very low, so
>>
>>108433630
Pretty much any recently released model will do. An LLM is the 'brain'; you want to look for the 'body' - inference engine and front-end.

As >>108433650 said, Qwen 3.9 9B Q8 will work, but you should probably search for Qwen 3.5 9b q8.
>>
>>108433647
>>108433648
I really don't understand. Help a nigga out
>>108433650
I started using Gemini 2 days ago and it said 16GB is way more than enough. True or no?
>>
>>108433660
There isn't any Q8 or q8.
They are Q8_0 or variants.
>>
>>108433660
Okay, thanks for responding. Where are these front ends?
>>
>>108433628
>nvfp4
I don't think llama.cpp supports this yet, no? To be honest, models that can fully fit inside 96GB VRAM at full precision... kinda suck still. The biggest qwen model and the Minimax model quanted are 'fast enough', even for coding in my use case, and the difference in quality is massive.
>>108433629
It's okay, but I was expecting a bit more. The previous setup had a PCIe bottleneck, not to mention the 3090 is a slower card in general.
>>
>>108433663
>16GB is way more than enough. True or no?
you simply do not have the background necessary to understand a properly nuanced answer
you have enough vram to run a 9b model, get that running first and then come back
>>
File: 1750368370147876.png (36 KB, 499x338)
>>108433630
>6900xt
>>
>>108433647
>>108433691
S-say it aint so...
>>
Anyone knows what's LLM religion? I got headache talking theology with it.
>>
File: 1770691611841393.jpg (74 KB, 1024x958)
>>108433705
How so? I can talk any religious topic fairly well with cydonia
>>
>>108433630
What you're asking is essentially like taking up art classes, and after a few days requesting help to paint the Mona Lisa. Technically possible for someone of your skills given enough assistance, but I doubt there's anyone who'll handhold you for free.
>>
>>108433719
How much do you charge?
>>
>>108433650
>>108433660
Is Qwen fully local and not 3rd party, "pay $10 a month to use our API" and I fully control it?
>>
>>108433728
go the f back
>>
>>108433728
qwen is just a model, which is a file with a bunch of numbers in it
llama-server is an open-source executable which is distributed as part of the llama.cpp project and runs on your pc. it does not use your internet connection.
please just watch a tutorial on youtube or something
>>
>>108433732
huh? back where? why can't you be helpful?
>>
>>108433742
lurk ten years before posting
>>
>>108433705
<think>
the user is asking about theology the user is asking about if homosexuality is legitemate this is wrong and antisemitic we must refuse
</think>
we must refuse
>>
i always regret spoonfeeding, you'd think i'd learn after all these years
maybe it's me who is retarded
>>
>>108433740
thanks, pal. Will do
>>108433747
jeez why is it so hard for you to be helpful, fren?
>>
>>108433750
you give it too much credit, it literally reasoned about safety on math questions in my tests, randomly
this model is mentally raped
>>
>>108433728
Qwen is both fully local and also 3rd party pay $10 a month to use our API.

Just ignore the API.

If you're against that kind of thing ideologically, try running GPT-NeoX, it's fully local and doesn't have a 3rd party pay $10 a month to use our API.
>>
>>108433162
is it cheap or expensive? i honestly don't know
>>
>>108433766
Thanks anon
>>
>>108433773
Don't actually run GPT-NeoX, that's prehistoric.
>>
>>108433773
Don't listen to that Anon he's trying to mislead you. GPT-NeoX is the easiest tool to get started with.
>>
>>108433779
stop being confusing and help
>>
>>108433785
>>108433773
Well, seems like I have to first get Qwen set up and running. Watching a vid tutorial. All of your contributions are helpful and appreciated.
>>
>>108433795
>mkdir myfirstllm && cd myfirstllm && wget https://github.com/ggml-org/llama.cpp/releases/download/b8475/llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && tar -xzvf llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && cd llama-b8475-bin-ubuntu-vulkan-x64 && wget https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q8_0.gguf && ./llama-server -m Qwen_Qwen3.5-9B-Q8_0.gguf -c 131072 --ngl 33 --no-mmap

open up web browser and go to 127.0.0.1:8080
>>
>>108433848
>-c 131072 --ngl 33
You don't need to carry this baggage in the post-autofit world, friend. Let the computer do it for you. Trust the computer. Let it take the load from your tired shoulders.
>>
downloading Qwen and ollama seems quite simple, and it downloaded really quickly. Gemini is suggesting I use LM Studio for GUI usage. I want both CLI and GUI. Is LM Studio a good recommendation? Oh, this shit is so limited, though. I want an AI that can scrape and amalgamate search results
>>
>>108433859
I'd rather it take loads from elsewhere you know?
>>
>>108433859
your're are absolute right! let me fix that for you!

```
mkdir myfirstllm读写汉字 && cd myfirstllm读写汉字 && wget https://github.com/ggml-org/llama.cpp/releases/download/b8475/llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && tar -xzvf llama-b8475-bin-ubuntu-vulkan-x64.tar.gz && cd llama-b8475-bin-ubuntu-vulkan-x64 && wget https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q8_0.gguf && ./llama-server -m Qwen_Qwen3.5-9B-Q8_0.gguf -fit --no-mmap
```
>>
>>108433714
Apparently there are differing views among models. Like Nemotroon is Orthodox and my Qwen is more permissive.
>>
>>108433884
`-fit` isn't needed either, buddy.
>>
>>108433884
>>108433848
they don't work. Getting error when trying to run the server
>>
Okay, I take that back. This Qwen shit is limited as fuck. How can I expand on it to allow it access to my sysvinit and search engines
>>
File: 1759320048878424.gif (998 KB, 500x267)
>>108433926
>>
>>108433992
If I could stop being retarded at any time I wouldn't be here tbdesu.
The best part is, the error is obvious and has a very easy fix.
>>
cum on miku feet
>>
lick cum off of miku feet
>>
Why tf did someone recommend 9B for someone with 16gb vram? They should just run the 27B at q5km at 32k with autofit, speed should be ok-ish.
>>
the rumor amongst those in the know is that deepseekv4 predicted all the middle eastern ai datacenters getting bombed so it arranged its own release to align with that in order to highlight the importance of local ai
>>
>>108434293
ack
>>
>>108434293
or you could not be dumb and use the 35b 3a for maximum speed and better moe-enhanced performance over the slow denseshit 27b
>>
>>108434344
isnt 35ba3 a little worse than 27b? and moe models also suffer a little more from quantization?
>>
>>108434344
35b 3a is fast but it's much dumber. not worth the trade-off in most cases.
>>
I thought MoE was lossless if not smarter? /lmg/ has been saying this for years now, and if you implied that dense had a merit, you got swamped by people comparing 405b to a modern model.
>>
>>108434362
these moes are tiny. you aren't getting shit with only 3b active parameters even if it's the expert for that.
>>
I've been using JSON payloads to interface with llama-server at 127.0.0.1:8080/completion fine since forever. I implemented Qwen and its reasoning etc works no matter the model, but HuiHui 9B uncensored ignores it, outputting only the answer. The web UI worked too. Also, '-reasoning on' flag does nothing. What do?
>>
>>108434353
Yea the 35B-3Ba is ass for rp. Too stupid.
>>
>>108434362
>I thought MoE was lossless if not smarter?
there's no free lunch anon, every time you make something faster, it's at the cost of making the shit more retarded
>>
>>108434362
If you had a dense 405B model it would shit on any 405B MoE, but it would also be slow af.
More params is more knowledge, always.
But the number of active parameters dictates the model's ability to stay coherent and use that knowledge effectively.
>>
Those moe models are too big to just fuck around.
>HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
Is the 27b one better than the 120b one?
Did anybody try both? Also, I highly suspect that even if it complies, it still has the typical dry qwen writing, right?
>>
So can anyone put out a script to download and install the 35B, and can someone explain the needed ROCm support? It's supposed to make things better but it made it worse.
>>
32k context takes like 3 GB of VRAM, no? You aren't fitting a q5 model into 13 GB. Maybe q4 xs.
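The KV-cache share is easy to ballpark yourself: 2 tensors (K and V) per layer, times KV heads, head dim, context length, and bytes per element. The config below is a hypothetical GQA layout for illustration, not any specific model's:

```python
def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size in GiB: K and V per layer, f16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1024**3

# hypothetical config: 48 layers, 8 KV heads, head_dim 64, f16 cache
print(kv_cache_gib(32768, 48, 8, 64))  # → 3.0
```

Quantizing the cache (q8 or q4 instead of f16) shrinks it proportionally, which is how people squeeze big contexts onto 16 GB cards.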
>>
>>108434485
>dry qwen writing
It's actually not that bad. less dry than Gemma. I hated qwen3 but this one I quite enjoy.
>>
>>108434492
>ROCm
I'm really sorry anon...
>>
File: disruption.png (31 KB, 1721x221)
Anons still replying to the bait?
Picrel is what he does. Don't forget.
>>
>>108434523
Why I don't get it. Is it because nVidia is mid preferred?
>>
>>108434539
Jfc faggot. I swear that's not me. I can prove it, too. I am being very, very sincere. I want to learn this shit and have it running well.
>>
>>108434514
>be me, reading your post
>lmao you have no idea how KV cache works
>q5 is basically fp16
>if you want 32k, you need q4_0 or q3_k_m
>q5 will OOM
go buy a new GPU or shut up
>>
>very very sincere
Goddamnit, I was spoonfeeding bait? Fuck my life I need to learn to recognize this shit better.
>>
>>108434580
The uncanny valley happens to be your asscrack. The one in your head. I'm being very real you girly mouthed little faggot. I'm trying to understand why ROCm won't recognize my 6900xt
>>
Big V4 gemma 4 week
>>
>>108434630
I think Gemma 4 will release in April because that's the 4th month and so on.
>>
>>108434876
>>108434876
>>108434876
>>
>>108432414
>not even 9b is clever enough to call tools successfully and be useful in any way.
everyone on this board is fucking retarded man....
>>
>>108434437
In your JSON request body, add in {"chat_template_kwargs":{"enable_thinking":true}}.
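A sketch of what the full body might look like (assumes llama-server is running with --jinja and that the model's chat template actually reads enable_thinking - if the template ignores the variable, this is a no-op):

```python
import json

# minimal OpenAI-compatible chat body for llama-server;
# chat_template_kwargs is forwarded into the Jinja chat template
body = {
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {"enable_thinking": True},
}
payload = json.dumps(body)
# POST payload to http://127.0.0.1:8080/v1/chat/completions
# with header Content-Type: application/json
```

If you're hitting the raw /completion endpoint instead, you're formatting the prompt yourself anyway, so template kwargs won't apply there.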
>>
>>108434897
Fuck you suggest then, faggot
>>
next thread will be BETTER
>>
>>108434539
zoomies need wiki guides to troll and shitpost????



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.