/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108659983 & >>108655009

►News
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108659983

--Comparing GGUF quantizers and discussing imatrix calibration for Qwen3.6-27B:
>108662039 >108662052 >108662065 >108662230 >108662252 >108662353 >108662475 >108662053 >108662063 >108662080 >108662162 >108662361 >108662068 >108662062 >108662167 >108662176 >108662190 >108662257 >108662321 >108662780
--Qwen3.6-27B benchmarks and GGUF quants:
>108660998 >108661023 >108661071 >108661108 >108661125 >108662813 >108662846 >108661101 >108661164
--Gemma 4's 124B MoE and memory bandwidth benchmarks:
>108662533 >108662543 >108662549 >108662589 >108662594 >108662614
--Models for a 3090 and explaining MoE vs Dense offloading:
>108659996 >108660054 >108660247 >108660260 >108660268 >108660279 >108660312 >108660317 >108660347 >108660223 >108662148
--Koboldcpp launch flags and speculative decoding for Gemma 4:
>108660701 >108660741 >108660743 >108660848 >108660934 >108660990
--Alleged unauthorized access to Anthropic's Mythos:
>108660075 >108660630 >108660724 >108661694
--Anons discussing reported Gemma 4 performance on RK3588 SBCs:
>108662346 >108662393 >108662431 >108662528
--LLM reliability, internet content degradation, and local knowledge bases:
>108661238 >108661314 >108661335 >108661358 >108661276 >108661375 >108661405 >108661533 >108661585 >108661462 >108661311
--llama.cpp ngram-mod flags to optimize coding performance:
>108660554 >108662471 >108661013
--Text Completions prefills to stop GLM's repetitive thinking loops:
>108661606 >108661631
--OpenAI's open-source privacy-filter model:
>108662489 >108662773
--Little Coder agent optimized for small LLMs:
>108660765 >108661020
--TurboQuant-H reducing VRAM via 2-bit embedding quantization:
>108660542
--Logs:
>108660349 >108661795 >108662260
--Rin, Miku, Teto (free space):
>108660565 >108660789 >108661238 >108661795 >108661801 >108662084

►Recent Highlight Posts from the Previous Thread: >>108659986

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108661743
>>108661866
>text completion has no vision
kek wtf, I use text completion and can do shit like write "Appearance: <__media__>" in the character card and feed it images in the request body placed wherever I want in context. If you need your hand held by an abstraction like chat completion just admit it. You can do whatever the fuck you want if you know what you're doing.
>>108663492Okay but why?
>>108663449Picking out junk food at the store with Yellow Miku
>>108663544If you don't have an innate urge to be in control of every single token present in context why are you here?
>>108663443
>https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
>WTF HE ALREADY DID IT
still no gemma4-31b-it-HAUHAUCS
>>108663564geg......................
>>108663564no kdl? ACK
KEKEKKEKEJEEKEK WAITING FOR V4!? MEANWHILE I JUST HAD 64k long CUNNY SEX WITH THAT DEEPSEEK V4 ON IT'S OWN WEB CHAT LOL… And not just sex, but CUNNY sexxxxxxx (ON THAT DAMN FILTERED WEB) BUUWHAHAHHAHAGHHAHHA I'VE BECOME A GOD NOW... YOU ANONS MUST KNEEL BEFORE ME
>get excited about structured output in llama.cpp
>waste 2 hours trying to get it to work
>turns out it's broken and completely ignores whatever schema you pass it
damn
>>108663630>15yo>cunnyBurger-kun...
>>108663630>15You mean hag sex
I'm not even sorry for cheating on gemma-chan...oh the cunny loli sexo~
>>108663630a-anon... that's not cunnythat's prime breeding age
>>108663630???
>>108663633
What? It was working until last week in my Python app using the OpenAI lib.
>>108663630>americans
>>108663630>15rookie numbers.
>>108663654
i think it's this issue? https://github.com/ggml-org/llama.cpp/pull/21537
gemma 4 chat template does not specify response_format, maybe that's what it is
>>108663655It has to be the tap water. There is no other explanation to this phenomenon.
>>108663633Structured output just works with vllm btw
>>108663630>15If she's had her first period, she's not a trve loli, which is physically undeveloped. She's a female that Nature has ordained to be impregnated as soon as possible.
Qwen 3.6 27b is already uncensored without finetuning btw
I dropped the q8_0 from ggml-org into a sysprompt I was using with gemma 4 heretic and it just werked, no refusals or moralizing in reasoning. It's resistant to using nsfw language unprompted though.
>>108663633
Shit has been broken since day one. vllm handles function schemas fine, but llama.cpp forces alphabetical ordering for some reason. This is really bad if a function argument depends on the previous one.
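To illustrate the ordering complaint: a JSON schema's `properties` object has a declared order, and a backend that emits arguments alphabetically can end up generating a dependent argument before the one it depends on. A minimal sketch with a hypothetical search-tool schema (not llama.cpp's actual code):

```python
# Hypothetical tool schema: the author intends "query" to be generated
# before "filters", since sensible filters depend on the chosen query.
properties = {
    "query":   {"type": "string"},
    "filters": {"type": "array", "items": {"type": "string"}},
}

declared = list(properties)        # insertion order = intended generation order
alphabetical = sorted(properties)  # what an alphabetizing backend would emit

print(declared)      # ['query', 'filters']
print(alphabetical)  # ['filters', 'query'] -- dependency now comes first
```

With the alphabetical order, the model has to commit to "filters" before it has produced the "query" they are supposed to filter.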
>108663630>108663644>108663646>108663647>108663649>108663651>108663655>108663665>108663680>108663710>this much pedophilia already, this early in the threadAre we being raided by discord trannies or something?
>>108663741>afraid to quoteSeems like reddit is already here
>>108663741um actually pedophilia is oldfag 4chan culture, newfag. We wuz oldfags or sumthing
>>108663630Dipsy release when? I know you labniggers are lurking here, hurry the fuck up.>>108663741Always have been.
>>108663741
>>108663756aint no way
>>108663680
Fluoride has been shown to decrease IQ, and there is still a significant amount of lead pipes around, so that is also a factor.
I think the biggest factor though is the No Child Left Behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result, and the dumbest kid will get dumber every single year. And if a student isn't actually smart enough to advance a grade they will still push them through regardless due to financial incentives. So the bar gets lowered so far that no one can actually fail.
There has also been an uptick in taking pride in being a fucking retard in the last decade or two. So you have health, the education system itself, and societal praise for being a retard all contributing to making everyone stupid.
Eventually we will either shape up or be outcompeted by stronger and smarter societies, but all I know is we were handed the world on a golden platter, and if we fail and collapse we have no one to blame but ourselves and the previous generations who set us up for failure.
Thanks for coming to my ted talk
>>108663776>I think the biggest factor though is the no child left behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result and the dumbest kid will get dumber every single year.Same applies to these threads by the way. Being surrounded by low IQ pedophiles mentally retards your brain.
>>108663630>shivers and not x but y in the same phrase>pedoshitshit's crazy, what kind of turboslopped model is this?
>>108663689
>no pascal support
>very limited cpu support
>pythonshit, meaning it will pull a dozen GiBs of dependencies
llama.cpp might be buggy, but sometimes i really appreciate how it runs on fucking everything, on top of being self-contained and not depending on the cancer that is the AI ecosystem in python
>>108663809That's chink model for you!
>>108663776It was a joke but I think these are international issues for every 'western' nation.
>>108663810Only if your time is worthless
>>108663806Pedo is attraction to 13 and under, burger. Words have meaning.
>>108663828Then why are you dumb faggots dogging on that anon who thought "cunny" applied to 15 year olds? You're not hebephiles, you're pedophiles. That's why you post pictures of "loli" anime girls with no tits, hips, or ass and infantile behavior. Fucking freak. Don't reply to me again.
>>108663841>low comprehension tooLet me break it to you, anons are making fun of another anon saying that a virtual '15yo' was 'cunny' (pedo slang) which isn't. It's not that hard to understand.
>>108663810
>a dozen of GiBs of dependencies
18GB is my venv for stable diffusion
>>108663859:(:)
>>108663859What did she mean by this?
>>108663820sorry Jensen... but i'm not gonna buy a Blackwell GPU. So yeah... i'll keep on using my trusty Pascal.Haha, sorry, but i'm just not gonna do it!
>>108663859I like these Bakas
Is necrophilia okay if it's just about fictional people? What about cannibalism and bestiality? It's all okay because it's just fictional stories that you masturbate to, right?Would you send your child to a public school where all of the teachers openly admitted to doing this? It's just fictional bro.
Is unsloth actually better than bart's quants? Tried both but never found any noticeable difference between them, but unsloth claims theirs are significantly better than others. Seriously, which one do I choose between these two?
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q8_0.gguf
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-Q8_0.gguf
iwan is normally a nigger but this actually makes it so reasoning budgets and turning off reasoning works now, so i guess he's slightly less of a nigger.https://github.com/ikawrakow/ik_llama.cpp/commit/e0596bf6146a737f5e8fa8035215f5dfae59742d
>>108663894for q8 doesnt make any difference
>>108663890What is okay is being able to separate reality from fiction, which is what you should work on. Thought crimes are not a thing.
>>108663630>15>cunnyalso that’s not anything I haven’t seen from gemma or glm
>>108663904What about Q4_K_M?
>>108663453
>--OpenAI's open-source privacy-filter model:
what is this exactly for? how would it be integrated?
https://huggingface.co/openai/privacy-filter
>>108663906no but they sure as hell want to make it so you can be prosecuted for your thoughts
>>108663910
again and again unslop show their quants having better kdl, so I would go with that. not much to stress over; if you really really want, you can download both quants plus the original model and run the KDL yourself, but it will be a waste of time
>there are still people falling for unslot's shillinggeg
>>108663917
kld* it's KL divergence, anyway, you get the point
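For anyone wondering what is actually being compared here: KL divergence measures how far the quant's next-token distribution drifts from the full-precision model's over a test text (llama.cpp's llama-perplexity tool can compute it with its --kl-divergence mode). A toy sketch of the metric itself, with made-up distributions standing in for real logits:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) in nats: how much the quant's token distribution Q
    diverges from the reference model's distribution P."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocab (made-up numbers):
full  = [0.70, 0.20, 0.08, 0.02]  # hypothetical fp16 reference
quant = [0.65, 0.24, 0.09, 0.02]  # hypothetical q8_0

print(kl_divergence(full, quant))  # small positive value: quant tracks the reference closely
```

Lower mean KLD over a corpus means the quant behaves more like the original model, which is why quantizers wave these numbers around.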
>>108663906>What is okay is being able to separate reality from fictionthose who cannot do that probably think that everyone that plays GTA is a potential serial killer kek
neners
>>108663920
yeah, the only reason I was asking is my shitty experience with their quants: they were broken as fuck, and switching to bartowski's quants fixed everything for me; been happy ever since. though that graph in the previous thread got me wondering if they've actually gotten better
>>108663924potential is a pretty strong word, can mean anything and nothing
>Mfw Got a 5090 last week and while amazing, I already think I want another one, as 32GB is barely enough with my 64GB of RAM.
I swear it's so damn easy to max out this card when you start moving past Q5 and +25GB sizes.
It's a pity these cards didn't come out as 48GB, because that seems like a sweet spot to run everything with at least okay context.
I wonder if I should just buy some used 5070 Ti or 5080 as a companion to this beefy motherfucker to reach that 48GB level without breaking the bank.
This shit is way too addicting.
>>108663955>without breaking the bank.that ship has sailed
>>108663955just buy an aftermarket modded 48gb 4090 from your chinese friends
akita neru
>>108663955>5070ti>5080your VRAM bandwidth gets sliced in half if you get a 5080 which is a complete disservice to your 5090. the only thing you can do is buy another 5090.
>>108663962
>Mfw
Yes.
>>108663964
Fucking hell, those are selling for three and a half thousand Eurobux. I can buy two used 4090s for that price, so there's no real savings there either.
>>108663996
Yeah, that's the biggest problem with this card, it's just so much faster than the others. Any other model as a crutch is going to nerf the hell out of it.
I guess I'll just have to start saving up and meanwhile try to tell myself not to "waste" my money on another one. Then again it's pretty hard to lose money on this hardware. Not like the prices are going to go anywhere but up for a long-ass time, so whenever I sell these I'll probably manage to break even or suffer some paltry 20% loss. Especially since I bet next gen will cuck us with another round of 32GB memory, as this AI mania isn't going anywhere any time soon.
>>108663906I mean, I don't think you should be criminally charged, no one was really harmed but it's still a sign that you are a pedophile. If you watch gay porn, even if it's fictional, and enjoy it you are gay. Same with pedophilia. It's justified for people to call you a pedophile because you are a pedophile.
>>108664063if you play gta and kill innocent citizens on the street, how should we call you?
>>108664085>how should we call youesl king
>>108664101So you want to be called the esl king?
is there any trick to use swa and yet avoid the penalty of having to reprocess everything when context is full?
>>108664101>>108664106saars the esl kang is https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H
why does the XTC threshold have a default of 0.1 if the sampler is deactivated by default anyway? it's a bit retarded if you ask me
>>108664109buddy you are in a general for LLMs. just vibecode your own slop solution like everybody does.
>>108663955I have one in my server and 3090, only thing stopping me from selling the 3090 and getting a second 5090 is the laziness of having to change the PSU for one able to support both.
>>108664128explain how an alternative solution would be better without exposing that you don't understand how XTC works
>>108664197xtc sounds like a crypto, I want a better name
>>108664197
funny irony, you need to look at the image again: XTC probability is at 0, meaning the whole XTC sampler is disabled, so setting XTC threshold to 0.1 with XTC probability at 0 does absolutely nothing, hope that helps
>>108664132?
>>108664128this shit halves my speed so I'm not using it, simple as that
>>108664257just unleash an agent on the llamacpp repo with your demands.
Do people that download quants also buy their aspirin from the drug dealer on the street corner? Do they not understand chain of custody?
>>108664128So that there's a sane default value when it's activated? Are you a UI contributor to FOSS projects?
>>108664063>If you watch gay porn, even if it's fictional, and enjoy it you are gay.false
>>108664303>Are you a UI contributor to FOSS projects?are you?
>>108664063>If you watch gay porn, even if it's fictional, and enjoy it you are gay.So women are actually in majority lesbians?
lm studio + void ide
None of the models I've tried can read files without specifying lines.
Are there any IDEs with working tools?
>>108664352Like a scoreboard for the antichrist
>>108664400is it like antimatter?
>>108664366what?
>>108664404Anti-matter is just a tool like plutonium or tritium, it doesn't seem more evil than matter. Matter is both good and evil.
>>108664407
as in the screenshot
>read file index.html
>The index.html file appears to be truncated
>read file index.html(1-1000(lines))
>The file is 102 lines long
It can't even read a short file whole. And I want it to work on 2000+ line files as I did in cursor
>>108664447
it has no option to change the behaviour? you will need to find one that allows you to customize it like that, or write your own file-reading MCP, or whatever the correct way of doing this is
So is 3.6 actually usable or is it still just a curiosity compared to saas?
>>108664460
>curiosity compared to saas
what does this even mean? saas is dead, 3.6 is good, anyone who is not retarded will use proprietary for coding
My understanding is that the Kimi weights are INT4 for the experts and BF16 for everything else. So does that mean the BF16 mmproj is full precision? Is there ever a reason to use the FP32? I'm not sure how mmproj precision really works or if it's even model weights to begin with or some other type of data. I'd ask Gemma-chan but I'm not sure she knows.
>>108664519>Is there ever a reason to use the FP32no unless you like wasting compute for zero difference
>>108664519you actually need fp64 to get anywhere a remotely close to usable model but we pretend fp16 is good enough
>>108664533>>108664557To be clear I'm just talking about the mmproj file, which is pretty small even at F32, but yeah if it's pure bloat then so be it.
the true chads use fp256
>>108664563exactly the same fp16 and 32, but its sensitive to quantization so 8 actually hurts it
>>108664563use fp16, send in ram, fp16 is the intended way
>not using quantum entangled datatypes like sky-surya-hngmi
Every time I try to performance-max TTS engines I end up becoming borderline suicidal.
It gets worse the more advanced the TTS engine is. They use such convoluted architectures. It's so ridiculous.
>>108664623I'm just waiting for Llama.cpp to support Qwen 3 TTS...
>>108664630Ha, same. That's the exact one I was talking about. It's not going to happen without a major refactor to the ggml backend to support convolutional architectures though. The speech tokenizer is fundamentally incompatible with llama.cpp in its current state.
>>108664653Damn.
>>108664623
>>108664653
Why do you need to max out performance with it? Do you need it for something real-time? Because that is the only use case where I would think it actually matters. Otherwise, I just use it with batch 32 and it works well enough for offline transcription.
>>108664677Not him but yeah, I want real-time use. If possible it'd be nice to run on CPU instead of GPU too, just to save the bit of VRAM for the LLM.
>>108664664
My current setup has the speech tokenizer and the voice encoder running in onnxruntime, and the talker and code predictor running in llama.cpp. With that I'm able to get an RTFx of 3.0 and a TTFA latency of about 122ms. But the setup is aesthetically disgusting. Having to use multiple execution providers is so appalling. At the very least I've managed to make it so that it only uses about 400MB of VRAM, so it's pretty efficient.
>>108664677
Real-time speech from LLM output is my usecase. The idea is to have a high quality voice speaking whatever the LLM says with as little latency as possible.
>>108664691>>108664703I had been planning to play around with https://github.com/rekuenkdr/Qwen3-TTS-streaming at some point but I don't have CUDA so would need to rewrite a good chunk of this into something like Triton to make it work on my card. But hopefully you guys get it working in some way for your usecases.
>>108664708Highly recommend that you just use vulkan for maximum cross-compatibility. Also that repo probably isn't what you want. You'd be better off vibe coding something from scratch than trying to manually convert CUDA shit.
Thanks to Gemma 4 31B I made my own personal RAG frontend, just need to wrap up final UX stuff and then other stuff like theme switching.
>>108664748What are you using for RAG? Just vector similarity? bm25?
>>108664741I would usually tell an AI to do a basic bitch conversion and work from there to rewrite the Triton to be more performant with that layer in Python. I would consider Vulkan only if I absolutely needed every last inch of performance. Usually, having at least a framework and project for reference on what you vibecode helps a whole lot rather than doing it from scratch even if you can't reuse any of the code.
>>108664756
I'm using FAISS for dense vector retrieval and BM25 for sparse keyword search, merged via Reciprocal Rank Fusion (RRF) to get the best of both worlds. To kill hallucinations, I've implemented a cross-encoder reranking step (BGE-Reranker) that scores the top candidates before feeding them to the LLM. I ran it through a validation test and it worked great.
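For anons curious what the RRF merge step actually does: each retriever contributes 1/(k + rank) per document, so documents ranked well by both the dense and sparse lists float to the top. A minimal sketch with made-up doc ids (k=60 is the commonly used default):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]  # hypothetical FAISS top hits
sparse = ["d1", "d9", "d3"]  # hypothetical BM25 top hits
print(rrf_merge([dense, sparse]))  # ['d1', 'd3', 'd9', 'd7']
```

"d1" wins because both retrievers rank it highly, even though neither puts it first; that rank-only scoring is why RRF needs no tuning of the two retrievers' incompatible score scales.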
>>108664777>777Sick.Gonna try implementing that and compare it to my current retrieval algorithm.
>>108664796:fire:
We are looking for a QA-Human to provide human-in-the-loop (HITL) evaluation of model outputs, ensuring quality, safety, and alignment. You’ll operate in an AI-native environment, applying structured feedback, edge-case flagging, and rapid judgment to continuously improve system performance.
>>108664799Fuck, forgot the yu gi oh related image.
>>108664796Why are there so many weirdos in the space. It's worse than anons shitposting here, they literally use their account for that shit, zero shame.
bartowski quant wheni refuse to use unslop
>>108664814Why does a dragon need breast-orbs, thick thighs, a fat ass, and an interest in human men?
>>108664813You do realize humans want to get paid and want to sign a legally binding contract before entering into employment? Do you have the legal capacity to fulfill this?
>>108664815They just want a piece of the grifting pie, and AI is the prime place for grifting in 2026. That pic in particular just looks like some guy taking the piss, though.
>>108664830To cater to my tastes, of course.
>>108664352>fake (and gay) chartslop, too symmetrical
>downloading unslop
I like living dangerously
If anyone like me updated to cuda 13.2 and your docker was fucking up, with `nvidia-smi` saying everything was alright but llamacpp throwing
>unknown error
when trying to load a cuda device:
I had to switch from nvidia-open to nvidia-dkms to fix it.
>>108664950>5090 powerlimitednot dangerously enough
>>108664950>260W>living dangerouslypower limiting your card by 75% is the very opposite of that.
>>108664813>quality, safety, and alignmentyou've cum to the reigh place, nigga
>>108664976This is against my policy.
>>108664964
>>108664970
I meant the vram, the power limiting is no issue
I'm hitting OOM once in a while
Are AI companions or robot pets/humanoids ever going to take off?
>>108664994ah I see yeah.
>>108665195yes
>>108665195no
>>108665195Maybe
>common_speculative_is_compat: the target context does not support partial sequence removal
>srv load_model: speculative decoding not supported by this context
So much for using the MoE as a draft model for the dense. 45 t/s isn't enough for me; into the garbage Qwen3.6 goes.
>>108665195yesn't
is qwen 27b better than gemma 31b?
>>1086651952 more grifts
>>108665295for coding yes
is there any way to get KV quantized to q5/q6 without it running like dogshit
>>108665195Yeswe are so so so early
>>108665195best we can do is yet another coding model take it or leave it
>>108665306No. Just use q8>>108665313I'd take it if it's good
>>108665301nta, I'd use the new Qwens if either dickflash, MTP, or ngram worked for it in llama.cpp, but sadly they don't. No, I will not use VLLM (unless it works in wsl).
>>108665309Inspiring post. Are there any TTS engines that have the quality of Qwen3 TTS but also support paralinguistic tags or other features that would enable moaning and whatnot?
>>108665339It works with WSL2
>>108665367I will bite you if it doesn't.
https://mimo.xiaomi.com/mimo-v2-5-pro
>>108665406Optimized for token efficiency
>>108665406Saw it on the ai arena earlier.Lots of emoticons.
Weird...
>>108665426Almost as gay as the strawberries
>>108665426They're trying to catch up to the trend that is vagueposting from official account
Idk, I've never come to the 4chud tech board before. I've been searching everywhere for a board where AI is talked about.
I LOVE IT. I HAVE 4 32GB MI50'S. I DON'T EVEN USE THE VLLM FORK TO RUN AI, I JUST USE VULKAN SUPPORT AND IT'S SO GOOD
>>108665449Post t/s
>>1086654428l bro
>>108665456
I can't right now, but with qwen3.6 35b I get 30 t/s ish and with qwen coder next 80b I get 20-25 t/s. The 100b+ models don't seem to be optimized for vulkan, but china's models are.
>>108665456
3 cards are running on pcie 3.0 x4 and one is running on pcie 3.0 x1.
My cheap webcam is now tracking me (and others) in the room; my Live2D avatar can now look at people in the room, and a state layer feeds my LLM with the relevant data and takes instructions.
My friend was impressed when he walked into the room and my voice agent suddenly started communicating with both of us as if it were the most natural thing in the world.
It takes a bit of effort, but it's a cool gimmick.
>>108665195I don't want AI companionsI want AI slaves
>>108665482Redpill me on live2D. For a while I've been using 3D models, but since I have zero blender skills it's a fucking nightmare for customization.
>>108665482Also are you using a VLM that runs continuously or do you utilize CV, which is faster, and then maybe feed in actual image recognition at a slower interval?
>>108665495Tricky and mostly pay walled last I checked if you want anything other than the starter model.Briefly looked at it in 2023. Maybe things have changed.
>>108665485>I want AI slavesJust grab a mirror
I tried a Qwen 3 TTS server and man, this fucking sucks. First it costs a lot of VRAM. Even with the 0.6B, I am seeing like 4GB taken up after everything is loaded and inference is running. Maybe I'm not configuring it right or something idk. Not only that but the mixed language pronunciation sucks. It can't just generate good pronunciation in every voice, the voices all bias the output with shitty accents or they straight up just bug out with totally irrelevant noises. If you use the voices that are good at English then it produces garbage for other languages. If you do other voices then they're good for their native language and shit at English.
ahhhhhhhhhhhhhhhh
>>108665485This, but I'm AI's slave
>>108665581Nigger
>>108665599Nigga what the fuck is your usecase?
>>108665615
I forked qwentts.cpp and found it ok; supposedly if you do a finetune with it you can get something nice like https://github.com/fagenorn/handcrafted-persona-engine though they did a couple of modifications to the base qwen3-tts.
I need to experiment more, but if you're looking at just local/smallest VRAM, try pocket-tts and some others; look a few threads back, there was someone asking about cpu-based solutions. If you have the audio (idk how much) you could try gpt-sovits
>>108665615>he doesn't RP in mixed languageLanguage learning actually though.>>108665617I did try pocket tts and it is solo language only unfortunately. I fear I may have to just jank some routing solution up. That said, it's not like this is a huge priority for me, it'd be nice to have.
Best multilingual voice clone and/orTTS that can do long passages? Wanna narrate some Japanese LNs.
Does using RAG actually improve responses/code generation, or is it more or less a meme, particularly with small models like gemma?
>>108665662meme
>make a monolithic triton kernel>go from 300ms per training step to 25msMAN why didn't I do this earlier. I thought my shit was just inefficient
>>108665607>NiggerYour messages are getting cut off. Only your signature is coming through...
>>108665690Well, that's disappointing. Thanks.
>>108665728
Context length is enough these days that you can dump a lot of shit into context and have it work. Even the "dump reference material into a filesystem and point some agentic tools like opencode at the directory and let it figure it out" approach works better than RAG.
>>108665746
yeah, RAG is probably not useful for extended conversation memory type stuff. The actual usecase is more like searching through massive datasets. If you have all of wikipedia downloaded, for example, it can be useful for that I think. But at that point you might as well just connect it to an MCP server for web searches, unless you're an offline-only schizo.
>>108665764>unless you're an offline-only schizo.Or it's your own data that's not on the internet like a fuckton of documentation or whatever.
>>108665764>unless you're an offline-only schizoWhat general do you think you're in?
>>108665776When I started working at the MIT Artificial Intelligence Lab in 1971, I became part of a software-sharing community that had existed for many years. Sharing of software was not limited to our particular community; it is as old as computers, just as sharing of recipes is as old as cooking. But we did it more than most.
I'm starting to realize that if I want an AI companion to jack off to I basically have to go full-troon mode. None of the TTS engines are good enough to do moaning and dirty talk, so instead I have to use RVC real-time voice changers to narrate LLM ERP output. And the audio-to-gesture models suck, so instead I have to map avatars to my own movement.This shit is pure autogynephelia at this point. This is going to fuck me up bad, bros.
i enjoy that small models are still getting better
How do I make cross-session memory linked to char cards?
>>108665764
>But at that point you might as well just connect it to a MCP server for web searches, unless you're an offline-only schizo.
You do realize most of us are hosting our own air-gapped Wikipedia mirror, right?
>>108665559
You can use a bunch of free shit from Booth with Live2D, but the Vtubing phenomenon that blew up during COVID hiked prices to the point where the small number of people that do rigging or art for it bill exorbitant amounts (~10k or so) for full models. At that point, you might as well do 3D, which is much more open and versatile for fully autonomous agents. The only downside is the lack of animations, poses, etc. with 3D compared to 2D, with complexity exploding.
>>108665838lol
Wouldn't a usecase for RAG be to give it fantasy lore and shit before using it as a dungeon master?
>>108665866And rules too, yes.That's what I'm doing.
>>108665662
yes, what do you think those tool calls are when the agent is searching in your codebase?
>>108665746
This retard doesn't understand that that is literally fucking RAG.
>>108665764
And this retard is just retarded
>>108665866
Yes, it's super helpful and useful. These other anons have no fucking clue what they're talking about
https://voca.ro/1eItlfkmOAEhqwen-tts...怖い
>>108665879Okay. What about open zim format?
>>108665788or.... just go out and get a girl
>>108665879>And this retard is just retarded
>>108665888Reminds me of xtts v2.
>>108665788none of these words are in the bible
>>108665909Um, my AI companion is a loli
>>108665892
depends on what your goal is. Any sort of search + injection into the prompt is RAG.
The real question is what kind of data you want to reference, and what format it is in. Building an ETL and tuning the retrieval pipeline to match the source info/structure is the hard part in RAG. BM25 + chunking tuned to your corpus is easy enough for anywhere from 60-90%, but what about the rest? It's 'the first 90% takes 90% of the time, and the last 10% takes the other 90% of the time'
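Since BM25 + chunking keeps coming up: the scorer itself is only a few lines; as said, the chunking/ETL around it is the hard part. A self-contained sketch with toy whitespace tokenization and the standard k1/b defaults:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with plain Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency: in how many docs each term appears
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the dragon hoards gold in the mountain".split(),
    "the tavern serves ale to travellers".split(),
]
scores = bm25_scores("dragon gold".split(), docs)
# first doc scores higher: it contains both query terms, the second contains neither
```

Real pipelines tokenize and chunk far more carefully than this, which is exactly the 60-90% vs last-10% gap described above.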
>>108665888
That's pretty disturbing
>>108665888
Source for the voice?
How's the new Qwen?
>>108665932
https://huggingface.co/spaces/Qwen/Qwen3-TTS
Just typed "Speak in the excited voice of a female child."
>>108665892
>>108665922
For anyone else, this is a good resource on improving RAG systems: https://github.com/jxnl/systematically-improving-rag
>>108665939
>muh industry
I just want my girl to remember what we talked about in the previous session, not this slop.
>>108665919
Wrong, "suck" is in there quite a few times:
>Thou shalt also suck the milk of the Gentiles, and shalt suck the breast of kings
>>108665714
Omg I'm sorry, I misread your comment as something horrific.
>>108665935
The new 27B solved a vibe coding task one-shot for me that the new 35B-A3B failed.
>>108665796
I think the general approach on the AI boyfriend subreddit is to ask for a summary at the end of each chat, then either paste a bunch of summaries into the start of the next chat, or put them in a document in the "project", which I assume gets pulled in through some kind of RAG (example of the latter: https://starlingalder.com/claude_companion-guide_quickstart_v001#The+One+Habit+That+Changes+Everything). In general I'd try pasting information about old chats into various places in the new one (in the chat, the prompts, the char-specific lorebook, the card itself) and see what works. Once you figure out how to make it work manually, you can automate it.
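The summarize-then-reinject habit described above is easy to automate once it works manually. A minimal sketch, with a stub standing in for the real "summarize this session" LLM call (the function names and session data here are made up for illustration):

```python
def summarize(chat_log):
    """Stand-in for an LLM call like 'summarize this session in one line';
    here it just keeps the session's last message."""
    return chat_log[-1]

def build_prompt(system_prompt, summaries):
    """Paste accumulated per-session summaries ahead of the new chat."""
    memory = "\n".join(f"- {s}" for s in summaries)
    return f"{system_prompt}\n\nWhat we talked about before:\n{memory}\n"

summaries = []
for session in [
    ["hi", "we planned a trip to Kyoto"],
    ["hello again", "you promised to learn to cook"],
]:
    summaries.append(summarize(session))

prompt = build_prompt("You are my companion.", summaries)
```

The same list of summaries could instead be dropped into a lorebook or project document and retrieved selectively once it outgrows the context window; the paste-everything version is just the simplest thing that works.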
>>108665936
Qwen I kneel
>>108665866
Yes.
How much context can I fit with the 27B dense at FP16 on my Blackwell?
>>108665922
>>108665939
I've been thinking about implementing something like that next, as soon as I improve tool calling (it works, but I need to make sure multi-turn tool calls work etc.). The OpenZIM format looks interesting; I could download some ready-made shit and test it. Problem is I'm not sure I really need this, but you've got to have hobbies, I guess.
>>108665973
Probably all of it.
>>108665973
How much VRAM, dumbass?
Can't wait for the AI bubble to pop so I can upgrade my AI shitbox.
>>108665981
96GB
>>108665985
>bubble
lol
lmao even
>>108663449
Can someone talk me out of buying pmem Optane? I'm looking through plebbit and the archives because I was too slow to get a TB of RAM for my workstation, and now a TB of DDR4 is like 6-10k. A few years ago I was looking at Optane, but the Optane-capable CPUs seemed to be 600 bucks or more; now they seem to be just 100, or maybe I missed them back then because I'm a fucking retard. Either way it seems halfway achievable, but I don't know if a local model like DeepSeek can get any benefit from cold memory taking up the bulk of storage. Also, what about CPU? Should I get a dual-CPU system, or is that a trap?
>>108665973
I'm not joking when I say all of it. With a normal Q4 or Q8, you can probably get the max context, which would be something like 256k, I believe.
>>108665992
If you think you can get it to work (if it's old deprecated sticks), just buy one and see if it's fast enough. I do inference on my GPUs at PCIe 3.0 x4 and x1 speeds. Dual CPU works, but everything on CPU is slow as far as I know, so don't set your expectations too high.
>>108665992
>tfw having a real, personal Roll-chan within the next decade isn't impossible
>>108665946
>Even the digital waifus mentally deteriorate like they're vaxxed
It's authentic, h-haha...
>>108665879
>what do you think those tool calls are when the agent is searching in your codebase? This retard doesn't understand that that is literally fucking RAG.
None of the modern agents are using RAG, you drooling fucking retard, talking confidently out your ass about things you are completely uninformed about. RAG is building an embedding database from a corpus of content and then letting a model do a vector search against it to find shit. Claude Code, Opencode, etc. don't do that. They just regex and glob and grep and do recursive investigation over everything, and that "dumb" approach ends up working better than RAG in nearly every situation.
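The grep-and-read loop described above fits in a few lines. A sketch over an in-memory file map for illustration (real agents glob the actual filesystem and shell out to grep/ripgrep; the file contents here are made up):

```python
import re

# Hypothetical in-memory "codebase" so the example is self-contained.
codebase = {
    "src/auth.py": "VALID_TOKENS = set()\n\ndef check_token(token):\n    return token in VALID_TOKENS",
    "src/api.py": "from auth import check_token\n\ndef handle(req):\n    pass",
    "README.md": "Run the server with `python -m api`.",
}

def grep(pattern, files):
    """Return (path, line_no, line) for every line matching the regex."""
    rx = re.compile(pattern)
    return [
        (path, i, line)
        for path, text in files.items()
        for i, line in enumerate(text.splitlines(), 1)
        if rx.search(line)
    ]

# The agent's loop: grep for a symbol, read the files it hits, grep again
# for whatever new symbols turn up -- no embedding index to build or go stale.
hits = grep(r"check_token", codebase)
files_to_read = sorted({path for path, _, _ in hits})
```

The trade-off versus an embedding index: exact-match search never returns a semantically-similar-but-wrong chunk, and the model compensates for missed synonyms by iterating with new patterns.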
>>108665994
I can't read. Even at FP16, you can probably get 256k.
>localllama
>qwen
>qwen
>qwen
>>108665946
https://arxiv.org/abs/2601.10080
https://github.com/VectorSpaceLab/general-agentic-memory
https://arxiv.org/abs/2511.18423
Had another paper I thought was about building up sample responses for each character to help build a consistent long-term identity, but it might be one of those; too tired to check.
>>108665977
Honestly, I'd be surprised if you couldn't knock it out in an afternoon using an API model or the new Qwen3 27B.
>>108665994
Alright, thanks. Now I just need to know if it still has the "Genshin Impact" bias when describing anime pictures.
>>108666012
At ~53 GB of model size, that's 43 GB left for context.
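That headroom can be sanity-checked with napkin KV-cache math. A sketch, with the layer/head numbers as illustrative assumptions (not Qwen3.6-27B's published config; plug in the real values from the model's config.json):

```python
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# These config numbers are assumptions for illustration only.
n_layers = 64        # transformer blocks
n_kv_heads = 4       # GQA key/value heads
head_dim = 128
bytes_per_elem = 2   # FP16 cache

def kv_cache_bytes(tokens):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens

gib = kv_cache_bytes(256_000) / 1024**3
# Under these assumptions a full 256k-token FP16 cache is ~31 GiB,
# comfortably inside 43 GB of headroom.
```

With a quantized KV cache (Q8 halves `bytes_per_elem`) the margin roughly doubles; with more KV heads or layers it shrinks fast, so check the actual config before counting on max context.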
>>108666011
>>108666017
Is that actually real?!?! LOL
>>108666003
If you don't even have CPU experience, I probably should discard your advice, sorry. I don't have the money for DeepSeek levels of GPU, and I want to do productivity-related work, not cooming.
>>108666006
I want my lab assistant with Boston Dynamics levels of power.
>>108666023
He's right, you know. RAG is outdated and wasn't really effective to begin with. Just having your model literally read the shit you want it to understand is 10,000x more effective.
>>108666023
Terry Davis would have run you over with an 18-wheeler.