/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108659983 & >>108655009

►News
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 131557813_p0_master1200.jpg (210 KB, 768x1024)
►Recent Highlights from the Previous Thread: >>108659983

--Comparing GGUF quantizers and discussing imatrix calibration for Qwen3.6-27B:
>108662039 >108662052 >108662065 >108662230 >108662252 >108662353 >108662475 >108662053 >108662063 >108662080 >108662162 >108662361 >108662068 >108662062 >108662167 >108662176 >108662190 >108662257 >108662321 >108662780
--Qwen3.6-27B benchmarks and GGUF quants:
>108660998 >108661023 >108661071 >108661108 >108661125 >108662813 >108662846 >108661101 >108661164
--Gemma 4's 124B MoE and memory bandwidth benchmarks:
>108662533 >108662543 >108662549 >108662589 >108662594 >108662614
--Models for a 3090 and explaining MoE vs Dense offloading:
>108659996 >108660054 >108660247 >108660260 >108660268 >108660279 >108660312 >108660317 >108660347 >108660223 >108662148
--Koboldcpp launch flags and speculative decoding for Gemma 4:
>108660701 >108660741 >108660743 >108660848 >108660934 >108660990
--Alleged unauthorized access to Anthropic's Mythos:
>108660075 >108660630 >108660724 >108661694
--Anons discussing reported Gemma 4 performance on RK3588 SBCs:
>108662346 >108662393 >108662431 >108662528
--LLM reliability, internet content degradation, and local knowledge bases:
>108661238 >108661314 >108661335 >108661358 >108661276 >108661375 >108661405 >108661533 >108661585 >108661462 >108661311
--llama.cpp ngram-mod flags to optimize coding performance:
>108660554 >108662471 >108661013
--Text Completions prefills to stop GLM's repetitive thinking loops:
>108661606 >108661631
--OpenAI's open-source privacy-filter model:
>108662489 >108662773
--Little Coder agent optimized for small LLMs:
>108660765 >108661020
--TurboQuant-H reducing VRAM via 2-bit embedding quantization:
>108660542
--Logs:
>108660349 >108661795 >108662260
--Rin, Miku, Teto (free space):
>108660565 >108660789 >108661238 >108661795 >108661801 >108662084

►Recent Highlight Posts from the Previous Thread: >>108659986

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108661743
>>108661866
>text completion has no vision
kek wtf, I use text completion and can do shit like write "Appearance: <__media__>" in the character card and feed it images in the request body placed wherever I want in context. If you need your hand held by an abstraction like chat completion just admit it. You can do whatever the fuck you want if you know what you're doing.
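To spell out the trick: you just build the request body yourself. A minimal sketch, with the caveat that the `<__media__>` marker and the `image_data` field name here are placeholders for whatever your particular server build actually expects:

```python
import base64
import json

# Sketch only: "<__media__>" and "image_data" stand in for whatever marker
# and field name your inference server actually uses for inline images.
def build_request(prompt: str, image_paths: list[str]) -> str:
    images = []
    for path in image_paths:
        with open(path, "rb") as f:
            images.append(base64.b64encode(f.read()).decode("ascii"))
    body = {"prompt": prompt, "image_data": images, "n_predict": 256}
    return json.dumps(body)
```

Point being, with raw text completion you control exactly where each image lands in context, instead of letting a chat-completion wrapper decide for you.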
>>
File: 1758392265995431.jpg (98 KB, 996x720)
>>108663492
Okay but why?
>>
>>108663449
Picking out junk food at the store with Yellow Miku
>>
>>108663544
If you don't have an innate urge to be in control of every single token present in context why are you here?
>>
>>108663443
>https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
>WTF HE ALREADY DID IT
still no gemma4-31b-it-HAUHAUCS
>>
>>108663564
geg......................
>>
>>108663564
no kdl? ACK
>>
KEKEKKEKEJEEKEK WAITING FOR V4!? MEANWHILE I JUST HAD 64k long CUNNY SEX WITH THAT DEEPSEEK V4 ON IT'S OWN WEB CHAT LOL… And not just sex, but CUNNY sexxxxxxx (ON THAT DAMN FILTERED WEB) BUUWHAHAHHAHAGHHAHHA I'VE BECOME A GOD NOW... YOU ANONS MUST KNEEL BEFORE ME
>>
>get excited about structured output in llama.cpp
>waste 2 hours trying to get it to work
>turns out it's broken and completely ignores whatever schema you pass it
damn
>>
File: 1758062318463220.jpg (51 KB, 640x480)
>>108663630
>15yo
>cunny
Burger-kun...
>>
>>108663630
>15
You mean hag sex
>>
I'm not even sorry for cheating on gemma-chan...
oh the cunny loli sexo~
>>
>>108663630
a-anon... that's not cunny
that's prime breeding age
>>
>>108663630
???
>>
>>108663633
What?
It was working until last week on my python app using the OpenAi lib.
>>
>>108663630
>americans
>>
>>108663630
>15
rookie numbers.
>>
>>108663654
i think it's this issue? https://github.com/ggml-org/llama.cpp/pull/21537
gemma 4 chat template does not specify response_format, maybe that's what it is
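For reference, the request shape in question is OpenAI-style structured output. A sketch of the body only; whether a given llama.cpp build actually enforces the schema instead of silently ignoring it is exactly the bug being discussed, so always validate the reply:

```python
import json

# JSON Schema the model's reply should conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# OpenAI-style chat completion body requesting structured output.
body = {
    "model": "local",
    "messages": [{"role": "user", "content": "Return a person as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema},
    },
}

payload = json.dumps(body)  # POST this to /v1/chat/completions
```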
>>
>>108663655
It has to be the tap water. There is no other explanation to this phenomenon.
>>
>>108663633
Structured output just works with vllm btw
>>
>>108663630
>15
If she's had her first period, she's not a trve loli, which is physically undeveloped. She's a female that Nature has ordained to be impregnated as soon as possible.
>>
Qwen 3.6 27b is already uncensored without finetuning btw
I dropped the q8_0 from ggml-org into a sysprompt I was using with gemma 4 heretic and it just werked, no refusals or moralizing in reasoning. It's resistant to using nsfw language unprompted though.
>>
>>108663633
Shit has always been broken since day one; vllm handles function schemas fine, but llama.cpp forces alphabetical ordering for some reason. This is really bad if a function argument depends on the previous one.
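A rough illustration of why forced alphabetical key ordering hurts, plus a hypothetical workaround (numeric key prefixes) if you control the schema; the key names here are made up for the example:

```python
# Intended order: the model should decide target_line first, then write
# the replacement that depends on it.
props = {
    "target_line": {"type": "integer"},
    "replacement": {"type": "string"},
}

# If the grammar builder sorts keys alphabetically, the dependent argument
# gets generated first:
alphabetical = sorted(props)  # ['replacement', 'target_line'] - wrong order

# Hypothetical workaround: prefix keys so alphabetical order matches the
# intended order (the consumer strips the prefix afterwards).
forced = {f"{i:02d}_{k}": v for i, (k, v) in enumerate(props.items())}
```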
>>
File: 1746199182845250.png (50 KB, 1008x839)
>108663630
>108663644
>108663646
>108663647
>108663649
>108663651
>108663655
>108663665
>108663680
>108663710
>this much pedophilia already, this early in the thread
Are we being raided by discord trannies or something?
>>
>>108663741
>afraid to quote
Seems like reddit is already here
>>
>>108663741
um actually pedophilia is oldfag 4chan culture, newfag. We wuz oldfags or sumthing
>>
File: ACK.gif (1.66 MB, 1300x800)
>>108663630
Dipsy release when? I know you labniggers are lurking here, hurry the fuck up.
>>108663741
Always have been.
>>
File: 1747059796790100.png (371 KB, 896x896)
>>108663741
>>
File: 1764765168047.jpg (28 KB, 490x748)
>>108663756
aint no way
>>
>>108663680
Fluoride has been shown to decrease IQ and there is still a significant amount of lead pipe around, so that is also a factor.
I think the biggest factor though is the No Child Left Behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result, and the dumbest kid will get dumber every single year. And if a student isn't actually smart enough to advance a grade they will still push them through regardless due to financial incentives. So the bar gets lowered so far that no one can actually fail.
There has also been an uptick in taking pride in being a fucking retard in the last decade or two. So you have health, the education system itself, and societal praise of being a retard all contributing to making everyone stupid.
Eventually we will either shape up or be outcompeted by stronger and smarter societies, but all I know is we were handed the world on a golden platter, and if we fail and collapse we have no one to blame but ourselves and the previous generations who set us up for failure.
Thanks for coming to my TED talk
>>
>>108663776
>I think the biggest factor though is the no child left behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result and the dumbest kid will get dumber every single year.
Same applies to these threads by the way. Being surrounded by low IQ pedophiles mentally retards your brain.
>>
>>108663630
>shivers and not x but y in the same phrase
>pedoshit
shit's crazy, what kind of turboslopped model is this?
>>
>>108663689
>no pascal support
>very limited cpu support
>pythonshit, meaning it will pull a dozen of GiBs of dependencies
llama.cpp might be buggy, but sometimes i really appreciate how it runs on fucking everything, on top of being self contained and not being dependent on cancer that is AI ecosystem in python
>>
>>108663809
That's chink model for you!
>>
>>108663776
It was a joke but I think these are international issues for every 'western' nation.
>>
>>108663810
Only if your time is worthless
>>
>>108663806
Pedo is attraction to 13 and under, burger. Words have meaning.
>>
>>108663828
Then why are you dumb faggots dogging on that anon who thought "cunny" applied to 15 year olds? You're not hebephiles, you're pedophiles. That's why you post pictures of "loli" anime girls with no tits, hips, or ass and infantile behavior. Fucking freak. Don't reply to me again.
>>
>>108663841
>low comprehension too
Let me break it to you, anons are making fun of another anon saying that a virtual '15yo' was 'cunny' (pedo slang) which isn't. It's not that hard to understand.
>>
>>108663810
>a dozen of GiBs of dependencies
18GB is my venv for stable diffusion
>>
File: just like old times.jpg (153 KB, 832x832)
>>
>>108663859
:(
:)
>>
>>108663859
What did she mean by this?
>>
File: apu.jpg (39 KB, 656x679)
>>108663820
sorry Jensen... but i'm not gonna buy a Blackwell GPU. So yeah... i'll keep on using my trusty Pascal.
Haha, sorry, but i'm just not gonna do it!
>>
>>108663859
I like these Bakas
>>
Is necrophilia okay if it's just about fictional people? What about cannibalism and bestiality? It's all okay because it's just fictional stories that you masturbate to, right?

Would you send your child to a public school where all of the teachers openly admitted to doing this? It's just fictional bro.
>>
Is unsloth actually better than bart's quants? Tried both but never found any noticeable difference between them. But unsloth claims that theirs are significantly better than others. Seriously, which one do I choose between these two?
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q8_0.gguf
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-Q8_0.gguf
>>
iwan is normally a nigger but this actually makes it so reasoning budgets and turning off reasoning works now, so i guess he's slightly less of a nigger.
https://github.com/ikawrakow/ik_llama.cpp/commit/e0596bf6146a737f5e8fa8035215f5dfae59742d
>>
>>108663894
for q8 doesnt make any difference
>>
File: 1761641793555591.gif (175 KB, 220x220)
>>108663890
What is okay is being able to separate reality from fiction, which is what you should work on. Thought crimes are not a thing.
>>
>>108663630
>15
>cunny
also that’s not anything I haven’t seen from gemma or glm
>>
>>108663904
What about Q4_K_M?
>>
>>108663453
>--OpenAI's open-source privacy-filter model:
what is this exactly for?
how would that be integrated https://huggingface.co/openai/privacy-filter
>>
>>108663906
no but they sure as hell want to make it so you can be prosecuted for your thoughts
>>
>>108663910
again and again unslop show their quants having better kdl, so I would go with that, not much to stress over. If you really want, you can download both plus the original model and run the KDL yourself, but it would be a waste of time
>>
>there are still people falling for unslot's shilling
geg
>>
>>108663917
kld*, it's KL divergence, anyway, you get the point
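For anyone unfamiliar: the metric being argued about is the mean KL divergence of the quant's per-token distribution from the full-precision model's, over some test text (llama.cpp's `llama-perplexity` tool has a `--kl-divergence` mode for this, if memory serves). The core formula, as a sketch:

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) in nats: how much the quantized distribution q diverges
    # from the reference full-precision distribution p at one token position.
    # Lower mean KLD across a test set = the quant tracks the original better.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```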
>>
File: 1752898579006505.png (238 KB, 1000x1000)
>>108663906
>What is okay is being able to separate reality from fiction
those who cannot do that probably think that everyone that plays GTA is a potential serial killer kek
>>
File: file.png (635 KB, 774x679)
neners
>>
>>108663920
yeah, the only reason I was asking is my shitty experience with their quants: they were broken as fuck, and switching to bartowski's quants fixed everything for me; been happy ever since. Though that graph in the previous thread got me wondering if they've actually gotten better
>>
>>108663924
potential is a pretty strong word, can mean anything and nothing
>>
File: 1774029297136779.jpg (40 KB, 342x298)
>Mfw Got a 5090 last week and while amazing, I already think I want another one, as 32GB is barely enough with my 64GB of RAM.

I swear it's so damn easy to max out this card when you start moving past Q5 and +25GB sizes.
It's a pity these cards didn't come out as 48GB, because that seems like a sweet spot to run everything with at least okay context.
I wonder if I should just buy some used 5070 Ti or 5080 as a companion to this beefy motherfucker to reach that 48GB level without breaking the bank.
This shit is way too addicting.
>>
>>108663955
>without breaking the bank.
that ship has sailed
>>
>>108663955
just buy an aftermarket modded 48gb 4090 from your chinese friends
>>
File: gaoooooooooo.png (554 KB, 1024x1024)
akita neru
>>
>>108663955
>5070ti
>5080
your VRAM bandwidth gets sliced in half if you get a 5080 which is a complete disservice to your 5090. the only thing you can do is buy another 5090.
>>
File: 1774550189493174.jpg (197 KB, 976x925)
>>108663962

>Mfw
Yes.

>>108663964

Fucking hell those are selling for three and a half thousand Eurobux.
I can buy two used 4090 for that price, so there's no real savings there either.

>>108663996

Yeah that's the biggest problem with this card, it's just so much faster than the others. Any other model as a crutch is going to nerf the hell out of it.
I guess I'll just have to start saving up and meanwhile trying to tell myself not to "waste" my money on another one.
Then again it's pretty hard to lose money on this hardware.
Not like the prices are going to go anywhere but up for a long ass time, so whenever I sell these I'll probably manage to break even or suffer some paltry 20% loss.
Especially since I bet next gen will cuck us with another round of 32GB memory, as this AI mania isn't going anywhere any time soon.
>>
>>108663906
I mean, I don't think you should be criminally charged, no one was really harmed but it's still a sign that you are a pedophile. If you watch gay porn, even if it's fictional, and enjoy it you are gay. Same with pedophilia. It's justified for people to call you a pedophile because you are a pedophile.
>>
>>108664063
if you play gta and kill innocent citizens on the street, how should we call you?
>>
>>108664085
>how should we call you
esl king
>>
>>108664101
So you want to be called the esl king?
>>
File: 1767967081274588.jpg (32 KB, 507x714)
is there any trick to use swa and yet avoid the penalty of having to reprocess everything when context is full?
>>
>>108664101
>>108664106
saars the esl kang is https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H
>>
File: 1772190494723439.png (59 KB, 1350x570)
why does the XTC threshold default to 0.1 if it ends up deactivated anyway? it's a bit retarded if you ask me
>>
File: patches.png (165 KB, 840x233)
>>108664109
buddy you are in a general for LLMs. just vibecode your own slop solution like everybody does.
>>
>>108663955
I have one in my server and 3090, only thing stopping me from selling the 3090 and getting a second 5090 is the laziness of having to change the PSU for one able to support both.
>>
>>108664128
explain how an alternative solution would be better without exposing that you don't understand how XTC works
>>
>>108664197
xtc sounds like a crypto, I want a better name
>>
>>108664197
funny irony, you need to look at the image again: XTC probability is at 0, meaning the whole of XTC is disabled, so XTC threshold 0.1 + XTC probability 0 does absolutely nothing, hope that helps
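For reference, a rough sketch of what the two knobs do (my reading of the sampler's description, not the exact llama.cpp implementation): with chance `probability` per step, every candidate at or above `threshold` except the least likely of them gets removed, which is why `probability 0` disables the whole thing regardless of the threshold:

```python
import random

def xtc(candidates, threshold=0.1, probability=0.5, rng=random):
    # candidates: (token, prob) pairs sorted by prob, most likely first.
    if rng.random() >= probability:      # probability == 0 -> never fires
        return candidates
    above = [i for i, (_, p) in enumerate(candidates) if p >= threshold]
    if len(above) < 2:                   # need >= 2 viable tokens to cut any
        return candidates
    cut = set(above[:-1])                # keep only the least likely "top" token
    return [c for i, c in enumerate(candidates) if i not in cut]
```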
>>
>>108664132
?
>>
File: that's right.png (114 KB, 640x640)
>>108664128
this shit halves my speed so I'm not using it, simple as that
>>
>>108664257
just unleash an agent on the llamacpp repo with your demands.
>>
Do people that download quants also buy their aspirin from the drug dealer on the street corner? Do they not understand chain of custody?
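In fairness, you can at least integrity-check a downloaded GGUF against the SHA256 that Hugging Face shows on the file page. A minimal sketch:

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    # Stream in 1 MiB chunks so multi-GB GGUFs don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

That proves the bytes match what the uploader published, not that the quant was made competently, so it only covers half the chain-of-custody complaint.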
>>
>>108664128
So that there's a sane default value when it's activated? Are you a UI contributor to FOSS projects?
>>
>>108664063
>If you watch gay porn, even if it's fictional, and enjoy it you are gay.
false
>>
>>108664303
>Are you a UI contributor to FOSS projects?
are you?
>>
File: 1773299833427303.png (150 KB, 600x544)
>>108664063
>If you watch gay porn, even if it's fictional, and enjoy it you are gay.
So women are actually in majority lesbians?
>>
File: aaa.png (22 KB, 380x679)
lm studio + void ide
None of the models I've tried can read files without specifying lines.

Are there any ide's with working tools?
>>
>>108664352
Like a scoreboard for the antichrist
>>
>>108664400
is it like antimatter?
>>
>>108664366
what?
>>
>>108664404
Anti-matter is just a tool like plutonium or tritium, it doesn't seem more evil than matter. Matter is both good and evil.
>>
>>108664407
as in the screenshot
>read file index.html
>The index.html file appears to be truncated
>read file index.html(1-1000(lines))
>The file is 102 lines long
It can't even read a short file whole
And i want it to work on 2000+ line files as i did in cursor
>>
>>108664447
it has no option to change the behaviour? you will need find one that allows you to customize like that or write your own file reading mcp or whatever the correct way of doing this is
>>
So is 3.6 actually usable or is it still just a curiosity compared to saas?
>>
>>108664460
>curiosity compared to saas
what does this even mean? saas is dead, 3.6 is good, anyone who is not retarded will use proprietary for coding
>>
My understanding is that the Kimi weights are INT4 for the experts and BF16 for everything else. So does that mean the BF16 mmproj is full precision? Is there ever a reason to use the FP32? I'm not sure how mmproj precision really works or if it's even model weights to begin with or some other type of data. I'd ask Gemma-chan but I'm not sure she knows.
>>
>>108664519
>Is there ever a reason to use the FP32
no unless you like wasting compute for zero difference
>>
>>108664519
you actually need fp64 to get anywhere a remotely close to usable model but we pretend fp16 is good enough
>>
>>108664533
>>108664557
To be clear I'm just talking about the mmproj file, which is pretty small even at F32, but yeah if it's pure bloat then so be it.
>>
the true chads use fp256
>>
>>108664563
exactly the same at fp16 and 32, but it's sensitive to quantization so 8 actually hurts it
>>
>>108664563
use fp16, send in ram, fp16 is the intended way
>>
>not using quantum entangled datatypes like sky-surya-h
ngmi
>>
Every time I try to performance-max TTS engines I end up becoming borderline suicidal.

It gets worse the more advanced the TTS engine is. They use such convoluted architectures. It's so ridiculous.
>>
>>108664623
I'm just waiting for Llama.cpp to support Qwen 3 TTS...
>>
>>108664630
Ha, same. That's the exact one I was talking about.

It's not going to happen without a major refactor to the ggml backend to support convolutional architectures though. The speech tokenizer is fundamentally incompatible with llama.cpp in its current state.
>>
>>108664653
Damn.
>>
>>108664623
>>108664653
Why do you need to max performance with it? Do you need it for real time something because that is the only use case where I would think it actually matters? Otherwise, I just use it with batch 32 and it works well enough for offline transcription.
>>
>>108664677
Not him but yeah, I want real-time use. If possible it'd be nice to run on CPU instead of GPU too, just to save the bit of VRAM for the LLM.
>>
>>108664664
My current setup has the speech tokenizer and the voice encoder running in onnxruntime and the talker and code predictor running in llama.cpp. With that I'm able to get a RTFx of 3.0 and a TTFA latency of about 122ms.

But the setup is aesthetically disgusting. Having to use multiple execution providers is so appalling. At the very least I've managed to make it so that it only uses about 400mb of vram so it's pretty efficient.

>>108664677
Real-time speaking with LLM output is my usecase. The idea is to have a high quality voice speaking whatever the LLM says with as little latency as possible.
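Since these abbreviations keep tripping people up: RTFx is throughput (seconds of audio produced per second of wall time, higher is better) and TTFA is latency to the first audible chunk. Trivial sketches of both, assuming you record wall-clock timestamps around the stream:

```python
def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    # Inverse real-time factor: > 1.0 means faster than real time.
    return audio_seconds / wall_seconds

def ttfa_ms(request_t: float, first_chunk_t: float) -> float:
    # Time-to-first-audio: gap between sending text and hearing anything.
    return (first_chunk_t - request_t) * 1000.0
```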
>>
>>108664691
>>108664703
I had been planning to play around with https://github.com/rekuenkdr/Qwen3-TTS-streaming at some point but I don't have CUDA so would need to rewrite a good chunk of this into something like Triton to make it work on my card. But hopefully you guys get it working in some way for your usecases.
>>
>>108664708
Highly recommend that you just use vulkan for maximum cross-compatibility. Also that repo probably isn't what you want. You'd be better off vibe coding something from scratch than trying to manually convert CUDA shit.
>>
Thanks to Gemma 4 31B I made my own personal RAG frontend, just need to wrap up final UX stuff and then other stuff like theme switching.
>>
>>108664748
What are you using for RAG? Just vector similarity? bm25?
>>
>>108664741
I would usually tell an AI to do a basic bitch conversion and work from there to rewrite the Triton to be more performant with that layer in Python. I would consider Vulkan only if I absolutely needed every last inch of performance. Usually, having at least a framework and project for reference on what you vibecode helps a whole lot rather than doing it from scratch even if you can't reuse any of the code.
>>
>>108664756
I'm using FAISS for dense vector retrieval and BM25 for sparse keyword search, merged via Reciprocal Rank Fusion (RRF) to get the best of both worlds. To kill hallucinations, I've implemented a Cross-Encoder reranking step (BGE-Reranker) that scores the top candidates before feeding them to the LLM.

I ran it through a validation test and it worked great
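The RRF merge step described above is simple enough to state exactly; a sketch (k=60 is the constant from the original RRF paper):

```python
def rrf_merge(rankings, k=60):
    # rankings: one ranked list of doc ids per retriever (dense, BM25, ...).
    # Each list contributes 1 / (k + rank) per document; summing the scores
    # gives the fused ordering, so agreement near the top of any list wins.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The cross-encoder rerank then only has to score the top handful of this fused list before anything reaches the LLM.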
>>
File: 1750660480908053.png (121 KB, 1049x553)
>>
>>108664777
>777
Sick.
Gonna try implementing that and compare it to my current retrieval algorithm.
>>
>>108664796
:fire:
>>
We are looking for a QA-Human to provide human-in-the-loop (HITL) evaluation of model outputs, ensuring quality, safety, and alignment. You’ll operate in an AI-native environment, applying structured feedback, edge-case flagging, and rapid judgment to continuously improve system performance.
>>
File: Blue-Eyes Abyss Dragon.jpg (737 KB, 2000x1200)
>>108664799
Fuck, forgot the yu gi oh related image.
>>
>>108664796
Why are there so many weirdos in the space. It's worse than anons shitposting here, they literally use their account for that shit, zero shame.
>>
bartowski quant when

i refuse to use unslop
>>
>>108664814
Why does a dragon need breast-orbs, thick thighs, a fat ass, and an interest in human men?
>>
>>108664813
You do realize humans want to get paid and want to sign a legally binding contract before entering into employment? Do you have the legal capacity to fulfill this?
>>
>>108664815
They just want a piece of the grifting pie, and AI is the prime place for grifting in 2026. That pic in particular just looks like some guy taking the piss, though.
>>
>>108664830
To cater to my tastes, of course.
>>
>>108664352
>fake (and gay) chart
slop, too symmetrical
>>
>downloading unslop
>>
File: file.png (21 KB, 1294x257)
I like living dangerously
>>
If anyone like me updated to CUDA 13.2 and your Docker was fucking up, with `nvidia-smi` saying everything was alright but llamacpp throwing
>unknown error
when trying to load a CUDA device:
I had to switch from nvidia-open to nvidia-dkms to fix it.
>>
>>108664950
>5090 powerlimited
not dangerously enough
>>
>>108664950
>260W
>living dangerously
power limiting your card by 75% is the very opposite of that.
>>
>>108664813
>quality, safety, and alignment
you've cum to the reigh place, nigga
>>
>>108664976
This is against my policy.
>>
>>108664964
>>108664970
I meant the vram, the powerlimiting is no issue
I'm having oom once in a while
>>
File: SpockBean.jpg (75 KB, 880x1149)
Are AI companions or robot pets/humanoids ever going to take off?
>>
>>108664994
ah I see yeah.
>>
>>108665195
yes
>>
>>108665195
no
>>
>>108665195
Maybe
>>
>common_speculative_is_compat: the target context does not support partial sequence removal
>srv load_model: speculative decoding not supported by this context
So much for using the MoE as a draft model for the dense.
45tg/s isn't enough for me, into the garbage Qwen3.6 goes.
>>
>>108665195
yesn't
>>
is qwen 27b better than gemma 31b?
>>
>>108665195
2 more grifts
>>
>>108665295
for coding yes
>>
is there any way to get KV quantized to q5/q6 without it running like dogshit
>>
File: 1768687943339635.png (315 KB, 2736x658)
>>108665195
Yes

we are so so so early
>>
>>108665195
best we can do is yet another coding model take it or leave it
>>
>>108665306
No. Just use q8
>>108665313
I'd take it if it's good
>>
File: file.jpg (14 KB, 250x250)
>>108665301
nta, I'd use the new Qwens if either dickflash, MTP, or ngram worked for it in llama.cpp, but sadly they don't. No, I will not use VLLM (unless it works in wsl).
>>
>>108665309
Inspiring post. Are there any TTS engines that have the quality of Qwen3 TTS but also support paralinguistic tags or other features that would enable moaning and whatnot?
>>
>>108665339
It works with WSL2
>>
>>108665367
I will bite you if it doesn't.
>>
File: 1752580965925796.png (112 KB, 1080x722)
https://mimo.xiaomi.com/mimo-v2-5-pro
>>
File: 1775536002258266.png (219 KB, 1080x748)
>>108665406
Optimized for token efficiency
>>
>>108665406
Saw it on the ai arena earlier.
Lots of emoticons.
>>
Weird...
>>
>>108665426
Almost as gay as the strawberries
>>
File: 1771798143325612.png (385 KB, 1747x991)
385 KB PNG
>>108665426
They're trying to catch up to the trend that is vagueposting from official account
>>
Idk, I've never come to the 4chud tech board before. I've been searching everywhere for a board where AI is talked about.

I LOVE IT. I HAVE 4 32GB MI50'S. I DONT EVEN USE THE VLLM FORK TO RUN AI, I JUST USE VULKAN SUPPORT AND ITS SO GOOD
>>
>>108665449
Post t/s
>>
>>108665442
8l bro
>>
>>108665456
I cant right now, but with qwen3.6 35b I get 30/s ish and qwen coder next 80b I get 20-25/s. The 100b+ models dont seem to be optimized for vulkan, but china's models do.
>>
>>108665456
3 cards are running on pcie 3.0x4 and one is running on pcie 3.0x1.
>>
My cheap webcam is now tracking me (and others) in the room; my Live2D avatar can now look at people in the room, and a state layer feeds my LLM with the relevant data and takes instructions.

My friend was impressed when he walked into the room and my voice agent suddenly started communicating with both of us as if it were the most natural thing in the world.
It takes a bit of effort, but it's a cool gimmick.
>>
>>108665195
I don't want AI companions
I want AI slaves
>>
>>108665482
Redpill me on live2D. For a while I've been using 3D models, but since I have zero blender skills it's a fucking nightmare for customization.
>>
>>108665482
Also are you using a VLM that runs continuously or do you utilize CV, which is faster, and then maybe feed in actual image recognition at a slower interval?
>>
>>108665495
Tricky and mostly pay walled last I checked if you want anything other than the starter model.
Briefly looked at it in 2023. Maybe things have changed.
>>
>>108665485
>I want AI slaves
Just grab a mirror
>>
I tried a Qwen 3 TTS server and man, this fucking sucks. First it costs a lot of VRAM. Even with the 0.6B, I am seeing like 4GB taken up after everything is loaded and inference is running. Maybe I'm not configuring it right or something idk. Not only that but the mixed language pronunciation sucks. It can't just generate good pronunciation in every voice, the voices all bias the output with shitty accents or they straight up just bug out with totally irrelevant noises. If you use the voices that are good at English then it produces garbage for other languages. If you do other voices then they're good for their native language and shit at English.
ahhhhhhhhhhhhhhhh
>>
>>108665485
This, but I'm AI's slave
>>
>>108665581
Nigger
>>
>>108665599
Nigga what the fuck is your usecase?
>>
>>108665599
I forked qwentts.cpp and found it ok; supposedly if you do a finetune with it you can get something nice like https://github.com/fagenorn/handcrafted-persona-engine though they did a couple of modifications to the base qwen3-tts.
I need to experiment more, but if you're looking at just local/smallest VRAM, there's pocket-tts and some others; look a few threads back, there was someone asking about cpu-based solutions. If you have the audio (idk how much) you could try gpt-sovitts
>>
>>108665615
>he doesn't RP in mixed language
Language learning actually though.

>>108665617
I did try pocket tts and it is solo language only unfortunately. I fear I may have to just jank some routing solution up. That said, it's not like this is a huge priority for me, it'd be nice to have.
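A janky routing solution can be as dumb as splitting text on Unicode script and sending each run to the voice that handles it. A naive sketch (kana plus the common CJK block only; everything else falls through to the English voice):

```python
def split_by_script(text):
    # Group consecutive characters by script so each run can be routed to a
    # different TTS voice. Very naive: kana/CJK -> "ja", the rest -> "en".
    def tag(ch):
        if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
            return "ja"
        return "en"
    runs = []
    for ch in text:
        lang = tag(ch)
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + ch)
        else:
            runs.append((lang, ch))
    return runs
```

Punctuation and proper nouns will still land in the wrong bucket, but for "one sentence per language" RP it gets you most of the way.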
>>
Best multilingual voice clone and/orTTS that can do long passages? Wanna narrate some Japanese LNs.
>>
Does using RAG actually improve the responses/code generation or is it more or less a meme, particularly with small models like gemma?
>>
>>108665662
meme



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.