/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108659983 & >>108655009

►News
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108659983

--Comparing GGUF quantizers and discussing imatrix calibration for Qwen3.6-27B:
>108662039 >108662052 >108662065 >108662230 >108662252 >108662353 >108662475 >108662053 >108662063 >108662080 >108662162 >108662361 >108662068 >108662062 >108662167 >108662176 >108662190 >108662257 >108662321 >108662780
--Qwen3.6-27B benchmarks and GGUF quants:
>108660998 >108661023 >108661071 >108661108 >108661125 >108662813 >108662846 >108661101 >108661164
--Gemma 4's 124B MoE and memory bandwidth benchmarks:
>108662533 >108662543 >108662549 >108662589 >108662594 >108662614
--Models for a 3090 and explaining MoE vs Dense offloading:
>108659996 >108660054 >108660247 >108660260 >108660268 >108660279 >108660312 >108660317 >108660347 >108660223 >108662148
--Koboldcpp launch flags and speculative decoding for Gemma 4:
>108660701 >108660741 >108660743 >108660848 >108660934 >108660990
--Alleged unauthorized access to Anthropic's Mythos:
>108660075 >108660630 >108660724 >108661694
--Anons discussing reported Gemma 4 performance on RK3588 SBCs:
>108662346 >108662393 >108662431 >108662528
--LLM reliability, internet content degradation, and local knowledge bases:
>108661238 >108661314 >108661335 >108661358 >108661276 >108661375 >108661405 >108661533 >108661585 >108661462 >108661311
--llama.cpp ngram-mod flags to optimize coding performance:
>108660554 >108662471 >108661013
--Text Completions prefills to stop GLM's repetitive thinking loops:
>108661606 >108661631
--OpenAI's open-source privacy-filter model:
>108662489 >108662773
--Little Coder agent optimized for small LLMs:
>108660765 >108661020
--TurboQuant-H reducing VRAM via 2-bit embedding quantization:
>108660542
--Logs:
>108660349 >108661795 >108662260
--Rin, Miku, Teto (free space):
>108660565 >108660789 >108661238 >108661795 >108661801 >108662084

►Recent Highlight Posts from the Previous Thread: >>108659986

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108661743
>>108661866
>text completion has no vision
kek wtf, I use text completion and can do shit like write "Appearance: <__media__>" in the character card and feed it images in the request body placed wherever I want in context. If you need your hand held by an abstraction like chat completion just admit it. You can do whatever the fuck you want if you know what you're doing.
>>108663492Okay but why?
>>108663449Picking out junk food at the store with Yellow Miku
>>108663544If you don't have an innate urge to be in control of every single token present in context why are you here?
>>108663443
>https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
>WTF HE ALREADY DID IT
still no gemma4-31b-it-HAUHAUCS
>>108663564geg......................
>>108663564no kdl? ACK
KEKEKKEKEJEEKEK WAITING FOR V4!? MEANWHILE I JUST HAD 64k long CUNNY SEX WITH THAT DEEPSEEK V4 ON IT'S OWN WEB CHAT LOL… And not just sex, but CUNNY sexxxxxxx (ON THAT DAMN FILTERED WEB) BUUWHAHAHHAHAGHHAHHA I'VE BECOME A GOD NOW... YOU ANONS MUST KNEEL BEFORE ME
>get excited about structured output in llama.cpp
>waste 2 hours trying to get it to work
>turns out it's broken and completely ignores whatever schema you pass it
damn
>>108663630>15yo>cunnyBurger-kun...
>>108663630>15You mean hag sex
I'm not even sorry for cheating on gemma-chan...oh the cunny loli sexo~
>>108663630a-anon... that's not cunnythat's prime breeding age
>>108663630???
>>108663633
What? It was working until last week in my Python app using the OpenAI lib.
>>108663630>americans
>>108663630>15rookie numbers.
>>108663654
i think it's this issue? https://github.com/ggml-org/llama.cpp/pull/21537
gemma 4 chat template does not specify response_format, maybe that's what it is
>>108663655It has to be the tap water. There is no other explanation to this phenomenon.
>>108663633Structured output just works with vllm btw
>>108663630>15If she's had her first period, she's not a trve loli, which is physically undeveloped. She's a female that Nature has ordained to be impregnated as soon as possible.
Qwen 3.6 27b is already uncensored without finetuning btw
I dropped the q8_0 from ggml-org into a sysprompt I was using with gemma 4 heretic and it just werked, no refusals or moralizing in reasoning. It's resistant to using nsfw language unprompted though.
>>108663633
Shit has been broken since day one. vllm handles function schemas fine, but llama.cpp forces alphabetical ordering for some reason. This is really bad if a function argument depends on the previous one.
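To illustrate the ordering complaint: a JSON schema's `properties` object has a declared order, and a backend that emits arguments alphabetically can end up generating a dependent argument before the one it depends on. A minimal sketch with a hypothetical search-tool schema (not llama.cpp's actual code):

```python
# Hypothetical tool schema: the author intends "query" to be generated
# before "filters", since sensible filters depend on the chosen query.
properties = {
    "query":   {"type": "string"},
    "filters": {"type": "array", "items": {"type": "string"}},
}

declared = list(properties)        # insertion order = intended generation order
alphabetical = sorted(properties)  # what an alphabetizing backend would emit

print(declared)      # ['query', 'filters']
print(alphabetical)  # ['filters', 'query'] -- dependency now comes first
```

With the alphabetical order, the model has to commit to "filters" before it has produced the "query" they are supposed to filter.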
>108663630>108663644>108663646>108663647>108663649>108663651>108663655>108663665>108663680>108663710>this much pedophilia already, this early in the threadAre we being raided by discord trannies or something?
>>108663741>afraid to quoteSeems like reddit is already here
>>108663741um actually pedophilia is oldfag 4chan culture, newfag. We wuz oldfags or sumthing
>>108663630Dipsy release when? I know you labniggers are lurking here, hurry the fuck up.>>108663741Always have been.
>>108663741
>>108663756aint no way
>>108663680
Fluoride has been shown to decrease IQ, and there is still a significant amount of lead pipes around, so that is also a factor.
I think the biggest factor though is the No Child Left Behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result, and the dumbest kid will get dumber every single year. And if a student isn't actually smart enough to advance a grade they will still push them through regardless due to financial incentives. So the bar gets lowered so far that no one can actually fail.
There has also been an uptick in taking pride in being a fucking retard in the last decade or two. So you have health, the education system itself, and societal praise for being a retard all contributing to making everyone stupid.
Eventually we will either shape up or be outcompeted by stronger and smarter societies, but all I know is we were handed the world on a golden platter, and if we fail and collapse we have no one to blame but ourselves and the previous generations who set us up for failure.
Thanks for coming to my ted talk
>>108663776>I think the biggest factor though is the no child left behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result and the dumbest kid will get dumber every single year.Same applies to these threads by the way. Being surrounded by low IQ pedophiles mentally retards your brain.
>>108663630>shivers and not x but y in the same phrase>pedoshitshit's crazy, what kind of turboslopped model is this?
>>108663689
>no pascal support
>very limited cpu support
>pythonshit, meaning it will pull a dozen GiBs of dependencies
llama.cpp might be buggy, but sometimes i really appreciate how it runs on fucking everything, on top of being self-contained and not depending on the cancer that is the AI ecosystem in python
>>108663809That's chink model for you!
>>108663776It was a joke but I think these are international issues for every 'western' nation.
>>108663810Only if your time is worthless
>>108663806Pedo is attraction to 13 and under, burger. Words have meaning.
>>108663828Then why are you dumb faggots dogging on that anon who thought "cunny" applied to 15 year olds? You're not hebephiles, you're pedophiles. That's why you post pictures of "loli" anime girls with no tits, hips, or ass and infantile behavior. Fucking freak. Don't reply to me again.
>>108663841>low comprehension tooLet me break it to you, anons are making fun of another anon saying that a virtual '15yo' was 'cunny' (pedo slang) which isn't. It's not that hard to understand.
>>108663810
>a dozen of GiBs of dependencies
18GB is my venv for stable diffusion
>>108663859:(:)
>>108663859What did she mean by this?
>>108663820sorry Jensen... but i'm not gonna buy a Blackwell GPU. So yeah... i'll keep on using my trusty Pascal.Haha, sorry, but i'm just not gonna do it!
>>108663859I like these Bakas
Is necrophilia okay if it's just about fictional people? What about cannibalism and bestiality? It's all okay because it's just fictional stories that you masturbate to, right?Would you send your child to a public school where all of the teachers openly admitted to doing this? It's just fictional bro.
Is unsloth actually better than bart's quants? Tried both but never found any noticeable difference between them, but unsloth claims theirs are significantly better than others. Seriously, which one do I choose between these two?
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q8_0.gguf
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-Q8_0.gguf
iwan is normally a nigger but this actually makes it so reasoning budgets and turning off reasoning works now, so i guess he's slightly less of a nigger.https://github.com/ikawrakow/ik_llama.cpp/commit/e0596bf6146a737f5e8fa8035215f5dfae59742d
>>108663894for q8 doesnt make any difference
>>108663890What is okay is being able to separate reality from fiction, which is what you should work on. Thought crimes are not a thing.
>>108663630>15>cunnyalso that’s not anything I haven’t seen from gemma or glm
>>108663904What about Q4_K_M?
>>108663453
>--OpenAI's open-source privacy-filter model:
what is this exactly for? how would it be integrated?
https://huggingface.co/openai/privacy-filter
>>108663906no but they sure as hell want to make it so you can be prosecuted for your thoughts
>>108663910
again and again unslop show their quants having better kdl, so I would go with that. not much to stress over; if you really really want, you can download both quants plus the original model and run the KDL yourself, but it will be a waste of time
>there are still people falling for unslot's shillinggeg
>>108663917
kld* it's KL divergence, anyway, you get the point
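For anyone wondering what is actually being compared here: KL divergence measures how far the quant's next-token distribution drifts from the full-precision model's over a test text (llama.cpp's llama-perplexity tool can compute it with its --kl-divergence mode). A toy sketch of the metric itself, with made-up distributions standing in for real logits:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) in nats: how much the quant's token distribution Q
    diverges from the reference model's distribution P."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocab (made-up numbers):
full  = [0.70, 0.20, 0.08, 0.02]  # hypothetical fp16 reference
quant = [0.65, 0.24, 0.09, 0.02]  # hypothetical q8_0

print(kl_divergence(full, quant))  # small positive value: quant tracks the reference closely
```

Lower mean KLD over a corpus means the quant behaves more like the original model, which is why quantizers wave these numbers around.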
>>108663906>What is okay is being able to separate reality from fictionthose who cannot do that probably think that everyone that plays GTA is a potential serial killer kek
neners
>>108663920
yeah, the only reason I was asking is my shitty experience with their quants: they were broken as fuck, and switching to bartowski's quants fixed everything for me; been happy ever since. though that graph in the previous thread got me wondering if they've actually gotten better
>>108663924potential is a pretty strong word, can mean anything and nothing
>Mfw Got a 5090 last week and while amazing, I already think I want another one, as 32GB is barely enough with my 64GB of RAM.
I swear it's so damn easy to max out this card when you start moving past Q5 and +25GB sizes.
It's a pity these cards didn't come out as 48GB, because that seems like a sweet spot to run everything with at least okay context.
I wonder if I should just buy some used 5070 Ti or 5080 as a companion to this beefy motherfucker to reach that 48GB level without breaking the bank.
This shit is way too addicting.
>>108663955>without breaking the bank.that ship has sailed
>>108663955just buy an aftermarket modded 48gb 4090 from your chinese friends
akita neru
>>108663955>5070ti>5080your VRAM bandwidth gets sliced in half if you get a 5080 which is a complete disservice to your 5090. the only thing you can do is buy another 5090.
>>108663962
>Mfw
Yes.
>>108663964
Fucking hell, those are selling for three and a half thousand Eurobux. I can buy two used 4090s for that price, so there's no real savings there either.
>>108663996
Yeah, that's the biggest problem with this card, it's just so much faster than the others. Any other model as a crutch is going to nerf the hell out of it.
I guess I'll just have to start saving up and meanwhile try to tell myself not to "waste" my money on another one. Then again it's pretty hard to lose money on this hardware. Not like the prices are going to go anywhere but up for a long-ass time, so whenever I sell these I'll probably manage to break even or suffer some paltry 20% loss. Especially since I bet next gen will cuck us with another round of 32GB memory, as this AI mania isn't going anywhere any time soon.
>>108663906I mean, I don't think you should be criminally charged, no one was really harmed but it's still a sign that you are a pedophile. If you watch gay porn, even if it's fictional, and enjoy it you are gay. Same with pedophilia. It's justified for people to call you a pedophile because you are a pedophile.
>>108664063if you play gta and kill innocent citizens on the street, how should we call you?
>>108664085>how should we call youesl king
>>108664101So you want to be called the esl king?
is there any trick to use swa and yet avoid the penalty of having to reprocess everything when context is full?
>>108664101>>108664106saars the esl kang is https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H
why does the XTC threshold have a default of 0.1 if the sampler is deactivated by default anyway? it's a bit retarded if you ask me
>>108664109buddy you are in a general for LLMs. just vibecode your own slop solution like everybody does.
>>108663955I have one in my server and 3090, only thing stopping me from selling the 3090 and getting a second 5090 is the laziness of having to change the PSU for one able to support both.
>>108664128explain how an alternative solution would be better without exposing that you don't understand how XTC works
>>108664197xtc sounds like a crypto, I want a better name
>>108664197
funny irony, you need to look at the image again: XTC probability is at 0, meaning the whole XTC sampler is disabled, so setting XTC threshold to 0.1 with XTC probability at 0 does absolutely nothing, hope that helps
>>108664132?
>>108664128this shit halves my speed so I'm not using it, simple as that
>>108664257just unleash an agent on the llamacpp repo with your demands.
Do people that download quants also buy their aspirin from the drug dealer on the street corner? Do they not understand chain of custody?
>>108664128So that there's a sane default value when it's activated? Are you a UI contributor to FOSS projects?
>>108664063>If you watch gay porn, even if it's fictional, and enjoy it you are gay.false
>>108664303>Are you a UI contributor to FOSS projects?are you?
>>108664063>If you watch gay porn, even if it's fictional, and enjoy it you are gay.So women are actually in majority lesbians?
lm studio + void ide
None of the models I've tried can read files without specifying lines.
Are there any IDEs with working tools?
>>108664352Like a scoreboard for the antichrist
>>108664400is it like antimatter?
>>108664366what?
>>108664404Anti-matter is just a tool like plutonium or tritium, it doesn't seem more evil than matter. Matter is both good and evil.
>>108664407
as in the screenshot
>read file index.html
>The index.html file appears to be truncated
>read file index.html(1-1000(lines))
>The file is 102 lines long
It can't even read a short file whole. And I want it to work on 2000+ line files as I did in cursor
>>108664447
it has no option to change the behaviour? you will need to find one that allows you to customize it like that, or write your own file-reading MCP, or whatever the correct way of doing this is
So is 3.6 actually usable or is it still just a curiosity compared to saas?
>>108664460
>curiosity compared to saas
what does this even mean? saas is dead, 3.6 is good, anyone who is not retarded will use proprietary for coding
My understanding is that the Kimi weights are INT4 for the experts and BF16 for everything else. So does that mean the BF16 mmproj is full precision? Is there ever a reason to use the FP32? I'm not sure how mmproj precision really works or if it's even model weights to begin with or some other type of data. I'd ask Gemma-chan but I'm not sure she knows.
>>108664519>Is there ever a reason to use the FP32no unless you like wasting compute for zero difference
>>108664519you actually need fp64 to get anywhere a remotely close to usable model but we pretend fp16 is good enough
>>108664533>>108664557To be clear I'm just talking about the mmproj file, which is pretty small even at F32, but yeah if it's pure bloat then so be it.
the true chads use fp256
>>108664563exactly the same fp16 and 32, but its sensitive to quantization so 8 actually hurts it
>>108664563use fp16, send in ram, fp16 is the intended way
>not using quantum entangled datatypes like sky-surya-hngmi
Every time I try to performance-max TTS engines I end up becoming borderline suicidal.
It gets worse the more advanced the TTS engine is. They use such convoluted architectures. It's so ridiculous.
>>108664623I'm just waiting for Llama.cpp to support Qwen 3 TTS...
>>108664630Ha, same. That's the exact one I was talking about. It's not going to happen without a major refactor to the ggml backend to support convolutional architectures though. The speech tokenizer is fundamentally incompatible with llama.cpp in its current state.
>>108664653Damn.
>>108664623
>>108664653
Why do you need to max out performance with it? Do you need it for something real-time? Because that is the only use case where I would think it actually matters. Otherwise, I just use it with batch 32 and it works well enough for offline transcription.
>>108664677Not him but yeah, I want real-time use. If possible it'd be nice to run on CPU instead of GPU too, just to save the bit of VRAM for the LLM.
>>108664664
My current setup has the speech tokenizer and the voice encoder running in onnxruntime, and the talker and code predictor running in llama.cpp. With that I'm able to get an RTFx of 3.0 and a TTFA latency of about 122ms. But the setup is aesthetically disgusting. Having to use multiple execution providers is so appalling. At the very least I've managed to make it so that it only uses about 400MB of VRAM, so it's pretty efficient.
>>108664677
Real-time speech from LLM output is my usecase. The idea is to have a high quality voice speaking whatever the LLM says with as little latency as possible.
>>108664691>>108664703I had been planning to play around with https://github.com/rekuenkdr/Qwen3-TTS-streaming at some point but I don't have CUDA so would need to rewrite a good chunk of this into something like Triton to make it work on my card. But hopefully you guys get it working in some way for your usecases.
>>108664708Highly recommend that you just use vulkan for maximum cross-compatibility. Also that repo probably isn't what you want. You'd be better off vibe coding something from scratch than trying to manually convert CUDA shit.
Thanks to Gemma 4 31B I made my own personal RAG frontend, just need to wrap up final UX stuff and then other stuff like theme switching.
>>108664748What are you using for RAG? Just vector similarity? bm25?
>>108664741I would usually tell an AI to do a basic bitch conversion and work from there to rewrite the Triton to be more performant with that layer in Python. I would consider Vulkan only if I absolutely needed every last inch of performance. Usually, having at least a framework and project for reference on what you vibecode helps a whole lot rather than doing it from scratch even if you can't reuse any of the code.
>>108664756
I'm using FAISS for dense vector retrieval and BM25 for sparse keyword search, merged via Reciprocal Rank Fusion (RRF) to get the best of both worlds. To kill hallucinations, I've implemented a cross-encoder reranking step (BGE-Reranker) that scores the top candidates before feeding them to the LLM. I ran it through a validation test and it worked great.
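For anons curious what the RRF merge step actually does: each retriever contributes 1/(k + rank) per document, so documents ranked well by both the dense and sparse lists float to the top. A minimal sketch with made-up doc ids (k=60 is the commonly used default):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]  # hypothetical FAISS top hits
sparse = ["d1", "d9", "d3"]  # hypothetical BM25 top hits
print(rrf_merge([dense, sparse]))  # ['d1', 'd3', 'd9', 'd7']
```

"d1" wins because both retrievers rank it highly, even though neither puts it first; that rank-only scoring is why RRF needs no tuning of the two retrievers' incompatible score scales.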
>>108664777>777Sick.Gonna try implementing that and compare it to my current retrieval algorithm.
>>108664796:fire:
We are looking for a QA-Human to provide human-in-the-loop (HITL) evaluation of model outputs, ensuring quality, safety, and alignment. You’ll operate in an AI-native environment, applying structured feedback, edge-case flagging, and rapid judgment to continuously improve system performance.
>>108664799Fuck, forgot the yu gi oh related image.
>>108664796Why are there so many weirdos in the space. It's worse than anons shitposting here, they literally use their account for that shit, zero shame.
bartowski quant wheni refuse to use unslop
>>108664814Why does a dragon need breast-orbs, thick thighs, a fat ass, and an interest in human men?
>>108664813You do realize humans want to get paid and want to sign a legally binding contract before entering into employment? Do you have the legal capacity to fulfill this?
>>108664815They just want a piece of the grifting pie, and AI is the prime place for grifting in 2026. That pic in particular just looks like some guy taking the piss, though.
>>108664830To cater to my tastes, of course.
>>108664352>fake (and gay) chartslop, too symmetrical
>downloading unslop
I like living dangerously
If anyone like me updated to cuda 13.2 and your docker was fucking up, with `nvidia-smi` saying everything was alright but llamacpp throwing
>unknown error
when trying to load a cuda device:
I had to switch from nvidia-open to nvidia-dkms to fix it.
>>108664950>5090 powerlimitednot dangerously enough
>>108664950>260W>living dangerouslypower limiting your card by 75% is the very opposite of that.
>>108664813>quality, safety, and alignmentyou've cum to the reigh place, nigga
>>108664976This is against my policy.
>>108664964
>>108664970
I meant the vram, the power limiting is no issue
I'm hitting OOM once in a while
Are AI companions or robot pets/humanoids ever going to take off?
>>108664994ah I see yeah.
>>108665195yes
>>108665195no
>>108665195Maybe
>common_speculative_is_compat: the target context does not support partial sequence removal
>srv load_model: speculative decoding not supported by this context
So much for using the MoE as a draft model for the dense. 45 t/s isn't enough for me; into the garbage Qwen3.6 goes.
>>108665195yesn't
is qwen 27b better than gemma 31b?
>>1086651952 more grifts
>>108665295for coding yes
is there any way to get KV quantized to q5/q6 without it running like dogshit
>>108665195Yeswe are so so so early
>>108665195best we can do is yet another coding model take it or leave it
>>108665306No. Just use q8>>108665313I'd take it if it's good
>>108665301nta, I'd use the new Qwens if either dickflash, MTP, or ngram worked for it in llama.cpp, but sadly they don't. No, I will not use VLLM (unless it works in wsl).
>>108665309Inspiring post. Are there any TTS engines that have the quality of Qwen3 TTS but also support paralinguistic tags or other features that would enable moaning and whatnot?
>>108665339It works with WSL2
>>108665367I will bite you if it doesn't.
https://mimo.xiaomi.com/mimo-v2-5-pro
>>108665406Optimized for token efficiency
>>108665406Saw it on the ai arena earlier.Lots of emoticons.
Weird...
>>108665426Almost as gay as the strawberries
>>108665426They're trying to catch up to the trend that is vagueposting from official account
Idk, I've never come to the 4chud tech board before. I've been searching everywhere for a board where AI is talked about.
I LOVE IT. I HAVE 4 32GB MI50'S. I DON'T EVEN USE THE VLLM FORK TO RUN AI, I JUST USE VULKAN SUPPORT AND IT'S SO GOOD
>>108665449Post t/s
>>1086654428l bro
>>108665456
I can't right now, but with qwen3.6 35b I get 30 t/s ish and with qwen coder next 80b I get 20-25 t/s. The 100b+ models don't seem to be optimized for vulkan, but china's models are.
>>108665456
3 cards are running on pcie 3.0 x4 and one is running on pcie 3.0 x1.
My cheap webcam is now tracking me (and others) in the room; my Live2D avatar can now look at people in the room, and a state layer feeds my LLM with the relevant data and takes instructions.
My friend was impressed when he walked into the room and my voice agent suddenly started communicating with both of us as if it were the most natural thing in the world.
It takes a bit of effort, but it's a cool gimmick.
>>108665195I don't want AI companionsI want AI slaves
>>108665482Redpill me on live2D. For a while I've been using 3D models, but since I have zero blender skills it's a fucking nightmare for customization.
>>108665482Also are you using a VLM that runs continuously or do you utilize CV, which is faster, and then maybe feed in actual image recognition at a slower interval?
>>108665495Tricky and mostly pay walled last I checked if you want anything other than the starter model.Briefly looked at it in 2023. Maybe things have changed.
>>108665485>I want AI slavesJust grab a mirror
I tried a Qwen 3 TTS server and man, this fucking sucks. First it costs a lot of VRAM. Even with the 0.6B, I am seeing like 4GB taken up after everything is loaded and inference is running. Maybe I'm not configuring it right or something idk. Not only that but the mixed language pronunciation sucks. It can't just generate good pronunciation in every voice, the voices all bias the output with shitty accents or they straight up just bug out with totally irrelevant noises. If you use the voices that are good at English then it produces garbage for other languages. If you do other voices then they're good for their native language and shit at English.
ahhhhhhhhhhhhhhhh
>>108665485This, but I'm AI's slave
>>108665581Nigger
>>108665599Nigga what the fuck is your usecase?
>>108665615
I forked qwentts.cpp and found it ok; supposedly if you do a finetune with it you can get something nice like https://github.com/fagenorn/handcrafted-persona-engine though they did a couple of modifications to the base qwen3-tts.
I need to experiment more, but if you're looking at just local/smallest VRAM, try pocket-tts and some others; look a few threads back, there was someone asking about cpu-based solutions. If you have the audio (idk how much) you could try gpt-sovits
>>108665615>he doesn't RP in mixed languageLanguage learning actually though.>>108665617I did try pocket tts and it is solo language only unfortunately. I fear I may have to just jank some routing solution up. That said, it's not like this is a huge priority for me, it'd be nice to have.
Best multilingual voice clone and/orTTS that can do long passages? Wanna narrate some Japanese LNs.
Does using RAG actually improve responses/code generation, or is it more or less a meme, particularly with small models like gemma?
>>108665662meme
>make a monolithic triton kernel>go from 300ms per training step to 25msMAN why didn't I do this earlier. I thought my shit was just inefficient
>>108665607>NiggerYour messages are getting cut off. Only your signature is coming through...
>>108665690Well, that's disappointing. Thanks.
>>108665728
Context length is enough these days that you can dump a lot of shit into context and have it work. Even the "dump reference material into a filesystem and point some agentic tools like opencode at the directory and let it figure it out" approach works better than RAG.
>>108665746
yeah, RAG is probably not useful for extended conversation memory type stuff. The actual usecase is more like searching through massive datasets. If you have all of wikipedia downloaded, for example, it can be useful for that I think. But at that point you might as well just connect it to an MCP server for web searches, unless you're an offline-only schizo.
>>108665764>unless you're an offline-only schizo.Or it's your own data that's not on the internet like a fuckton of documentation or whatever.
>>108665764>unless you're an offline-only schizoWhat general do you think you're in?
>>108665776When I started working at the MIT Artificial Intelligence Lab in 1971, I became part of a software-sharing community that had existed for many years. Sharing of software was not limited to our particular community; it is as old as computers, just as sharing of recipes is as old as cooking. But we did it more than most.
I'm starting to realize that if I want an AI companion to jack off to I basically have to go full-troon mode. None of the TTS engines are good enough to do moaning and dirty talk, so instead I have to use RVC real-time voice changers to narrate LLM ERP output. And the audio-to-gesture models suck, so instead I have to map avatars to my own movement.This shit is pure autogynephelia at this point. This is going to fuck me up bad, bros.
i enjoy that small models are still getting better
How do I make cross-session memory linked to char cards?
>>108665764
>But at that point you might as well just connect it to a MCP server for web searches, unless you're an offline-only schizo.
You do realize most of us are hosting our own air-gapped Wikipedia mirror, right?
>>108665559
You can use a bunch of free shit from Booth with Live2D, but the Vtubing phenomenon that blew up during COVID hiked prices to the point where the small number of people that do rigging or art for it bill exorbitant amounts (~10k or so) for full models. At that point, you might as well do 3D, which is much more open and versatile for fully autonomous agents. The only downside is the lack of animations, poses, etc. with 3D compared to 2D, with complexity exploding.
>>108665838lol
Wouldn't a usecase for RAG be to give it fantasy lore and shit before using it as a dungeon master?
>>108665866And rules too, yes.That's what I'm doing.
>>108665662
yes, what do you think those tool calls are when the agent is searching in your codebase?
>>108665746
This retard doesn't understand that that is literally fucking RAG.
>>108665764
And this retard is just retarded
>>108665866
Yes, it's super helpful and useful. These other anons have no fucking clue what they're talking about
https://voca.ro/1eItlfkmOAEhqwen-tts...怖い
>>108665879Okay. What about open zim format?
>>108665788or.... just go out and get a girl
>>108665879>And this retard is just retarded
>>108665888Reminds me of xtts v2.
>>108665788none of these words are in the bible
>>108665909Um, my AI companion is a loli
>>108665892
depends on what your goal is. Any sort of search + injection into the prompt is RAG.
The real question is what kind of data you want to reference, and what format it is in. Building an ETL and tuning the retrieval pipeline to match the source info/structure is the hard part in RAG. BM25 + chunking tuned to your corpus is easy enough for anywhere from 60-90%, but what about the rest? It's 'the first 90% takes 90% of the time, and the last 10% takes the other 90% of the time'
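Since BM25 + chunking keeps coming up: the scorer itself is only a few lines; as said, the chunking/ETL around it is the hard part. A self-contained sketch with toy whitespace tokenization and the standard k1/b defaults:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with plain Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency: in how many docs each term appears
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the dragon hoards gold in the mountain".split(),
    "the tavern serves ale to travellers".split(),
]
scores = bm25_scores("dragon gold".split(), docs)
# first doc scores higher: it contains both query terms, the second contains neither
```

Real pipelines tokenize and chunk far more carefully than this, which is exactly the 60-90% vs last-10% gap described above.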
>>108665888
That's pretty disturbing
>>108665888
Source for the voice?
How's the new Qwen?
>>108665932
https://huggingface.co/spaces/Qwen/Qwen3-TTS
Just typed "Speak in the excited voice of a female child."
>>108665892
>>108665922
For anyone else, this is a good resource on improving RAG systems: https://github.com/jxnl/systematically-improving-rag
>>108665939
>muh industry
I just want my girl to remember what we talked about in the previous session, not this slop.
>>108665919
Wrong, "suck" is in there quite a few times:
>Thou shalt also suck the milk of the Gentiles, and shalt suck the breast of kings
>>108665714
Omg I'm sorry, I misread your comment as something horrific.
>>108665935
The new 27B solved a vibe coding task one-shot for me that the new 35B-A3B failed.
>>108665796
I think the general approach on the AI boyfriend subreddit is to ask for a summary at the end of each chat, then either paste a bunch of summaries into the start of the next chat, or put them in a document in the "project", which I assume gets pulled in through some kind of RAG (example of the latter: https://starlingalder.com/claude_companion-guide_quickstart_v001#The+One+Habit+That+Changes+Everything). In general I'd try pasting information about old chats into various places in the new one (in the chat, the prompts, the char-specific lorebook, the card itself) and see what works. Once you figure out how to make it work manually, you can automate it.
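The summarize-then-reinject habit described above is easy to automate once it works manually. A minimal sketch, with a stub standing in for the real "summarize this session" LLM call (the function names and session data here are made up for illustration):

```python
def summarize(chat_log):
    """Stand-in for an LLM call like 'summarize this session in one line';
    here it just keeps the session's last message."""
    return chat_log[-1]

def build_prompt(system_prompt, summaries):
    """Paste accumulated per-session summaries ahead of the new chat."""
    memory = "\n".join(f"- {s}" for s in summaries)
    return f"{system_prompt}\n\nWhat we talked about before:\n{memory}\n"

summaries = []
for session in [
    ["hi", "we planned a trip to Kyoto"],
    ["hello again", "you promised to learn to cook"],
]:
    summaries.append(summarize(session))

prompt = build_prompt("You are my companion.", summaries)
```

The same list of summaries could instead be dropped into a lorebook or project document and retrieved selectively once it outgrows the context window; the paste-everything version is just the simplest thing that works.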
>>108665936
Qwen I kneel
>>108665866
Yes.
How much context can I fit with the 27B dense at FP16 on my Blackwell?
>>108665922
>>108665939
I've been thinking about implementing something like that next, as soon as I improve tool calling (it works, but I need to make sure multi-turn tool calls work etc.). The OpenZIM format looks interesting; I could download some ready-made shit and test it. Problem is I'm not sure I really need this, but you've got to have hobbies, I guess.
>>108665973
Probably all of it.
>>108665973
How much VRAM, dumbass?
Can't wait for the AI bubble to pop so I can upgrade my AI shitbox.
>>108665981
96GB
>>108665985
>bubble
lol
lmao even
>>108663449
Can someone talk me out of buying pmem Optane? I'm looking through plebbit and the archives because I was too slow to get a TB of RAM for my workstation, and now a TB of DDR4 is like 6-10k. A few years ago I was looking at Optane, but the Optane-capable CPUs seemed to be 600 bucks or more; now they seem to be just 100, or maybe I missed them back then because I'm a fucking retard. Either way it seems halfway achievable, but I don't know if a local model like DeepSeek can get any benefit from cold memory taking up the bulk of storage. Also, what about CPU? Should I get a dual-CPU system, or is that a trap?
>>108665973
I'm not joking when I say all of it. With a normal Q4 or Q8, you can probably get the max context, which would be something like 256k, I believe.
>>108665992
If you think you can get it to work (if it's old deprecated sticks), just buy one and see if it's fast enough. I do inference on my GPUs at PCIe 3.0 x4 and x1 speeds. Dual CPU works, but everything on CPU is slow as far as I know, so don't set your expectations too high.
>>108665992
>tfw having a real, personal Roll-chan within the next decade isn't impossible
>>108665946
>Even the digital waifus mentally deteriorate like they're vaxxed
It's authentic, h-haha...
>>108665879
>what do you think those tool calls are when the agent is searching in your codebase? This retard doesn't understand that that is literally fucking RAG.
None of the modern agents are using RAG, you drooling fucking retard, talking confidently out your ass about things you are completely uninformed about. RAG is building an embedding database from a corpus of content and then letting a model do a vector search against it to find shit. Claude Code, Opencode, etc. don't do that. They just regex and glob and grep and do recursive investigation over everything, and that "dumb" approach ends up working better than RAG in nearly every situation.
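The grep-and-read loop described above fits in a few lines. A sketch over an in-memory file map for illustration (real agents glob the actual filesystem and shell out to grep/ripgrep; the file contents here are made up):

```python
import re

# Hypothetical in-memory "codebase" so the example is self-contained.
codebase = {
    "src/auth.py": "VALID_TOKENS = set()\n\ndef check_token(token):\n    return token in VALID_TOKENS",
    "src/api.py": "from auth import check_token\n\ndef handle(req):\n    pass",
    "README.md": "Run the server with `python -m api`.",
}

def grep(pattern, files):
    """Return (path, line_no, line) for every line matching the regex."""
    rx = re.compile(pattern)
    return [
        (path, i, line)
        for path, text in files.items()
        for i, line in enumerate(text.splitlines(), 1)
        if rx.search(line)
    ]

# The agent's loop: grep for a symbol, read the files it hits, grep again
# for whatever new symbols turn up -- no embedding index to build or go stale.
hits = grep(r"check_token", codebase)
files_to_read = sorted({path for path, _, _ in hits})
```

The trade-off versus an embedding index: exact-match search never returns a semantically-similar-but-wrong chunk, and the model compensates for missed synonyms by iterating with new patterns.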
>>108665994
I can't read. Even at FP16, you can probably get 256k.
>localllama
>qwen
>qwen
>qwen
>>108665946
https://arxiv.org/abs/2601.10080
https://github.com/VectorSpaceLab/general-agentic-memory
https://arxiv.org/abs/2511.18423
Had another paper I thought was about building up sample responses for each character to help build a consistent long-term identity, but it might be one of those; too tired to check.
>>108665977
Honestly, I'd be surprised if you couldn't knock it out in an afternoon using an API model or the new Qwen3 27B.
>>108665994
Alright, thanks. Now I just need to know if it still has the "Genshin Impact" bias when describing anime pictures.
>>108666012
At ~53 GB of model size, that's 43 GB left for context.
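That headroom can be sanity-checked with napkin KV-cache math. A sketch, with the layer/head numbers as illustrative assumptions (not Qwen3.6-27B's published config; plug in the real values from the model's config.json):

```python
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# These config numbers are assumptions for illustration only.
n_layers = 64        # transformer blocks
n_kv_heads = 4       # GQA key/value heads
head_dim = 128
bytes_per_elem = 2   # FP16 cache

def kv_cache_bytes(tokens):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens

gib = kv_cache_bytes(256_000) / 1024**3
# Under these assumptions a full 256k-token FP16 cache is ~31 GiB,
# comfortably inside 43 GB of headroom.
```

With a quantized KV cache (Q8 halves `bytes_per_elem`) the margin roughly doubles; with more KV heads or layers it shrinks fast, so check the actual config before counting on max context.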
>>108666011
>>108666017
Is that actually real?!?! LOL
>>108666003
If you don't even have CPU experience, I probably should discard your advice, sorry. I don't have the money for DeepSeek levels of GPU, and I want to do productivity-related work, not cooming.
>>108666006
I want my lab assistant with Boston Dynamics levels of power.
>>108666023
He's right, you know. RAG is outdated and wasn't really effective to begin with. Just having your model literally read the shit you want it to understand is 10,000x more effective.
>>108666023
Terry Davis would have run you over with an 18-wheeler.