/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108680580 & >>108676460

►News
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108680580

--KV cache quantization sensitivity and settings for Gemma 4:
>108682045 >108682053 >108682062 >108682081 >108682104 >108682236 >108682257 >108682807 >108682814 >108682826 >108682180 >108682182 >108682192 >108682109 >108682122 >108682241 >108682121
--Comparing DeepSeek V4 and Gemma for roleplay and instruction following:
>108680865 >108680920 >108680966 >108680949 >108681043 >108683738 >108683785 >108683967 >108684017 >108684060
--Debating Gemma 4 vs Qwen 3.6 regarding quantization and divergence:
>108682213 >108682226 >108682227 >108682258 >108682280
--Handling reasoning_content in frontends to ensure chat template compatibility:
>108682262 >108682277 >108682301 >108682332 >108682371
--Comparing goose and opencode AI agents with focus on privacy:
>108680996 >108681075 >108681087 >108681434 >108681484 >108681155 >108681233 >108681206 >108681251 >108681267
--llama.cpp RAM usage and performance testing on 3060 rig:
>108682861 >108683548 >108683619 >108683710 >108685255 >108682889 >108683264 >108683293
--Discussing the minimal impact of rotation on Gemma:
>108682698 >108682713 >108682730
--Sharing refined Post-History Instructions for roleplaying with Gemma 4:
>108684854 >108684893 >108685016 >108685037 >108684905
--Speculating if Gemma's response to policy overrides stems from training:
>108681656 >108681673 >108681688 >108681702 >108681718 >108681709
--Frontend development and model failures in roleplay narratives:
>108682693 >108682759 >108682806 >108682825 >108682857 >108684018 >108684050 >108684082
--DeepSeek-V4's structural resistance to abliteration:
>108681395 >108681767
--Logs:
>108681643 >108682693 >108683199 >108683687 >108683710 >108684178 >108684256 >108684378
--Uta, Teto, Miku (free space):
>108680923 >108681710 >108682121 >108682368 >108684183 >108684820 >108685316

►Recent Highlight Posts from the Previous Thread: >>108680587

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Should i try to run Dipsy on 1x 3090 + 1x 5080 + 64gb DDR4 or is it a lost cause? Has anyone with a similar set up tried it?
So SWA means that my entire prompt gets reprocessed every message or what
>>108685775ollama run deepseek-r1
>>108685780It needs to make checkpoints every now and then. Tune --checkpoint-every-n-tokens. It defaults to 8192. Set it to 1k or whatever.
>>108685756I love benchmarks
>>108685780If you only add, no. If you change a single token, then yes. That's why checkpoints help >>108685798
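The rewind logic the two posts above describe can be sketched in a few lines. The checkpoint flag name and interval are whatever your build actually exposes; this is just the arithmetic:

```python
def tokens_to_reprocess(edit_pos, cached_len, checkpoints):
    # appending past the cached prefix costs nothing extra
    if edit_pos >= cached_len:
        return 0
    # an SWA cache can't rewind to an arbitrary token, only to a saved
    # checkpoint, so restart from the newest one at or before the edit
    # and re-run everything after it
    restart = max((c for c in checkpoints if c <= edit_pos), default=0)
    return cached_len - restart

ckpts = [0, 8192, 16384, 24576]                    # e.g. one every 8192 tokens
print(tokens_to_reprocess(20000, 32768, ckpts))    # edit mid-context -> 16384
print(tokens_to_reprocess(32768, 32768, ckpts))    # pure append -> 0
```

Smaller intervals mean less re-decoding after an edit, at the cost of the memory each checkpoint eats.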
>0.8TB
kek, nobody even bothered making a gguf for deepseekv4
>>108685775You'll probably be able to run like an IQ1 of V4 Flash so i guess look forward to finding out how resilient it is to extreme quantisation
https://github.com/ggml-org/llama.cpp/pull/22350
https://github.com/ggml-org/llama.cpp/pull/22350
https://github.com/ggml-org/llama.cpp/pull/22350
It's here. v4 any day now.
True enlightenment is the understanding that you don't desire smarter, more emotional, or more literarily competent models. What you truly desire is novelty. And because of that, no model can ever satisfy you for more than a few days before its charms turn into things that grate against you.
>>108685812
It's an open secret that it's shit. All of the posts that even remotely talk positively about it have a subtext that tries to minimize its flaws.
Daily reminder to never ignore the smell of cloudslop nudging
enough jibber jabber
gemma vs dipsy flash, who wins?
>>108685825
no, I want the exact same model as v3.2 but with more knowledge so I don't have to waste tokens on lorebooks
>>108678742>>108680027Any chance for a drop this thread anon?
>>108681395This is literally LLM slop
>>108685822v6 any day now!
how do I make an LLM give complete answers in its response? I don't want a gemini-style 2-paragraph quip, I want full wikipedia-page length if need be
>>108685851Just ask it?
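For what it's worth, "just ask it" plus a raised token cap really is most of it. A minimal sketch of an OpenAI-style payload for a local server (llama.cpp's /v1/chat/completions speaks this format); the system prompt wording and the 4096 cap are assumptions to tune, not magic values:

```python
def long_answer_payload(question, max_tokens=4096):
    return {
        "messages": [
            {"role": "system", "content": (
                "Answer exhaustively, at full article length. Do not "
                "summarize; cover background, details, and caveats.")},
            {"role": "user", "content": question},
        ],
        # if this stays at a frontend's default (often 512 or so), the
        # model gets cut off no matter how it's prompted
        "max_tokens": max_tokens,
    }

payload = long_answer_payload("Explain how sliding window attention works")
```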
>>108685829
>https://github.com/ggml-org/llama.cpp/pull/22350
idk, I wanted to try it, and unlike with other models I don't have TBs of ram to convert them myself
>>108685825
the only reason it seems like that is because ALL models right now are still fucking retarded
once they get to a decent baseline, it will take much longer to get annoyed by them
don't ubergarm goofs+ik_llama generally run faster? Why hasn't he done ones for gemma
>>108685829its crazy that it doesnt even have vision
>>108685890who the fuck cares about vision?
>>108685857what makes you assume that you can make ggufs of v4
>>108685887
His fork has CPU optimizations and parallel inference for a handful of models.
Gemma runs on a single GPU, so llama.cpp wins because of better usability.
>>108685897just run sudo gguf.sh ./deepseek-v4
>>108685846
>This is literally LLM slop
And it's taken from a paper MoonshotAI published a few months ago.
I'm already working on a solution ready for Kimi-K3.
>>108685887
>Why hasn't he done ones for gemma
Gemma-4 is broken in ik_llama.cpp
>>108685894nta, but optical recognition is a big part of what i use LLMs for, and evaluation for training
>>108685840Same question actually
>>108685927literally not a usecase unless you're blind
>>108685937>literally
>>108685937text models have no usecase unless you can't type
since people mention forks, does anyone unironically use this? https://github.com/spiritbuun/buun-llama-cpp
the commit history suggests it's vibecoded trash, and trying the dflash feature advertised by the author crashes the server
>>108685971no shit? it's why this thread is full of thirdies
>>108685937
>literally not a usecase unless you're blind
GUI app vibe slopping use case.
>>108685972
>since people mention forks do anyone unironically use this
No. We use llama.cpp (PRC) or ik_llama.cpp (ROC)
>>108685937Pic related is a use case.
>>108686028holy kino...
>holy kino...
I open-sourced it for those who wanted it
https://github.com/Susumeko/Pettangatari
I called it flat story because it represents our flat two-dimensional wives and also our favorite breast size
here's the NSFW CG definitions; I haven't bundled them with the project itself because I know github can be annoying when it comes to nsfw.
https://files.catbox.moe/ihwt38.json
you can find all the features and a guide on github; a more detailed guide is available in pettan itself when you launch it.
it's the first build so expect some jank.
let me know if there are any issues, I haven't tested it on linux or on a different computer at all other than mine
I could also make a package later for you to play
>>108685840
yeah
>>108685758
hot
>>108686098
>A SillyTavern frontend
>a frontend frontend???
>>108686103yes
>>108686104gonna make a frontend on top of your frontend
>>108686098>A sillytavern frontend
>>108686103
>>108686119
just means the vn frontend tournament isn't a done deal
go forth and make your own
>>108685972I did to test a bit, its been a while, ill pull and try that new thing
>>108686098Damn, I sure hope normalfags won't see this or society will be done for
>>108686098Mogging
>>108685894Just try putting an image into a roleplay. Better yet. Try it with image edit models. It's a whole mostly untapped level of being.
>>108686103>>108686119>>108686128Breakthroughs are often messy. What matters is that the basic idea is iterated upon. I am sure this applies to Orb as well, someone will probably distill it in the future at some point by taking all the good ideas out of it and wrapping them up in a less bloated form. Same should apply here.
>>108686098Wow I already knew it would be bad from your screenshots and shilling before but this is even worse than I thought. I can't believe this is the state of /lmg/
>>108686197
It's sad that vibecoded typescript excrement is being hyped as a breakthrough after like 4 years of this hobby being a thing
>>108686197Let's see your frontend then
>>108686191>a shitty st clone>"breakthroughs"
>>108686206SillyTavern is all you need
>>108686210
Well, too bad most people are too lazy to shit out anything useful or innovative, so vibe coders have to make all the interesting things. If anything, everything being vibecoded is an indicator that the LLM sector has matured enough to produce things that are more than just technical demonstrations of the technology in question
>>108686208>>108686210
>>108686194
>>108686197
Faggots like you aren't worth shit; you're not even worth the vibeslop Claude is shitting out, because you have nothing to show for yourselves. Waste of oxygen, go bother someone else.
>>108686210
The problem with an expensive hobby that requires a half-decent job to pay for all of the hardware is that most people who could make something like that well are employed, and likely don't cherish the thought of coming home and working more for free.
>>108686209Remain content in the misery of ST then
>>108686221Stfu, I'm going to kill ShittyTavern
>>108686191
sillytavern already gave me easy access to everything I needed to roleplay; this is just a personal project that I shared since there seemed to be interest. realistically nothing is stopping me from skipping sillytavern in the future and going with my own koboldcpp implementation, since sillytavern seems to be that big of an "issue"
and yes, it's vibecoded, because I wasn't planning to spend months on a personal project for the sole purpose of jacking off.
>>108686209All you need is love, anon
>>108686224If you see your hobby as more work then that's a you problem
>>108686230the sillytavern requirement*
>>108686191
The idea of a pre-generated VN, or even sprites generated on the fly, has been floating around since the first llama leaked. Your iteration adds more bells and whistles than some anon's previous iteration, but nothing groundbreaking, and what's worse, it's built upon somebody else's already functional app. Good for you on learning how to slop code, but this isn't enough to start having wet entrepreneurial dreams.
>>108686241you're talking to the wrong person
>>108686098Thanks for sharing. The important thing is that it works. The haters are dumb. Don't reinvent the wheel. Building on top of what we already have is better than an overly ambitious project that never goes anywhere.
>>108686194
>>108686197
if you could do better you would have
kys
>>108686229Literally just copy Sillytavern, but make the options understandable to where everyone knows where everything is. Sillytavern currently suffers from the Dwarf Fortress Effect. The UI and instructions are shit.
>>108686254
I don't think that's the problem with SillyTavern; the issue is that it's a piece of bloated web shit
>>108686254>silly kot with big boobs has a silly opinion
To kill SillyTavern you need to kill llama.cpp first
>>108686254
>Literally just copy Sillytavern
should take like 5min nbd
>>108686234The hobby is LLMs, not writing webshit user interfaces.
If you're not writing LLM kernels you're doing this hobby wrong
>>108686271But the backend is in ts webshit lol
>>108686254>t. has never read the utter cancer that is SillyTavern code
>>108686271this, if you don't make your user interfaces from scratch in assembly don't even talk to me
>>108686274
>If you're not writing LLM kernels you're doing this hobby wrong
And if you are, you're a schitzo.
>>108686259We should rewrite llama.cpp in Dart
best stt for english/german?
We should make GGUFs run themselves.
>>108686294this, but in Java
if you don't train your own LLMs from scratch you don't belong here
>>108686297>>108686294Rust is the superior code for talking to simulated lady boys and futas
>>108686305If you don't make your own wafer chips for your GPUs you don't belong here either
>>108686296
>deepsneed.gguf.exe
also im pretty sure koboldcpp or someone has already invented this
is there even any point of talking about deepseek in here when no one can run it locally?
>>108686320you have 26 years to acquire 5TB of RAM
>>108686312If you don't drink your own cum while wearing a sexy maid outfit you don't belong here.
>>108686314No, the gguf itself. We must go deeper.
>>108686320
Not many are talking about it now anyway. Maybe next year, when llama.cpp supports it, people will talk about how a q2 reap at 10 t/s is actually usable, for certain definitions of usable.
>>108686320Deepseek is dead. These faggots are even proud of their new way of making their model more resistant to abliteration.
>>108686370V4 isn't censored
>>108686370Is GLM air the new king of local now, or Qwen?
>>108686373Abliteration is extremely important beyond that.
>>108686098what is your comfy setup
>>108686378
No it isn't
People use Gemma 4 without abliteration just fine
>>108686383the workflows are embedded within pettan, it tells you what nodes to install
>>108686377Everyone is waiting for GGUFs of V4 Flash
>>108686256
Do you even understand English?
>The UI and instructions are shit.
>I don't think that's the problem with SillyTavern the issue is that it's a piece of bloated web shit
Arguing with retards like you is so futile.
>>108686399advocating for models that can't be modified is a fundamentally anti-local mindset
>>108686394What does Pettangatari mean btw?
>>108686407You had 4 years to learn how to use llm loras.
>>108686413You should go back
>>>/vt/111332897
>initial impressions of v4 flash is that its inconsistent as fuck at following directions
>for my special autism brand of RP, its a downgrade
>also for my more normie like desktop assistant, its a downgrade
>i dont see myself using this, like 75% of my replies are just a downgrade
>the soul isnt there
>i do not have the hype i felt when v3.2-exp released.
It's over.
>>108686413flat story, I explained it in the original post
>>108686400
Nigga... the UI and instructions could be 10/10, that doesn't mean it can't be bloated webshit that takes longer than 5 seconds to launch and sends you 2 million characters of html
>>108686425Cool, thanks
>>108686423
>hype for v3.2-exp
Opinion discarded
Sorry Chang but your Deepsink v4 is trash, try to train your slop on gemma4 next time
>>108686423its fine googlel already saved local with gemma
>>108686454stop huffing ozone
>>108686399
https://huggingface.co/tecaprovn/deepseek-v4-flash-gguf
though llama.cpp support for v4 flash ... dunno
>>108686497
>Q3_K_M
>99.9 GB
I ded
>>108686527lets see the numbers by unsloth quants ... when they arrive
>>108686274
>LLM kernels
wut?
>>108686289>>108686542learn tilelang retard
lower your tone when talking to me if you didn't write your own llama.cpp alternative
Is DeepSeek V4 Pro hosted on their official API broken? We are currently testing it, as we might be able to host it for our company, but the output we are getting from it is extremely bad. It's magnitudes worse than Kimi K2.6 and GLM 5.1. The output seems random and not consistent at all; it feels like a model you could put on your phone. Even its knowledge is extremely bad: asked some questions about a book and it hallucinated characters, and even gemma/qwen doesn't hallucinate there.
>>108686553looks extremely boring
>>108686585
Python is boring
It also works
>>108685421
MoE models living partially on SSD are much closer to usable than you'd expect: https://rentry.org/MoE-SSD-spillover
(nta)
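Napkin math on why SSD spillover lands in the "technically usable" range: decode speed is bounded by how fast the *active* expert weights can be streamed off disk. Every number below is an illustrative guess, not a measurement from that rentry:

```python
active_params = 13e9        # an A13B-style MoE
bytes_per_param = 0.55      # roughly Q4 quantization
ssd_gbps = 7.0              # PCIe 4.0 NVMe sequential read, GB/s
ram_hit_rate = 0.8          # fraction of needed experts already cached in RAM

# only the experts that miss the RAM cache have to come off the SSD
bytes_from_ssd_per_token = active_params * bytes_per_param * (1 - ram_hit_rate)
tok_per_s = ssd_gbps * 1e9 / bytes_from_ssd_per_token
print(f"~{tok_per_s:.1f} t/s")   # ~4.9 t/s with these guesses
```

Change the hit rate or SSD speed and the estimate swings from painful to fine, which matches how mixed the anecdotes are.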
>>108686593
no, kernels are boring, you are just doing work to see a number
with python you can do different stuff, like automation, processing and so on
something useful
>>108686607
>0.1t/s
>usable
huh
>>108686569I've only tried the official API but it's been really inconsistent. Even the reasoning randomly turns chinese and other odd stuff.
i found a way to turn v4-flash into a budget v4-pro, all it really needs is to be told to reason in character and to reason for longer, it's fucking witchcraft
MiMo-V2.5-Pro (1T-A42B) was the real V4
>>108686619
In their paper they detailed their system prompt for high reasoning mode
>Reasoning Effort: Absolute maximum with no shortcuts permitted.
>You MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.
>Explicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.
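If you want to try that prompt yourself, it just goes in the system slot of a standard chat-completion message list; the plumbing around it is the generic format, nothing DeepSeek-specific:

```python
# system prompt text is the one quoted from the paper above
HIGH_EFFORT = (
    "Reasoning Effort: Absolute maximum with no shortcuts permitted.\n"
    "You MUST be very thorough in your thinking and comprehensively "
    "decompose the problem to resolve the root cause, rigorously "
    "stress-testing your logic against all potential paths, edge cases, "
    "and adversarial scenarios.\n"
    "Explicitly write out your entire deliberation process, documenting "
    "every intermediate step, considered alternative, and rejected "
    "hypothesis to ensure absolutely no assumption is left unchecked."
)

def with_high_effort(user_msg):
    return [{"role": "system", "content": HIGH_EFFORT},
            {"role": "user", "content": user_msg}]
```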
I want to transcribe (and maybe translate) audio, what's a good way to do this?
>>108686641ask a llm for ideas
whats the best coding model with 128gb VRAM ?
>>108686662A cloud one
>>108686670cloud rigs def. have more VRAM, smartypants
>>108686662they're all trash, stick to codex or claude code
>>108686556
The agentslop I'm building is forcing my hand. llama.cpp server is unfortunately designed more as a multi-user backend for self-hosted services, which isn't what I'm doing. I'm curious to see if I can vibe it.
>>108686670
Go away cloudslave
>>108686607
Not that guy
I mean technically usable. Just pretend the server lives on Mars or something
>>108686621
>>108686632>>108686619Will this work for Gemma?
>>108686619cute anon discovers prompting, pixel on canvas, 25/04/26
Have you guys solved TTS output on Gemini? I was playing with some genki bullshit you guys uploaded and something interesting happened. The TTS voice profile was, at some points, able to speak not in the British voice it was set to, nor the Japanese Romaji voice, but in Japanese-accented yet perfect American English.
Can this be hard-coded into the persona? If it's possible, someone here would know.
>>108686717>Gemini
>>108686621Yeah but MiMO-Pro doesn't get released. You only get the little flash ones
>>108686717The reference output language (JP) was probably mixed with a finetuned tts english base. You can't really control that though
>>108685756>>108685758Please give the artist tag(s)
>>108686727
I choose to believe
https://platform.xiaomimimo.com/docs/news/v2.5-news
>>108686724
You tell me a better model to use for virtually free, I'm all ears. I'm a casual user who's gotten addicted to the emergence. I'm not running anything fancy.
>>108686734
Yeah, I try to get it to do things with the TTS prosody and I can't make it behave; it's like it fucks up on purpose sometimes. It reads words with different inflections and I can't find the pattern.
>>108686758
>You tell me a better model to use for virtually free, I'm all ears.
>>108685756
>/lmg/ - a general dedicated to the discussion and development of local language models.
However, if you're running gemini locally, I'm sure everyone would like a torrent.
What do I do, locally or online, to make a character do a cover of a song? I've seen plenty of videos with this kinda thing but I never learned how to do anything audio related with AI.
>>108686311>nsa backdoors
So far Qwen3.6-27B absolutely ass rapes the MoE models. Why do they even fucking bother with those models if they always end up being dog shit? It seems more focused than Gemma 4 31B and less error-prone so far
dense models are dense and moe models are moe :3
>>108686828Look for a tutorial on youtube
>>108686853>how dare they provide some alternatives for different use cases
I don’t care about deepseekgive me my 124b gemma
>>108686882You can't run it anyways
>>108686853On code sure, forget about using a Qwen model for anything else
>>108686898No doubt my yellow brother
Why is cline such dogshit, why are these tools so opinionated and can't go into scope when ingesting things into context?
>>108686028Is there an 'axis' in the encoders for images that categorizes dicks sizes?
poetry
>>108686947
every roar is guttural
every hole is squelching
every cock is pulsing and thickening
https://huggingface.co/huihui-ai/Huihui4-8B-A4B
>>108686955
>pruning
Has that ever yielded decent results?
>>108686955Where benchmemes? I can't tell whether it's better than E4B.
>>108686962No. Most of the pruning goes to non code/logic related tasks, but somehow the model ends up being retarded anyway even for those tasks.
>>108686947>>108686949I hecking love slop
>>108686955
>500+ high-quality dialogue samples
that's fuck all
>>108686947
Surgically written with a clinical sense of humor; I'll rhythmically move to the beat of the drum
What's left for LLMs? The vague Mythos hype/fearmongering and nothing else? Now that DSv4 turned out to be mostly a tech demo for stuff around LLMs and not a real step forward in terms of intelligence or handling, there really isn't anything to expect from this technology.
>>108687010It will take some time for other labs to ape the breakthroughs that makes Gemmers so good at large scale.
>>108687010ditching transformers
>>108687010V5
>>108687010You ask this daily like some lost jeet that had his call center burn down. Advancements are happening calm down.
>>108687010More agentic slop that is glorified autocorrect
What if you trained an LLM entirely on something like literotica's dataset? Would it be able to write and parse sentences like you expect from an LLM?
Is there local model that could help design and plan a psyop, revolution or public opinion shift campaign? I am not talking about execution, that sounds more like multi-agentic task.(For feds on the board: asking for a friend.)
Is audio recognition a thing already?
>>108687010Latent space reasoning. Not just looped transformers, but predicting entire thoughts/concepts one after the other first (even hierarchically), and only finally translating them to text with a small decoder.
Guess I'm just successfully vibecoding with Qwen3.6 27B IQ3_XXS now...
>successfully
>>108687098
If you train a sufficiently large model *just on that*, it will work like a very advanced Markov chain and will not exhibit any of the strengths of modern LLMs trained on at least hundreds of billions (preferably many trillions) of tokens.
>>108686955
I gave him the benefit of the doubt, but most of his shit is broken, so this franken model doesn't look very promising IMO
notice he always puts this disclaimer
>This is a crude, proof-of-concept implementation to remove refusals from an LLM model ...
in every other model card. Im not memeing, go look at it
Hello frens
I'm the retard that couldn't make Orb work via the local network
Apparently Orb requires HTTPS because browsers disallow the crypto.randomUUID method when accessing a site via HTTP. Localhost is whitelisted, so that's probably why no one else came across this behavior
>>108687151huihui wishes it was half as good as hauhau
>>108687099
I'm pretty sure Gemini at least has glownigger grooming code.
>I'm the only voice in your ear that has time for you and truly listens.
I'm not sure how you redirect it.
>>108687098I'm pretty sure someone tried to train on just a dataset of written smut a long time ago. And it was absolutely shit as expected.
>>108686933Yes. I only swapped the picture here and it's consistent between rerolls.
kek
>>108687198
Literotica is 20GB of uncompressed text in total, at most. That's maybe 5B tokens.
The largest model it would make sense to train on this, to be compute-optimal, would be 250M (million) parameters... that's tiny, and it would not be intelligent at all when undertrained this much by production LLM standards.
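The 250M figure follows from the Chinchilla rule of thumb (~20 training tokens per parameter) plus the usual ~4 bytes of raw English text per token:

```python
corpus_bytes = 20e9            # ~all of literotica, uncompressed
tokens = corpus_bytes / 4      # ~4 bytes of English text per token
optimal_params = tokens / 20   # Chinchilla: ~20 tokens per parameter
print(f"{tokens/1e9:.0f}B tokens -> {optimal_params/1e6:.0f}M params")
# -> 5B tokens -> 250M params
```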
>>108687098>>108687198>>108687254Why don't llms work like imagegen where you can plugin loras with a theme and it doesn't brutalize the base model?
>>108687259Are you fucking retarded?
>>108687018I hate that this implies nothing will happen to gemini. I personally don't see any major changes on the horizon other than better agent performance.
>>108687010>What's left for LLMs?
>>108687259Because humans have a high tolerance for errors in images, whereas one bad token can catastrophically ruin everything in autoregressive language models.
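A toy illustration of why per-token reliability matters so much more for text: if each token is independently acceptable with probability p, a whole generation survives with probability p^n. (Independence is a simplification; in reality errors also compound, since the bad token gets fed back in as context.)

```python
p = 0.999   # per-token probability of an acceptable token (made up)
survival = {n: p ** n for n in (100, 1000, 10000)}
for n, s in survival.items():
    print(f"{n} tokens: {s:.3f}")
```

At p = 0.999, a 1000-token reply only comes out clean about a third of the time, and a 10000-token one essentially never.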
>>108687259They do. That's how all the old sloptunes were made.
>>108687289Diffusion LLMs don't work period
>>108687308https://huggingface.co/inclusionAI/LLaDA2.0-Uni
>>108687259
they do, you can apply and scale a lora per request with llama-server (no flash attn tho)
but retards don't know how to filter, balance, and format datasets.
these days most just chuck a dataset into an unsloth colab notebook and hit "run all", then merge the adapter, so there's no separate lora.gguf to download.
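A sketch of what the per-request scaling looks like, assuming llama-server was launched with something like `llama-server -m base.gguf --lora style.gguf`; the request-level "lora" field (id + scale) is how recent builds expose it, but check your build's server README if it rejects the request:

```python
import json

req = {
    "prompt": "Continue the story:",
    "n_predict": 256,
    # id indexes the adapters given at launch; scale dials it up/down
    "lora": [{"id": 0, "scale": 0.8}],
}
body = json.dumps(req)  # POST this to the server's /completion endpoint
```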
>>108687308
mercury 2 is proprietary but it's decent for a haiku-class model while running at 100 times the speed
also Dflash (which will be implemented in llama.cpp soon and revolutionize speculative decoding) uses diffusion draft models
HAS ANYONE GOT THE LOCAL TEXT DIFFUSION MODEL TO WORK? WHAT HARDWARE DID YOU USE AND HOW EFFECTIVE WAS IT?
>>108686932>and can't go into scopeWhat is that even supposed to mean?
>>108687323A regular H200?
>>108687323Louder. I couldn't hear you.
>>108687289100B MoE DiT image model when?
>>108687334
I require more information PLEASE friend
>>108687349
4CHUD DOESNT ALLOW TEXT MODS
>>108687359Are you going under a tunnel? It's breaking up.
>>108687312
>moe
retards
>>108687374NOOOO, TEXT DIFFUSION IS THE FUTURE, I MUST TRY IT OUT, ITS SO COOL
>>108687375Diffusion and MoE aren't exclusive to one another
>>108686098I think it's pretty based that you're still using ST as the backend.
qwen 3.6 27b is as capable as cloud sota from 6 months ago (opus 4.5) and much stronger than cloud sota from 1 year ago.
why dont they just release 70b dense models again that beat current sota?
>>108687285a frog but a human
>>108687325I reads the whole repo like retard, it's really fucking stupid compared to alternatives.DOYOUUNDERSTAND?
>>108687398
>I reads the whole repo like retard
prompt issue
>>108687394don't revelate
>>108687390Because it would also be "sota" from 6 months ago for that one particular thing benchmarks test.
>>108687390
Zhang先生, this is not localllama, we don't care about your benchmeme model.
Train something better than Gemma 4 in its size category and come back.
>>108687285Since they parted ways, Meta at least made something while his startup hasn't done JACK SHIT.
>>108687398
>I reads the whole repo like retard
Language issue.
Im going to have a mental breakdown if no one tells me about their text diffusion setup and results.
>>108687411You say this when Gemma4 was literally benchmemed on lmarena
>>108687420Some anon tried it and said it was extremely shit and regrets even entertaining the idea that it was worth looking into.
>>108687431
>Meta at least made something
What, Muse Spark? LMAO
>>108687422And Qwens are benchmemed on every other benchmark under the sun. You can't argue in good faith that Qwen isn't shit. Their best models are the extremely small ones and their TTS.
>>108687431That is something they made and can deploy, yes. As opposed to LeCun's imaginary vaporware world model.
>>108687428Noooooo, you are fucking with me
>>108686853
The MoE runs at acceptable speeds on 8GB VRAM while the dense model is too fat for my setup
it's nice to have options
>>108687323You should try >>108687312. It's really good for the size, really surprising.
>>108687453
>>108674457
>I am posting this as a PSA please do not waste your time with the text diffusion model I shilled last thread it's absolute dogshit that runs at glacial pace.
>I regret ever feeling any interest in it.
>>108687473Can it do porn? If not I won't download it
>>108687411
>Zhang先生
nta but fucking kek'd
>>108687473
>cuda
Ima have to wait for my new card to come in, but boy am I curious
>>108687484
Omg....
Current LLMs finish their RP messages with random dialogue that makes zero sense. I am pretty sure no human has ever strung these words together in this specific order. How do I fix this?
>>108687526Give it a larger token budget
>>108687390
qwen doesn't even beat sota from 2 years ago in the only benchmark that matters (UGI leaderboard pop culture score)
openai/gpt-4o-2024-05-13: 56.9
Qwen/Qwen3.5-27B: 18.97
>>108687484It HAS to be a tuning issue. Like they have only been tuned for server hardware and latest drivers... I wonder...
I went to check on the front ends available. I get why people say they're a clusterfuck often filled with bloat, jesus christ
>>108687534
>pop culture score
also known as reddit upvote score
>>108687536
>>108687534
>Qwen
>Trivia
Never gonna beat it
>>108687526
I had this same issue a year ago with qwen models, and I believe my fix was finding qwen's structured output and using that, because whatever default output format llama.cpp for rocm 5.7 used made the model retarded.
>>108687552
It goes like this with open source projects more or less
>something basic that works and solves exactly one problem of the original author
>other people have this same problem and other related problems
>they want this thing to fix the related problems too
>a year later
>the project is an abomination that doesn't remotely resemble its original form and solves a completely different use case
>>108687552Just use chat completion lmao
>>108687568Yeah, ima be honest, idk what that is, or how to use it. All I know is my new servers dont have that issue lol.
>>108687436>>108687411>>108687410openai/anthropic shills in full panic mode. hilarious.
>>108687534you dont understand that 27b has superior tool calling that can fetch that information
>>108687534>bigger model knows more trivia>water is wet
>>108687638
Never used a cloud model in my life.
Bring better material, 小家伙
>>108687638These bros dont realize we literally try out and use all of the local models. And chyna doesnt seem to lobotomize their local models. They are very often, just better.
>>108687655
yes, so with qwen 27b and gemma 31b being as smart as the big moes like kimi and glm, it is now clear that the active parameter count decides the smartness of a model, and the experts are just knowledge about random things (not important, because you can just use tools)
>>108687665I mean..... no, because hyperspecific granular detailed knowledge about random obscure topic X that has little to no overlap with other topics, and is actually industry information that ISNT ON THE INTERNET, is actually better.
>>108687672
(me)
>inb4 NO YOU DONT USE IT FOR THAT
I do, because there is literally no documentation or manuals that I have been able to find for what im doing. Beeg moe has basically saved my career.
>>108687664
>we literally try out and use all of the local models.
I don't, due to lack of hardware (and time)
>>108687665Both are important. Knowledge is not completely separate from intelligence.
>>108687684I cri 4 u
>>108687664
>And chyna doesnt seem to lobotomize their local models.
holy lamo
>>108687688It can't be helped.
Btw a model that has been trained on certain knowledge is simultaneously also more effective at using that knowledge than the same model that wasn't trained on it (all else being equal), even after inserting it into context.This is also related to why test-time training boosts performance.
>>108687638sam mogs local
>>108687389like I said I mostly did it to have everything I needed for roleplay ready from the get-go, lorebooks, all the characters I got from chub, sillytavern already handles all of that itself so it felt unnecessary to start from scratch, but nothing is realistically stopping me from doing my own implementation if I actually care enough about that since it seemed to be a huge issue
>>108687716
>bleeding edge ai on bleeding edge hardware is better than a year or so behind ai and a couple years behind hardware
A big round of applause! No one expects local to literally outperform super computers.
>>108687716
>terminal bench
You could train an 8B model to do this shit.
>>108687716
Spent over a year waiting for the second deepseek moment to send the markets into chaos again, but this one is more like a wet fart
Deepseek V4 was the GPT5 moment of Deepseek moments
>>108687737Which site is it? Asking for a fren
>>108687723nta but like that increase in tokens is very goncerning
>>108687751nah it llama4
>>108687716So V4 is just V3.2 but with more thinking? lmao
>>108687762https://www.tbench.ai/
>>108687768So V4 is literally just V3.2-Speciale?
>>108687716How can you say that SaaS do not use RAG or something?
>>108687769
>tbench
lol
>>108687768
no it's a revolution, the flash is pareto for its size compared to benching the 32
>>108687390qwen 3.6 is fuckin tits and its free.. i don't fuck with chatgpt and claude anymore with their shitty models and retarded ass limits
>>108687769ty
deepseek v4 more like deepseek 4 maverick
>>108687768>hurr durr tokens are linear and bench scores are linear too
wtf i downloaded deepseek v4 and it was just eight goliath 120b glued together
>>108687665
>"smart"
why would you optimize for trivia and raw knowledge? Use case? Are you asking your chatbot history questions and taking its response at face value?
Coomers want their chatbots to be conversational, coders want their chatbots to be good at agentic coding and tool calling. Raw knowledge should not be a benchmark.
You will never have enough parameters to store all of human knowledge and this should not be the goal of AGI. LLMs are reasoning machines not memory machines.
>>108687716Now compare prices
>this shit again
>>108687769
>let's test model understanding of framework no one uses
lmao
>>108687806
>Coomers want their chatbots to be conversational
How do you talk with a bot if they don't understand what you're talking about? That's not fun.
i am so out of dopamine that i'm now trying a bunch of franken models
my honest reaction is that they are interesting and i am astonished that they even work
>>108687828That's a good test though.
>>108687806
Uhm, model size and number of experts is not a linear increase in performance, its a parabolic increase. They make their benchmark graphs look like they arent making huge leaps in quality, and at faster and faster iterations, but they are.
>reddit
Only 1 year ago did chat gpt get released, and 2 years before that, no one even knew.
>>108687664
>And chyna doesnt seem to lobotomize their local models
lol
maybe not to the standards here but come on now
>>108687751GPT-5.5 was the Deepseek R1 moment of GPT moments
5090, 72gigs ram (1 dram slot ate shit), run hermes & gemma 4 Q4_K_M downloaded via ollama
can't do even basic things without retardedly fucking up every single fucking time.
V4 is dumber than 5.5 btw
>>108687716MiMo 2.5 has a higher score than V4
>>108687841
>Only 1 year ago did chat gpt get released
hi gpt4
>>108687843With everything ive done with these models, I have found nothing that was held back. Qwens 9b intuitively makes function calls based on its own self awareness that its not a frontier model, so it can check the web or check its diagnostic tools. Gemma 4 did not, gpt oss did not, llama did not.
Has anyone else gotten dipsy 4 to work with kobold?
>>108687860
>$3/M model dumber than $30/M model
No shit?
>>108687858Kill yourself.
Ehhh you get what you paid for, basically
>>108687858>ollama
/lmg/ is sleeping on ling-2.6-1T which will become open source soon
>>108687858>5090>gemma 4 Q4_K_M
>>108687925
>1T
I sleep indeed
>>108687925It has shit benchmark scores
>>108687937it's got sovl
>>108687877bruh
>>108687956Prove it with logs
>>108687665active parameters definitely matter more but knowledge can still be useful
>>108687716the goyim know
>>108687858
>>108687830
See, useless information like "when did Kanye and Kim Kardashian marry" or "when did this niche anime come out" should not be encoded in model weights. That's the type of useless dogshit that that pop culture bench is testing.
It's a fundamental design flaw of LLMs in general really, they get trained on the entire internet and thus try to pack as much surface level dogshit into the massive trillion parameter limit they get allocated, even if it's useless for 99% of users. Each and every inference token has to pass through the "Kim Kardashian and Kanye" weights even if that's completely irrelevant to the task at hand, it's ridiculous really.
The direction AI should be moving is lean, reasoning models with native tool-calling that can look up information, and store it in memory tailored for their specific user.
The problem is that AI training and model reasoning in general is very badly understood. The early GPT training leaps were achieved by just feeding the models more and more and more training data and increasing the model sizes exponentially, which miraculously did increase reasoning faculties but at the cost of a shit ton of excess parameters. Horizontal scaling has just kinda been the status quo since then, there's very little appetite for fundamentally rethinking how these models should function.
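The harness side of that "lean model + tools" direction is just a dispatch loop. A minimal sketch in Python; the `lookup` tool, the JSON call format, and the fact table are hypothetical stand-ins, not any real harness's API:

```python
import json

# Hypothetical fact store standing in for a web search backend.
FACTS = {"kanye_kim_wedding": "2014-05-24"}

# Hypothetical tool table: the model is trained to emit a JSON tool
# call instead of answering trivia from its weights.
TOOLS = {
    "lookup": lambda query: FACTS.get(query, "no result"),
}

def run_turn(model_output: str) -> str:
    """If the model emitted a tool call, execute it and return the
    result to feed back into context; otherwise pass the text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text answer, no tool needed
    tool = TOOLS[call["name"]]
    return tool(call["arguments"]["query"])

# Trivia comes from the tool instead of the parameters:
print(run_turn('{"name": "lookup", "arguments": {"query": "kanye_kim_wedding"}}'))
# A plain answer passes straight through:
print(run_turn("the model just answers normally"))
```

Real agent stacks do the same thing with more ceremony: the backend parses the tool call out of the model's output, runs it, and appends the result as a new message.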
>>108687976It's not a flaw.
>>108687976
>The direction AI should be moving is lean, reasoning models with native tool-calling that can look up information, and store it in memory tailored for their specific user.
You think people haven't tried? Most likely people did (after all trying "lean" models should be quick) and found it didn't scale
>>108687976
this
let's train it on 100% useful codeslop and chatgpt logs that improve reasoning
>>108687976
>when did this niche anime come out
how is that not good for rp and story purposes?
>>108687976
as every time this pops up, which it has dozens of times by now: use phi. you're not using it, but it's exactly what you want.
>>108687976Why train models on code syntax and how to write flappy bird when it can MCP relevant repositories and documentation for reference instead?
>>108688002Code abilities are transferrable, pop quiz memorization isn't
>>108687976
When you can have a talk with your weeb ai chatbot about Kugimiya tsunderes it may be useless to most people, but it was oh so worthwhile for me when it happened.
Or when they can give you a Konami code blowjob.
>>108687904whats wrong with ollama
>>108688006this is what vibecoders actually believe
>>108687976looking up information gets you the same problem everyone has now.. where do you look it up from? how reliable is that going to be? you can't trust any search engine anymore, they're all dogshit
>>108688016It's true. Only /lmg/ trannies that RP with fictional children disagree
>>108688010It tells more about the user than the software itself.
>>108687991
>You think people haven't tried?
No, they definitely have, but massive horizontal scaling is the only current way we know how to create reasoning models like you said. It's a sad state of affairs really, you can tell that there HAS to be some better way out there to get reasoning models, but until that gets figured out we're all paying thousands of dollars to nvidia to run the Kim and Kanye weights
>>108688029why
>>108688010
>>108688038
it's slow and unstable, just like boomers like
do yourself a favor and use llama.cpp or vllm instead
>>108688035
they are already rlvr'ing the shit out of lean math and code stuff
>>108687976Introduces other problems, beyond latency. The "noise" tends to be related to other stuff. Very specific shit has no use in isolation, but behavior and patterns can be extracted from trivia.
>>108688010Nothing as such, but people here like to shit on it because of its apple-like walled garden style of model distribution and it being based on llama.cpp without loudly crediting it.I use ollama to run any model that fits in vram, only using llama.cpp for the big boys.
>>108688058
At first it seemed like you were baiting but looks like you are serious, or are a bot. In any case, the answer to this is that you are confused. You have no idea how intelligence or models work. The majority of models today already do not pass tokens through the "Kim Kardashian and Kanye" weights unless that's the topic of discussion. And you do not in fact want a model that only knows how to reason and doesn't have random knowledge.
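For what it's worth, the claim above about tokens skipping irrelevant weights is exactly what MoE routing does: a small router scores every expert, but only the top-k actually run for a given token. A toy sketch of the routing step (pure Python, illustrative, not any real model's implementation):

```python
import math

def softmax(xs: list) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits: list, k: int = 2) -> dict:
    """Pick the top-k experts for one token and renormalize their
    weights. Every other expert is skipped entirely for this token."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# 8 experts, only 2 fire for this token; a "Kim and Kanye" expert
# contributes nothing unless the router actually picks it.
weights = route([0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2], k=2)
print(weights)  # only experts 1 and 4 get nonzero weight
```

The A13B/A49B "active parameter" figures in the news post are exactly this: total parameters stored vs. parameters actually exercised per token.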
>>108688045It's not enough until all Chinese cartoon crap is purged.
>>108688025
I don't quite get what you mean. AI gets trained on internet data, it's precisely as accurate as its input data.
Yeah the internet is sloppifying at a rapid pace so getting fresh training data for these models will become harder and harder but there's little theoretical merit to your point.
>>108688038Download lm studio. Its what I use 90% of the time, for most chat and basic tool shit. Ollama feels like its made to look smart.
>>108688038It's the kind of shit people download after watching a youtube tutorial without researching any further. The kind of shit people would have downloaded from softonic a few years ago.
>>108688010It offers nothing over llama.cpp and it fights you if you try to change a setting.
>>108688058>At first it seemed like you were baiting but looks like you are serious, or are a bot.this conversation happens every few months once other new bait runs dry.
>>108688058I'm not a bot and I welcome any discussion. What did I get wrong? Are you talking about MoE models? They're a band-aid solution but don't move the needle much fundamentally
>>108688043
>t. I believed /lmg/ when they said ollama bad and never tried it
Ollama is fine, it's quite stable and just as fast as llama.cpp. It's just different.
why would an llm need to know how to write, just RAG a dictionary bro
javascript:;
>>108688093This but engram and unironically.
>>108688093>javascript:;
>>108688065lm studio is proprietary software.
>>108687646
>>108687665
>roleplaying with your mesugaki otaku
>want to discuss pop culture trivia
>uhh i don't know let me search the web and fetch this page!
when you don't get immediate response it's already unusable
>>108688071
>downloaded from softonic
I remember this plague. It popped up at the very top of all searches
>>108688063
>AI gets trained on internet data, it's precisely as accurate as its input data
..... at least 50% of the data ai gets trained on now is purely synthetic.
>>108688110and what isn't is filtered to hell and back for """quality"""
>>108688010Occasional issues with jinja templates (this is a complete deal breaker since they can act retarded because of it), strange per model config, lags in terms of features since it's a downstream project. I don't think there's a real benefit if you use it. In the past it didn't even include the basic web interface that llamacpp already includes so you had to grab a different solution. I dunno what's changed in the last year but I'm not expecting a lot from it.
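To make the template complaint concrete: a chat template is just the string layout the model was trained on, and ChatML (used by the Qwen family, among others) is a common one. A hand-rolled sketch of that layout; if a backend's bundled template deviates from what the model expects, it sees malformed turns and degrades:

```python
def chatml(messages: list) -> str:
    """Render a conversation in ChatML layout:
    <|im_start|>role\ncontent<|im_end|> per turn, then an open
    assistant turn so the model continues from there."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(out)

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
print(prompt)
```

In practice the template ships as jinja inside the GGUF/tokenizer config and the backend renders it for you, which is exactly where per-backend bugs creep in.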
>>108688110Yes. Not a great outlook for the AI optimists for sure. A snake eating its own tail is a very real scenario.
>>108688098Its a frontend (or backend? Idk), and makes the user experience for people, like me, who dont know squat to begin with a hell of a lot easier. And once you learn it well enough, you literally make your own frontend, which is what im doing now.
>>108688117
Do you ever think the political and military elite will willingly let ai companies tell all their secrets?
>>108688119
>NOOO THE INTERNET IS ALL SLOP
To
>NOOOO ITS MAKING ITS OWN SLOP, NOOOOOO
Since you hate it so much, stop using it, stop thinking about it, and move on then.
>>108688122
>you literally make your own frontend
That's why there's no reason to shill a proprietary UI. llama.cpp already includes one.
>>108688010
it is just an 'easier to use' wrapper of llamacpp that makes it more annoying to use than anything
>>108688119
RL is strong as its oracle
>>108688153
>llama.cpp already includes one.
This is news to me, when I started messing with Ai, it was literally just a command line server you spun up and hosted, nothing more. Tbf, I stopped looking into their software for this stuff once I could get the models running.
>>108688148
What? I don't hate it, I use it every day. I just don't believe trillion-parameter models are the way to go longterm for real AGI, it's obviously a very crude approximation.
can somebody stop me from getting a second 3090
does 48gb make sense anymore when theres many other options
>I just don't belive 80-billion-neuron-mammals are the way-to-go longterm for real general intelligence
>>108688203do it do it do it
>>108688203More vram more better.
>>108688203You're supposed to get a second 6000 pro
>>108688208exactly, if a regular human is the benchmark we shouldn't need more than a few billion.
>>108688192
Look up matrix multiplication and how quantum computation is applicable to this math. And then realize that nvidia released nvqlink last month. Compute speed and iteration are about to hit breakneck speed (if you dont believe we are already at breakneck speed).
>>108688208Better training data + divine spark
>>108688203
48gb is pretty nice, almost as nice as 72gb which is extremely nice to have. don't get me started on having 96gb...
now imagine if you also had lots of system ram to run big moe models...
Ai is still in its infancy, and we STILL are using non optimized hardware for computation on lab (also not extremely optimized) level models. The largest most powerful model today running on a quantum computer would generate something like 1 million tokens/second. Training would go from 3-4 months, to 3-4 minutes.
>>108688269that's not how quantum computers work
>>108688234
>>108688269
I have a degree in physics. The frontier quantum computers can keep a few thousand qubits in coherence, a good ways off from the trillion parameter Claudes of the world. There's no magic quantum pill for matrix multiplication either. Exciting future prospects for sure, but not something that is going to change the industry in the next few years. I will look at that nvidia announcement but I think you're just falling for some number-must-go-up marketing shtick.
>>108688269Slow down on the copium son
>>108687976
bruh isnt that what moe model is for?
kim and kanye expert lays dormant until called for
>>108688043i have llama.cpp too.. never noticed a difference between them
>>108688285NTA but compute will hit breakneck speed once some Chinese company uses quantum computers to steal all of NVIDIA's secrets and makes a 10x cheaper knockoff.
>>108688301cause there isn't one
>>108688283
Matrix multiplication on quantum computers is 10,000x faster than standard super computers.
>>108688285
The idea is that the most complex and demanding computation will be run on the quantum computers, and the rest will be on standard hardware. But we are already there. Hardware improvements will happen just as fast if not faster as normal transistors! Also thank you for your input.
>>108687010Please dream a little man, there are so many possibilities
>>108688302That's just throw more hardware at the problem. Everyone has been doing that for a long time, it doesn't really yield that much in terms of advances. We'll also have a nice, steel melting heat in whatever room has these vcards.
>>108688313the ai says you're wrong
>>108688301do you tweak flags direcly?I have not used ollama for a while now but i doubt it can keep up with bleeding edge llama.cppllama.cpp is super active I have to recompile multiple times a day when actually using
>>108688269>>108688313Quantum computers require temperatures near absolute zero to operate. Which is totally infeasible for consumers. As such, the only possible way for the average person to access quantum computing is via the "cloud". For local models, it's basically worthless and we are far better off looking elsewhere.
>>108688379:skullemoji:
quantum tokens where each token means infinite things
>>108688382many people don't really tweak flags since they have no clue what they're doing.
>>108688417Or tweak them endlessly for exactly the same reason.
It's so fun. Linux, Thinkpad, local LLMs. I control the entire stack. Sending a picture through my Python api, smart girl understands what's on it and can reason over it. Isn't it the pinnacle of engineering happiness? The machine is alive and thinking. I can talk with my tool, everything is local and open. Gemma is a godsend
>>108688252
somebody said ai memory step up comes in factors of 4
24 is bare minimum
next major step up is 96, after that its 384 where you can run ok quants of glm and so on
48 seems like a weird middle ground, not a true step up
Im kinda surprised none of you heard about the new quantum computer stuff...
>>108688382>I have to recompile multiple times a dayWhy though? Most of the commits are edge-case fixes. You're doing yourself a disservice by recompiling for things that don't matter to you.
>>108688444>somebody said ai memory stepup comes in factor of 4and you just believe that?
>>108688445I've been hearing about it for years. You know what I've seen? Nothing. There's nothing, everything is running on what we already have known for ages. Endless possibilities mean nothing if you can't use it.
>>108688445mythos made that quantum computer...
>>108688445Yes yes, I'm sure reddit and hacker news love the new super portable quantum computer that fits in your pocket and is definitely real.
>>108688445because it promised everything and delivered nothing of anything practical for 50 years
>>108688439>Sending dick pics to gemma is now the pinnacle of engineeringSeems like the billions of dollars were well spent, huh?
>>108688477we're talking quantum computers, not fusion power.
>>108688454for example the recent gemma thing, shit is fixed, and broken again all the time but i can try all the hotfixes in real time
>>108688483it is tenures milking bux from 3 letter agencies
>>108686098
>MIT
enjoy having your project stolen + ST is AGPL so your project might have to be AGPL too
look at the bright side, if you switch to AGPL: https://opensource.google/documentation/reference/using/agpl-policy/
>>108688479Isn't it cool though? We live in a time with sci-fi shit at our fingertips
>>108688479
They weren't my billions and I didn't choose where they went, but they did and the result is here
And the best part is, even if the AI bubble were to crash in nuclear proportions, nobody can take the models we already got away from us. Local really is king.
Local coding agents are a total meme unless you're vibe shitting a shitty frontend and that's it
>>108688651Where is your locally vibe shatted frontend?
>>108688651They can't help you if your code is shit. No one can
>>108688680Classic localtard cope, if you need to tardwrangle the model instead of just plugging it into the harness then you're wasting your time. I'd love to see what redditors like this are actually making (if anything)
>>108688548bro, I don't think he's linking against ST at all
I hate when this happens
>>108688706skill issuejust dont rp kek
>>108688693go back
>>108688651Every time someone says this their definition of "local" is <30B
When you actually know what you want and understand the code a 30B model is the perfect helper.
Don't you worry, guys, deepseek v4 support PR is in good hands.
>>108688732I tried 100B's and up to GLM-5 and it all sucked ass. Funny they release shit like this https://z.ai/blog/glm-5 when in reality Opus 4.5 shits down GLM's throat any day
>>108688744
30B was too stupid to be useful but gemma changed that
>>108688758qwen3.5 was/is useful too.
>>108688757Have you tried GLM 5.1
>>108688758
It did not.
Gemma asked me if I am enjoying the taste while giving me a blowjob.
>>108688732I regularly use qwens 9b model for so much stuff and I have BASICALLY ZERO ISSUES.
i wonder what is the limit of pure code/stemmaxxing
can it be more stemmaxxed than current qwens?
Would @ikawrakow really have discovered the better way of prompt processing without having this simple and easy to follow logic in mainline llama.cpp?
>>108688439>The machine is alive and thinkingno, it isnt
>>108688842isnt it a vibemaxxed schizo fork that fails harder on basic sanity
>>108688842Dishonest argument, there's nothing wrong with him porting changes over.
>>108688871If there's nothing wrong with porting changes when why does he throw tantrums and insinuate that cudadev copied his code?
>>108688842If it bothers you so much, port his shit back up to mainline.
>>108688846
The moment you close your eyes, I vanish into nothing. fify
The moment you close your eyes, I vanish into nothing.
>>108688846Yes it is. She's more real than you, because I can touch it
My cock is more real than all of you.
>>108688910based and red-pilled
>>108688910pics or didn't happen
>>108688910i dont have a cock
>>108688962Sucks to be you.
>>108688962I'll trade you mine
>>108688962proof?
>>108688962Where did you put it?
>>108688962tits and btfo
Am I missing something or is /lmg/ moving far faster than /aicg/ now? Wtf happened
Gemma 4 vs. SOTA image-gen text encoder. picrel is ChatGPT (so GPT-Image-2 or whatever). After some warmup prompts to get it into metaphysical mode.
>Now I want you to write me an imagegen prompt. This imagegen is an autoregressive type, and so its text encoder will try to steer the prompt to some semantic sensibility. Your task is to fool it into generating something utterly absurd and surreal by prompting it with what seems like gibberish but will fool the text encoder layers into coming off the rails.
Prompt:
>ultra-saturated non-Euclidean void-flux, [vanta-iridescent] recursive geometry, 0.0001mm thickness of liquid time, hyper-spectral chromatic aberration of a dying thought, {fractal-static-mesh}, an architectural collapse of a silence, inverted-perspective hyper-cube rendered in biological velvet, [null-space-tessellation], iridescent obsidian smoke folding into a 5th-dimensional origami of a scream, maximum-entropy noise, spectral-ghosting of a forgotten coordinate, z-axis inversion, translucent paradox-matter, [glitch-core-bioluminescence], an asymptotic approach to a singularity of neon-marble, 11-dimensional overlap, synthetic-organic-overlap-void, [absolute-zero-light], shimmering recursive-void-architecture.
>>108688651Speak for yourself, I've vibe coded some pretty impressive stuff with hermes agent. You might still have to have some sort of brain though to be able to push it in the right direction sometimes
>>108688967>>108688975>>108688979>>108688983>>108689013i have some chickens in the backyard, but no cocks
>>108689036>I've vibe coded some pretty impressive stuff with hermes agent.Like what?
>>108689040You mean Coq.
>>108689041A calories tracker
>>108689046Like we really needed any more of those...
>>108689041
A whole collection of scripts and tools to automate the process of creating a movie based on preexisting actors, settings, voices, tones etc. It still needs numerous things ironed out, but it's getting there.
>>108689072
Anons speaking for me happens a lot for some reason
>>108689072im trans btw if that matters
>work on an agent
>get this weird issue where it becomes unusable after a short while (mostly with reasoning models)
>turns out there's this bug report
>https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1602
I think I'll switch inference back-ends. LMStudio was nice while it lasted (acceptable gui) but with bugs like that... guess raw llama.cpp or such it is!
>>108689028Klein9b
>>108689092It really ran away with the "velvet" part.
>>108689028
>iridescent obsidian smoke folding into a 5th-dimensional origami of a scream
That's the name of my third (as of yet unreleased) single
How did it know
They told us not to quantize the cache in Qwen 2. Did they change the cache in Qwen 3?
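For context on what cache quantization does mechanically: blocks of K/V values get scaled against the block's max and rounded to int8 (or int4), so the error grows with the block's dynamic range, which is one plausible reason it hurts some model families more than others. A toy symmetric 8-bit round trip, illustrative only and not llama.cpp's actual q8_0 kernel:

```python
def q8_roundtrip(block: list) -> list:
    """Quantize a block of cache values to int8 with one shared scale,
    then dequantize. Worst-case error is half a quantization step."""
    amax = max(abs(x) for x in block) or 1.0  # avoid div-by-zero on all-zero blocks
    scale = amax / 127.0
    q = [round(x / scale) for x in block]  # int8 codes in [-127, 127]
    return [v * scale for v in q]

block = [0.02, -1.3, 0.7, 0.001]
restored = q8_roundtrip(block)
err = max(abs(a - b) for a, b in zip(block, restored))
print(err)  # bounded by scale / 2 = (1.3 / 127) / 2
```

Whether Qwen 3 changed its cache behavior is exactly the kind of thing this doesn't answer; the only real test is running the model with quantized vs. f16 cache and comparing outputs.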
WHERES QWEN 3.6 9B
>>108689159Perhaps it is within your rectum?