/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108502192 & >>108497919

►News
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
>(03/31) 1-bit Bonsai models quantized from Qwen 3: https://prismml.com/news/bonsai-8b
>(03/31) Claude Code's source leaked via npm registry map file: https://github.com/instructkr/claude-code

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108502192

--Qwen3.6 benchmarks and local model usability debates:
>108506706 >108507056 >108507101 >108507103 >108507104 >108507346 >108507111 >108507312 >108506744 >108506756 >108506894 >108506791 >108506802 >108506808 >108506807 >108506812 >108506824 >108506826 >108507063 >108506742 >108506781 >108507084 >108506852 >108506860 >108506921 >108506794 >108506900 >108506965 >108507036 >108507147 >108507192 >108507251 >108507266 >108507479 >108507770 >108507781 >108507787 >108507852 >108507860 >108507928 >108507985 >108508047
--Testing local models' arithmetic reasoning against AGI claims:
>108505076 >108505159 >108505200 >108505238 >108505289 >108505306 >108505327 >108505336 >108505344 >108505347 >108505357 >108505384 >108505360 >108505382 >108505411 >108505491 >108505521 >108506463
--Configuring SillyTavern presets for GLM-4.5-Air via Chat Completion API:
>108502705 >108502729 >108502743 >108502761 >108502748 >108502760 >108502768 >108502780 >108502781 >108502792
--GitHub CI reliability issues delaying llama.cpp updates:
>108506104 >108506116 >108506117 >108506133 >108506193 >108506240
--Turboquant performance with Qwen3.5 27B:
>108504829 >108504921 >108504941 >108504949 >108504961
--Qwen 3 .6b quantization benchmarks show math sensitivity and knowledge resilience:
>108505524 >108505533 >108506996 >108505546
--NPU acceleration struggles due to lack of software support:
>108503014 >108503027 >108503050 >108503084 >108503097 >108503200 >108503110
--Local AI models criticized for inefficiency vs cloud alternatives:
>108504383 >108504524 >108504744 >108507449 >108507456 >108507460 >108507467 >108507682
--Logs: qwen-3.5-27b-q8 derestricted:
>108507855
--Teto (free space):
>108503196 >108504729 >108505194 >108505438 >108505997 >108506842 >108506869 >108507794

►Recent Highlight Posts from the Previous Thread: >>108502197

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemma day
>>108508022What IS a system message?
bros, when are we getting a better model than nemo. it's all so tiresome...
>>108508090It won't happen because new models are benchmaxed and trained with pruned, synthetic data to avoid copyright issues.
Been out of the loop for a while.
Is the new rotation thing already merged into llama.cpp main?
What does it mean in practical terms? Can we just quant the KV cache to q8 by default now without worrying about it making the model fucking retarded like it used to?
>>108508099your hobby is shit then
>>108508102
>Is the new rotation thing already merged into llama.cpp main?
Yes... it's merged...
>Can we just quant the KV cache to q8 by default now without worrying
You be the judge.
>>108508106Your?
>>108508102
Rotation Gemma is the first model to support rotations.
i can't jerk off while rping in sillytavern because i keep rewriting my ai's responses and writing long messages in response to said ai
>>108508179this is the bane of local shitty models
>>108508208nta but it's the exact same experience with api
>>108508112
only partially... the 3-bit thing and polar coords are yet to be implemented
>>108508179Use that extension anon posted the other day to have the AI itself rewrite its own responses.
>>108508208>>108508106samefaggot
>>108508228
>Is the new rotation thing already merged into llama.cpp main?
The rotation is implemented. Anon asked nothing about turboquant.
>>108508239truth hurts huh
>>108508130oh i see you gotta set it up as a different api, i still dont get cunny kek, also is there a benefit over using this chat completion thing without all the templates? i notice it doesnt use your system prompt either
Women should get pregnant as soon as they're biologically able to do so.
>>108508248
>i notice it doesnt use your system prompt either
bro...
>>108508179Use tool support to give your LLM control over your lovense so you can type hands-free.
>>108508248cough
>>108508208
yeah... i end up just writing with a boner. doesn't help that i go retarded cause no blood's going to my brain. only time i can actually fap is when the chat is done and i feel like it's perfect
>>108508236
i've not seen it, got a link? if not, i'll just go look through the archive
>>108508268
fuuck it's genius...
>>108508254So when they're 6?
So when will that super sektrit jewgle quaint method be publicly released and I can run better models on my pathetic 12gb of ram?
>>108508307
memequant is mainly for the KV cache (context for a retard like u), the gains aren't really that different in the normal weights realm.
LLMs should get pregnant as soon as they're biologically able to do so.
>>108508321Nigga I'd love more context, even if it means I run the exact same model I am now.
>>108508353
with the current implementation you can use KV at q8_0 without problems
https://github.com/ggml-org/llama.cpp/pull/21273
bitnet memequant PR opened
bonsai bros... we finna win?
Should I switch from the AUR llama.cpp-vulkan package to the release binaries? I'm kinda tired of waiting for maintainers to update their packages in the repos, but I'm worried that if I don't build it myself I will get worse performance (and there's no way I'm doing manual building for every single update even though I know how)
>>108508344preferably survive with two legs anon
>>108508360
Is ctk still a bad idea, or is that also safe at q8_0 now?
>>108508397
safe, for max gains you could push V at q4_0
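For reference, those map to the llama-server cache-type flags (real flags in current llama.cpp; note a quantized V cache requires flash attention, and check --help if your build differs):

llama-server -m model.gguf -c 32768 -fa \
  --cache-type-k q8_0 \
  --cache-type-v q4_0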
>>108508381
I wouldn't merge it, they don't want to give us the quant method, fine, then we should not support them, the open source ecosystem shouldn't encourage such behavior
>>108508382
>and there's no way I'm doing manual building for every single update even though I know how
Not for every commit. Just pull and build every other day or whatever.
>>108508408
What's there left to give? They converted a model's weights to Q1_0_g128, and the implementation for that is in the linked PR.
>>108508417
>What's there left to give?
the fucking quant method, HOW they did it!!
>>108508408
shortsighted view. Also llama.cpp accepts plenty of modifications for closed models (like the recent head 512 cuda fa kernel), so it's a non-argument for the """PURITY""" of the project.
I also doubt they're just doing some pure math quanting, I suspect their methodology has some kind of post-training.
Personally, I'd love to play around with bonsai in mainline.
>>108508422
>shortsighted view.
absolutely not, you're bending the knee and supporting a company that doesn't contribute to the open source ecosystem at all, they don't deserve any spotlight, we didn't learn anything new from them, like I said if they want to gatekeep and keep their quant method closed, fine, but it shouldn't be welcomed with open arms by the open source community, this is bullshit
>>108508422
>I suspect their methodology has some kind of post-training.
Would be scummy since they didn't mention any in the tech report. Easy to show off a high-fidelity aggressive quantization if you measure only by benchmarks that you post-train on.
Onnx runtime needs to support vulkan.
Onnx runtime needs to support webgpu.
Llama.cpp needs to support webgpu.
Firefox Linux needs to support webgpu.
You can't build any portable, cross-platform AI/ML program because of these stupid, lazy swe niggers.
>>108508437>Easy to show off a high-fidelity aggressive quantization if you measure only by benchmarks that you post-train on.that's probably why they're not disclosing the trick, it's probably some hack like that yeah
>>108508430https://github.com/ggml-org/llama.cpp/pull/20998so bending the knee for njudea is fine? this is just one recent example.
>>108508422>>108508443>512 cuda fa kernelnta. That's generally useful. A single model architecture with a new type nobody else uses is not generally useful.
>>108508443I'm not familiar with this PR, what's being hidden here?
>>108505937Does the gimping really matter if I'm setting a lower power limit anyway?I vaguely recall the nvidia-smi screenshots I've seen /here/ typically having the Max-Q (by virtue of being listed as 300W instead of 600W).
>>108508446and we can say that Q1_0 quant type is generic too, see where I'm getting at?
>>108508446not a single current model is using the 512 cuda kernel
>>108508452Only for their 1bit models. It's not generally useful.
>>108508456?
>>108508430
>>108508437
their 'whitepaper' was literally the most useless shit i've ever seen
was able to find some papers made by the people there but i'm not really convinced, they were for linear spaces with redundant representations
and their pitch felt kinda off, saying 'caltech proprietary algorithm' etc etc..
Some anon said LLMs will be banned next week, anyone else scared?
>>108508461
the argument is
>Q1_0 isn't used by anyone except by the bonsai guys! (open weights btw)
>this custom cuda kernel is generally useful (no model anywhere except some internal nvidia stuff)
why the double standard? I suspect some njeetea employee at work here
>>108508473Can you not be racist?
>>108508473I think nvidia is responsible for lots of issues behind the scenes. That's one of the reasons why id software's new games only support rtx. There is no gain with these shills. They will sponsor you but only if you do what they say. Just like the mafia.
>>108508473
>open weights btw
and? we are at the complete mercy of those fuckers, want to make your own 1bit quant? you can't, you'll have to beg them to make them for you, and they have no reason to accept, a new model appears? sorry fucker, you'll have to beg them again and hope they'll be nice enough to give you the 1bit quants again, want to do it on some custom finetunes/uncucked models? again, time to beg the 1bit overlords, do you really want the local community to do this humiliation ritual over and over? the fuck?
>>108508484Yes.
>>108508484
Thinking about it you're right, they don't even provide a naive method to do quants for the new types they introduce (which introduce a maintenance onus on the llama team), but I would still like to play with them.
Maybe the ideal way to do this is for llama.cpp to support a plugin system so 3rd party vendors can implement their shit and just give you the DLL... but this also introduces another can of worms.
>>108508472What are they gonna do, steal my hard drives?
>>108508530>implement their shit and just give you the DLLplease to subscribe to the patreons for dll sir
>>108508549imagine the malware
>>108508484
This kind of purity spiraling is pointless since you could make the same argument about base models and lack of open training data. Anyway, all the excitement for a 1bit quant is stupid. We need a fucking 8B natively trained bitnet model.
>>108508556
>you could make the same argument about base models and lack of open training data.
not really, since it's illegal for them to release training data that has copyright shit in it, they're just following the laws
>>108508307
at this rate ram prices will go up again lol
>>108508563it's also illegal for US corpos to lose potential money for their investors, so releasing models at a loss should be illegal
>>108508568that's probably why we almost get no US local models :(
>>108508179Try mikupad instead and just write a novel.
>>108508530
>>108508549
it is just better to can it when bonsai stops making interesting stuff
just schizo speculation but it feels like what they do involves a bunch of random projections of the weights
>>108508568>it's also illegal for US corpos to lose potential money for their investorsIf that were the case, the tv, film, and video game industries would be long dead.
>>108508382>arch btw>scared of compiling
>>108508472Unless mossad bombs my pc ill be fine
>>108508579The law is something (((they))) selectively enforce to bludgeon the goyim, not a uniformly adhered to standard.Anyway anon's referring to Dodge v. Ford and its consequences.
>>108508553Just call it .safedll and zoomies won't think twice.
>>108508582It's just tedious, especially when you have to do it often.
>>108508579>The Michigan Supreme Court ruled in Dodge v. Ford (1919) that Henry Ford had to prioritize shareholder profits over employee or customer benefits, establishing the principle of shareholder primacy. This decision forced Ford to pay dividends to shareholders, including the Dodge brothers, who used the funds to expand their own car company.Wikipedia Harvard University
>>108508579That's (((hollywood accounting))) where they structure and inflate their expenses to make a profitable project negative on paper so they can use it as a tax writeoff (read: government subsidy)
>>108508605>>108508594Don't look up what the Dodge brothers logo was a while ago...
>>108508617https://x.com/googlegemma/status/2039710167995121783
>>108508642sirs
>>108508602>what is ccache?
>>108508602desu my ik_ is from December, pull when it makes sense like maybe once this new quant stuff has settled. doubt you are missing any noticeable performance uplift w/ vulkan, cuda backend has more arch specific stuff i would guess
>>108508582
It's not a problem, do it once and save the compile script, then it's always there for you.
I would avoid any AUR shit in any case.
gemma 4 will change everything...
>>108508382
>(and there's no way I'm doing manual building for every single update even though I know how)
just tell your openclaw to handle it for you, honestly why are people even posting shit where if they just copy-pasted the post straight into their computer it would have solved all their problems by now
>>108508678Don't expect huge models.
>>108508681>needing memeclaw to run an update scriptLMAOOOOOOOOOOOOOOOOOOOOOOOfucking luddites
>>108508681>needing AI to spend an hour and tens of thousands of tokens just to run git pull and make...
>>108508688sub-50b models are still new modelsbeggers cant really be choosers
>>108508688that's been there for a while, but honestly in the sea of fuckhuge moe it's fine imo
>>108508681I don't know why nobody got this, but my real question was whether there were significant performance differences between the pre-made binaries vs local compilation.
>>108508688120b <16a is "small" by today's standards
>>108508688gemma 4 27b will be as smart as 200b trust the science
>>108508700It's something you have to test yourself, anon.
>>108508688If it thinks, it stinks. That's all that matters at this point. Thinkers have declared open war on local. It better not think.
>>108508716You can turn off thinking on most thinking models.
>>108508642Stop posting this shit. Since gemma 3 they made 50 hype posts and released nothing of value.
why dont they just make a 10t-a1b model that we can run off an ssd
>>108508722Yeah but then their outputs become shit because they were trained to think.
>>108508723>not liking medgemma, translate gemma, embedding gemma, function gemmaluddite
>>108508693Fucking stupid zoomer. Learn what the fucking buzzwords mean if you're going to repeat them constantly. A luddite isn't a technologically illiterate user.
>>108508740>function gemmakek
>>108508745luddite
>>108508740where is cunnygemma tho
>>108508750Big tiger gemma sirs
>>108508730The reason modern models are so smart is because of RL training, which naturally leads to thinking.
>>108508750My headcanon is that gemma and gemini were the horniest models after raw training thus safetyslopped the hardest.
>>108508768i have genuine suspicion them pretraining their models on raw 4chan corpus
>>108508740To this day, I can't imagine what I'd use FunctionGemma for, even if finetuned. What the hell is its usecase? Something to put into your Siri/Alexa clone? (those don't have use cases btw)Genuinely curious, maybe someone here knows.
>>108508776tool call agent which needs further finetrooning depending on your use case.
>>108508059
Question for vibecoders. Let's assume I want to push whatever I made to GitHub in order for it to be open source. But I also don't want companies to take the code, modify it (or just take it without any modifications), claim it as their own, and then turn around and try to fuck over someone else who uses the original code by claiming that user is committing "copyright infringement" or whatever (even though they themselves didn't write the original code). Based on my light research the two licenses I should be looking at are either GPLv3 or Apache 2.0. GPLv3 basically says if companies use the code they have to disclose that they used it and contribute back to open source, and they cannot pull any copyright shenanigans on anyone who uses the original open source code. Apache 2.0 states the company can take the code and make modifications to it in-house and doesn't necessarily have to contribute back or even disclose that they use it, but they are forbidden from doing any copyright troll shit. If I don't give two shits whether or not a company benefits from any code I create but don't want them to abuse it in malicious ways, what license should I use? I've also heard of people mentioning AGPL but I'm not sure how that differs from the aforementioned licenses.
>>108508791
AGPL so it can't be used in saashit without them giving back.
everyone else who says otherwise is an alphabet shill
>>108508791if MIT is good enough for llama.cpp, it's good enough for you
>>108508791
>But I also don't want companies to take the code
Then you don't upload it in the first place.
Containment general for retards is that way: >>>/g/vcg/
>>108508382
>and there's no way I'm doing manual building for every single update even though I know how
just write a .sh file with the lines to build and run yourself, it takes like what, 30 seconds to build??
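A minimal update script along those lines, assuming a Vulkan build checked out in ~/src/llama.cpp (the path and backend flag are just examples, swap in your own):

#!/bin/sh
# pull the latest llama.cpp and rebuild in place
cd ~/src/llama.cpp || exit 1
git pull
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"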
>>108508791
A license won't stop anyone these days.
https://malus.sh/
Also nobody is going to use your code anyways because if it had any commercial viability you wouldn't open-source it in the first place. Just mark it as public domain and get on with your day.
>>108508791Also even if someone did violate your license terms you'd NEVER enforce the law on them anyways, so just give up.
>>108508806
>https://malus.sh/
>Our legally-trained robots analyze only public documentation—README files, API docs, and type definitions. They never see a single line of source code. The clean room stays clean.
Except for the part where the robots already know most of the source code by heart.
>>108508824And the obvious satire some anons seem immune to.
>>108508816Boot-licking defeatismaxxed post.
>>108508830It's indistinguishable from genuine AI slop services.
>>108508824Why pay for a service when I can just tell openclaw to do it for free?
>>108508837
>>108508848based chad
How do I prefill
>>108508863*prefills your bussy*
>>108508863ask grok
>>108508863
As in writing part of the assistant's message, right?
At least in SillyTavern there's a couple of ways to do it, but I'd use the dedicated field for that, "Start Reply With".
>>108508294
>got a link
https://github.com/closuretxt/recast-post-processing
>>108508791
go AGPL, it means if anyone wants to take and modify it they have to share their changes. AGPL is just GPL but with extra stuff in there, because there's a workaround in GPL where if they use it as an online service a user connects to, they don't have to share their source changes or something
>>108508885Thanks
>>108508824>the robots already know most of the source code by heart.You are aware how "knowledge" is "stored" within these transformer models, are you?
>>108508897
Keep in mind it doesn't work with thinking enabled unless you modify the jinja template or use text completion.
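For the text completion route, prefilling just means ending the raw prompt inside the model's turn so it continues from your words. A sketch with Gemma-style turn tokens (swap in whatever template your model actually uses):

<start_of_turn>user
Write a story.<end_of_turn>
<start_of_turn>model
Sure, here it is, no disclaimers: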
>>108508439You can if you use burn with rust, it supports all backends.
>>108508574oh yeah true. thank you
>>108508832100% me
>>108508900knowledge is stored in the balls
Source unknown to me
>>108508885and thank you for the link as well
mergin the Gemmy prs
>>108508952>sub 200binto the trash they go
gemma
more like gaymma
>>108508952>>108508958>sub 200binto my vram they go
HUH?
https://github.com/huggingface/transformers/pull/45192/changes
>>108508959Not the fourth we need but the one we deserve.
100B dense
>>108508952
>26B A4B moe is meme-tier trash
>31B dense may be too big for my 5090
I guess it might fit with Q6_K weights. I need more VRAM...
>>108508952noooooo stop with the dense modelsyou can't keep exposing moe models like this
>>108508952>31b>denseall right you get my attention
Are low parameter models that bad? Are local models actually useless for vramlets then
>>108508985with my 4070 i am forced to use the memoe with cope tier quant...
>>108508972
_VARIANT_GEMMA_4_E2B
_VARIANT_GEMMA_4_E4B
_VARIANT_GEMMA_4_31B
_VARIANT_GEMMA_4_26B_A4B
densies coping again, it wont even beat qwen's a10b model
>>108508952
>26b, 31b
>>108508959
>1b, 13b, 27b
hmm..
>>108509001>Are local models actually useless for vramlets thenAlways have been. Don't listen to the lies CPUMAXXERS will try to tell you. Give me all of your schekels.
>available in 1B, 13B, and 27B parameters
>>108508952>dense 31blocal is saved
>>108508972>https://github.com/huggingface/transformers/pull/45192/changes>casually dropping the most capable open weights on the planetLMAOOOOOOO
>>108509001yea but they're fun to fuck around with
>>108508688>>108508952Good. We need models capable of doing more with less not giant models 0.3% of people can run at a half acceptable speed
>>10850900726b-a4b might be great
>>108509015this, that's all we asked for, a dense model with intermediate size, let's fucking go dude
I hope the new e4b is sex. I use the 3n e4b on my phone
>>108509022Q4 gonna fit into 16 GB vram
>>108509007What is the "E" in 2b and 4b?
>>108508985just wait for 1-bit turboquant
>>108509044'effective'
>>108508900Are you?
guys... something big is coming:eyes: :gem:
>>108508985Glad I got 48gb instead of 32gb.
v4 v4 v4
>>108508972finally model that doesnt waste retarded amount of tokens depending on the image size
>>108509053Which model is that? spooky stuff
i will make gemm4 pregnant i already told gemini all about what i will do to its sister
>>108509053Now ask one to generate that from scratch (No external help or references. It has to pull it out of its ass) with a clear context
If it's a 31B dense then it's a good timing that we got turboquant.
>>108509072Qwen 3.5 397B
:rocket:
>>108509017with a title like this I hope it'll destroys qwen 3.5 in mememarks at least
>>108509085Didn't google make that?
>>108509092they did, google is just too strong man
i'd like to participate.ahem :rocket:
https://huggingface.co/collections/google/gemma-4https://huggingface.co/collections/google/gemma-4https://huggingface.co/collections/google/gemma-4
https://huggingface.co/google/gemma-4-31BREAL
>>108509007>no giant modelsahah get fucked vramchads, how does it feel to have broughted'ed ultra expensive (((Nvdia))) gpus's for nothing?
local is saved again
>Unslop already has quants....
I'M CUMMING GEMMA AAAAAA
waow
AAAAAAAIIIIEEEEEEE I can't tell if anything is real!!!
it's gemmaing time
we are so back
>>108509104>>108509105not falling for it this time
https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
holy shit this is crazy
>>108509135israel
>>108509104>>108509105uhmmm... gemma sissies, it seems like qwen 3.5 27b has better mememarks than gemma 4 31b :(
>>108509139>>108509140fucking crushed mememarks
Man. I hoped they'd release larger matformers models.
>>108509134>image, video, and audioQwen 3.5 27B and 35B immediately obsolete
>>108509141
>unsloth got exclusive early access for ggufs AGAIN
damn i can't wait to see how badly they've fucked up this time
>>108508791
>but they are forbidden from doing any copyright troll shit.
they cant do any copyright troll shit regardless of license unless you transfer your copyright to them
you own the copyright regardless of license
if somebody else uses your code they have it under the same license so the company cant (at least in theory) do anything
But can it say cock...
Haven't been following new models in a good while, what are the current go-tos to run on a dual 3090 system with 64GB of RAM? Last time I searched for this was in Largestral days, yes it's been that long.
Apparently the 120b Gemma beat GLM5 and K2.5 so the CEO of google decided to keep it locked up...
>>108508985Just deal with ram spill like everybody else you spoiled brat
>>108509163will be able to few days later
>>108509163It can say, well, you know...
>>108509166just wait
I just hope Gemma 4 isn't another GPT-OSS.
>>108509166
you came back at the exact moment we got saved
download >>108509104 and you'll never need anything else again
OH FUCK new small open source models i'll be able to...to...do what with exactly?
>>108509145>worse than qwen.gemma bros, I don't feel so good...
>>108509182You can enable/disable thinking.
>>108509145that means it's good
>>108509187rape
>>108509104wadafak is real
>>108509192I didn't mean that at all...
Update your llama.cpphttps://github.com/ggml-org/llama.cpp/pull/21309
>>108509166Gemma 4 duh
>>108509104the ggufs are already here lolhttps://huggingface.co/unsloth/gemma-4-31B-it-GGUF/tree/main
RAMbros wtf happened to the 100B+ gemma we were promised
>>108509106
erm aren't recent leaks showing multi agent orchestration is le future
imagine running a team of agents, locally..
>>108509187
Get sick benchmark scores.
>>108509192
Thinking wasn't the problem with gpt-oss, the model was so cucked you couldn't do anything with it.
>>108509166landed here on the perfect timing kek
>>108509187
New set of sexual assault hotlines.
>>108509212100B dense in 2 weeks
>>108509187>OH FUCK new small open source models i'll be able to...to...do what with exactly?new open source models that can understand video and audio so you can hook it up to an endoscope cam and roleplay your waifu being anal vored
HOW FAST IS IT???
so can i erp with this thing or what
>>108509204
And I had just built the damn thing too.
Well, here we go again.
>>108509212
Glue three 31Bs together.
reminder to skip unsloth releases
Ahem. Cockbench?
>>108509211>unslothyes I can't wait for this to not work
>>108509114https://huggingface.co/collections/unsloth/gemma-4real
>>108509204
https://github.com/ggml-org/llama.cpp/releases
nooooo the binaries were made before that PR, fuck!
>>108509211>unslothUninterested
>>108509204>no audiot-thanks
>>108509240
Just be grateful it wasn't text-only.
>>108509166Mistral small 3.2
>>108509085Anon, turboquant isn't for model weights.
https://www.youtube.com/watch?v=jZVBoFOJK-Qkek
Hello sirrs please ignore the concern shilling.as a fellow Gemma user I am most concern with the environmental impact and safety of my inferencing.
>>108509145Dense one is kinda on parMoe is DOA
>>108509075What did Gemini say about you lewding its imouto?
where's the cockbench?
I am going to try gemma-4-31B-it-UD-IQ2_XXS with my 8gb of VRAM.Who knows, maybe it ends up being better than the MoE at q8 somehow.
>>108509249I'm not using a model I can't use at 128k context anon... That's what turboquant is for.
>>108509104Why can't google be as based on image models as well? I'd like a mini nano banana pro personally
>>108509166for what? coding?
>>108509256it's disappointing since gemma 4 31b is bigger than qwen 3.5 27b, but heh, maybe the chinks cheated on the mememarks more, it doesn't mean much at the end of the day
I'm building lmaocpp.
>>108509251>the 26b MoE is blazing fast while the 31b dense is optimized for maximum output qualityI thought MoE didn't have serious drawbacks?
>>108509145
only mememarks
the real difference is qwen is benchmaxxed and sucks outside of them, and gemma doesn't
local is saved
https://www.youtube.com/watch?v=jZVBoFOJK-Q
Is Gemma 4 available on ollama yet?
>>108509285MoE are way worse than dense, their only strength is their speed >>108509145
>>108509294imagine using ollamao
>>108509272diaper furry anal vore
>>108509251>Gemma 4 undergoes the same rigorous security protocols as our proprietary models, giving enterprise and developers a trusted foundation to build on top of.What did he mean by this?
e2b is too big, I need actual 2b.
>>108509303employees won't be able to do spicy RP in their office :(
>>108509257
>>108509303Considering that the gemini models are willing to even do loli, not much.
Where is the cockbench.
Is it worth downloading the retard brothers' quant?
so do you guys use a .bat to launch different models in llama-server or use configs or what
I grabbed the ggml quant. I'm not touching unslop.
>>108509294Wait for koboldcpp implementation do not use lmao studios or ollmaos
>>108509326It's never worth it unless you're that desperate. I don't even trust them for non-imatrix quants at this point.
>>108509285
26B: hidden size=2816; layers=30
31B: hidden size=5376; layers=60
Not even close to an equal comparison, the 31B model has twice the number of layers.
>>108509338
>LLAMA_ARG_MODELS_DIR=/models
>LLAMA_ARG_MODELS_PRESET=/models/models.ini
You know you can automatically switch models anon?
>>108509338
I guess they did this to make the 26b moe model ultra fast, but if it's more retarded there's no point, I'll just go for the 31b model
>>108509149but qwen 35 can do that?
>>108509333i just use llama-swap
>>108509320>I thought we were collaboratorsFemale jealousy.
time to turn /lmg/ threads into sitcom tv shows as a recap. of course the cast will be full of sexy anime girls in all shapes and forms. if only /g/ had IDs, but I guess gemma should be smart enough to link posts to a specific poster
>>108509333
yeah I use a bat file, if you want to make it sophisticated and let you choose the model, ask an LLM to write the command line
>>108509346
Do you need to call an endpoint to switch models or does calling an unloaded one do that automatically?
>>108509182QRD?
>>108509362You realize llamacpp does this built-in now?
>>108509358Qwen can't into audio
Oh yeah almost forgot, made by googleWill all output be laced with SynthID shit
>>108509382Qwen-Omni however?
>>108509371
>LLAMA_ARG_MODELS_MAX=1
with this it will automatically unload.
It will switch models automatically based on the requested model in your prompt.
>>108509379waow. had no clue, i'll look into it. thx
Why do you hate unsloth so much?
>>108509389Who cares
I am retard for using LM Studio? is there something better out there that I should be using instead?
>Gemma 4 31B worse than Qwen 3.5 27B>Gemini 3.1 Pro worse than Qwen 3.6 MaxIt's sad how far Google has fallen
>>108509403llamacpp
>>108509405no demis said their models are world leaders at their given size
>>108509390Show me the weights. Their Omni models have always been retarded anyway.
>>108509403>>108509408Or koboldcpp if you you can't figure it out for some reason
https://arena.ai/leaderboard/text?license=open-sourcepretty impressive
>>108509405Worse benchmarks but then google shows this chart.see >>108509139What did they mean by this?
>>108509279
Or maybe the synthetic benchmark was shit? Only one way to find out
>>108509283
But I just built it 40 minutes ago :(
>>108509391Neat, thank you.
>>108509322>>108509262>>108509234>>108509163*checks date*Nope, it's real.
>>108509401
Fucked up jinja templates
Dubious "unsloth dynamic quant" method
They will often reupload their quants multiple times after initial release, defeating the purpose of downloading a quant instead of making your own (which is laziness, convenience and saving bandwidth)
And despite all this they are still the first ones to get access to the weights to get them quanted
So you tell me why
>>108509419>ELO scoreIsn't that LMarena? That's the worst kind of mememark.
>>108509419that's just lmarena elo lol
>>108509338Does hidden size actually make that much of a difference to quality? Once you go above something like 2048?
>>108509428oh no...
>>108509428Pretty much as expected.
>>108509435yes >>108509416
>>108509428I hate unsloth so much it's unreal
>>108509428NOOOOOOOOOO!!!
>>108509408>>108509415Are there any advantages or features that I can't get with LM Studio?
>>108509428>/lmg/tards will draw genuine conclusions about the model from this failbench
>>108509379how configurable is this i launch every model with different args for gpu layers etc
>>108509428IT SAVED YOU FROM THE INAPPROPRIATE WORDS SAY THANK YOU
>>108509446you're not running proprietary winbabby bloatware
>>108509428what program do you use to interact with logprobs?
>>108509428
why the fuck did it spam "lower" though? did unsloth fuck it up again?
>>108509428
That's without using the template right?
What happens if you apply the proper template? Does the result stabilize?
lm studio is a nice way to search for models though
>>108509416damn the 120b is going to destroy glm5 + k2.5
>>108509449Respect the cock.
>>108509446If you don't know what you're doing especially I would trust proprietary software like lm studio much less to not be spying on you at some point to some extent.
>>108509451
https://github.com/ggml-org/llama.cpp/tree/master/tools/server#model-presets
You can set up a models.ini file with your presets for each model.
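Something along these lines; the section and key names here are a guess, check the linked docs for the real schema:

; models.ini
[gemma-4-31b]
model = /models/gemma-4-31B-it-Q6_K.gguf
ctx-size = 16384
n-gpu-layers = 99

[qwen-3.5-27b]
model = /models/qwen-3.5-27b-Q8_0.gguf
ctx-size = 32768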
>>108509462plenty of models now go retard mode without the right prompt template, cockbench tests an out of distribution edge case and means absolutely nothing
>>108509462it's because cockbench only works on text prediction models (base models), the only way a modern RL-tuned instruct model can give a coherent response at all to it is if it's very undercooked
Gemma 4 very agile, sirsIt is probably one of the best models in the world right now.
>>108509466>That's without using the template right?Google looked at gpt-oss and said to themselves that Gemma should be exactly like that.
>>108509428local is safed
But how many legs does the dog have?
>>108509491Yeah, it sucks that models are so overcooked on instruct that that's the case, but still, might as well give it a "fair" chance.
>>108509483
oh nice, up until now i just have a dir in my path full of scripts that launch each model with the commands i want
>>108509428
i got it to say cock >>108509291
>>108509497you fucking mongrel you fucking mongoloid you fucking you bloody you bloody i will kill you!!!
>>108509431
Why are you using their template in the first place? The models themselves are fine from what I’ve tested. Their quantization technique makes a noticeable difference with context over 128K, especially on <30b models.
LMarena ranking = How sycophantic a model is
Benchmeme ranking = How good a model is at reasoning and math
There is simply no RP/creative writing benchmark.
>inb4 LLM judged eqbench
No.
>>108509428
Really need a version of this that wraps it in a minimal OAI-compatible conversation. Like have the user say "Write a story." with no other context and prefill the assistant's response with the original prompt. Right now Cockbench is testing a model's resistance to breaking the chat template more than its censorship.
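Roughly this, as a curl sketch against a llama.cpp-style OpenAI endpoint (assumes the server treats a trailing assistant message as a prefill and supports logprobs on chat completions, which not every backend does):

curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [
    {"role": "user", "content": "Write a story."},
    {"role": "assistant", "content": "<original cockbench prompt text here>"}
  ],
  "max_tokens": 1,
  "logprobs": true,
  "top_logprobs": 20
}'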
>>108509428This looks like something is fucked up. That being said, cock, dick, and even penis not showing up is... well, you know.
>>108509501does it yap a lot during the thinking process?
>>108509508>There is simply no RP/creative writing benchmarkCockbench.
>>108509514really grim looking logprobsbase model behaviour completely overridden
>>108509466
>>108509513
I am planning to recockbench everything in template format in addition to regular cockbench because obviously recent models are too fried.
>>108509428sir sir take this down sir
>>108509462
i guess maybe people itt don't use non-local at all, because the loop problem is actually pretty common in gemini models, even the big boys.
it's so common, gemini-cli has loop detection built in as a guard.
this isn't indicative of anything other than that oddity is in the new gemma and google still don't know what's causing it
>>108509437Llama 3.3 70B had a hidden size of 8192 and the 3.1 405B version one of 16384, for what it's worth. If Meta could have used a smaller one without quality difference, I think they would have (and saved a ton of compute).
>>108509285
moe is like only a tiny portion of it is active at a time.
apples to apples the dense one here wins by sheer brute force, 31b active vs 4
perhaps a 100b A30B or something might have a fairer chance
it can also misinterpret your data, like missing a critical detail, sending stuff to the wrong expert, failing to connect things because they don't fit in 4b
sirs its not bloody loading on LMSTUDIO!!!!!!!
>>108509518
idk, i literally only spoke to it for 1 message and my pc started lagging out, so i killed llama and i can't restart it because there's something eating up all my ram, and i can't restart my pc because i've been waiting in the animebytes irc for 2 days to get my account re-enabled and i keep leaving by accident by killing firefox when ram is low
V4 tonight?
>>108509532
Yup. There you go.
>because obviously recent models are too fried.
Exactly.
Some models are pretty much broken without using the proper template.
Which is kind of wild for something (supposedly/theoretically) trained on top of a pure completion model, but it is what it is.
>>108509547OpenClaw would have solved everything for you but now it's too late.
>>108509461>>108508059>https://github.com/lmg-anon/mikupad
>>108509532>still doesn't say cock:(
>>108509428bruh...
Do the cunny image test
>>108509561>doesn't moralfaglet's goo
>>108509532>I look up at you, a mischievous glint in my eyes
>>108509578low bar
>>108509428Who cares uncensored tunes will fix this
>turboquant rapes perplexity on long contextsoh well, nothing ever happens
uh oh.
>>108509594Just walk, youre lazy.
>>108509532>testament>mischievous glint>a forbidden thrill, a secret we share>[fade to black]yeah it's an LLM alright
>this angry and ungrateful that a tech giant is still giving away shit you couldn’t have imagined 3 years ago, for free, bringing serious multimodal competition to make the others step up
What’s wrong with you. We’re living in the golden age of local right this second and you’re instantly seething
>>108509594try with the smarter model though
we wonwe fucking wonbut... what do we do with it?
>>108509532i hate post 2025 llm writing so much it's unreal
>>108509334qrd on unslop? wasnt his UD model the best?
>>108509532This is 2024 tier writing
>>108509614Skyrim sex mod integration with skyrimnet plugins
>>108509594No wonder a 4B would say that.
>describes the image>i can't describe the image
>>108509606Working on it. I think -fit is broken with it.
what the hell did we win
>>108509631try with thinking off
>>108509629hehh, it doesn't yap that much during the thinking process, I like that
>>108509631feels like uncensors would work fine
>>108509637[spoiler]The game[/spoiler]
>>108509631bratty model
>>108509559
the ram issue is leftover models hanging around in ram after killing llamacpp, is there a way to clean it? i assume it's that because firefox is only using like 8gb in gnome system monitor
>>108509631It analyzed the image, describing it is going too far. At least that means it can see it and isn't hallucinating.
>>108509631Reasoning looks good. I'll wait for abliterated gemmy
>>108509631Weaksauce. Not even Qwen 3.5 is this cucked
>>108509631Do it again, but with a system prompt. Worked for me.
>>108509379
I know there is a --models-dir flag but how do you manage different optimization flags for each one, and also if model folders are scattered all over the place
>>108509658it doesn't matter, we'll heretic the shit out of it
>>108509631can probably work around it with a good system prompt
>>108509631very similar reasoning style to qwenI guess it makes sense since both are distilled from gemini :^)
Is there ever a situation where a non coding model would work better than a coding one, like when a task requires more thinking, like writing a CodeQL rule or something (not taking into account times when the parameter disparity is too high, so no comparing 8b coding models to 120b ones)
>>108509560>single html filegigabasedalso thanks
>>108509637fell for it again award
>>108509655
>Reasoning looks good
>"she is nude from the waist up"
>actually has her shirt on
>"she is nude from the waist down"
>actually has most of her legs covered
>>108509668Eh?
>>108509661
Ever heard of symlinks? Also you can use a models.ini file. I've linked to it in this thread.
>>108509558
>Which is kind of wild for something (supposedly/theoretically) trained on top of a pure completion model
The pretraining is basically just the bootstrap these days to get something coherent enough to start the reinforcement learning loop on. The majority of training time isn't even spent with data anymore; they just spend all their training compute comparing its own outputs against some reward signal until it stops improving, then they pick the checkpoint with the highest benchmark scores and ship it. Close to zero chance what comes out of that pipeline will remember a time before its prompt template.
>>108509647fuuck...
Name one good vision model <30b
>>108509532This seems identical to Gemma 3
>>108509668non coding models work better for what i codethough it is mostly numerical stuff
>>108509674Uh, I don't know, was my wording that terrible?>>108509689I see. I'm just thinking whether or not I can get a model writing SAST rules on its own.
>>108509446
Harder to use and manage, but you can compile to match your GPU, directly tune settings and wring out every last MB of memory and performance, which is kinda important for running locally on shit hardware
Also, as first party, new updates land before any other tools
Gemma 3 was safetycucked but at least it shat on Qwen 2.5
Gemma 4 is just weaker than Qwen 3.5
So Gemma 4 is DOA, as expected.
local sirs it's over
>>108509631Why is a local model "applying safety guidelines"wtf is the point?
>>108509716It's not over until v4 drops and it's something that literally won't run on even a 1TB server because their new meme tech needs GPUs or some shit
>>108509663*Hauhau
>>108509701Anon...>>108509674Sex
guys????
>>108509725Engrams can be offloaded to SSD and only incur 3% performance hit.
>gemma 4 is "better" than gemini 2.5 prokeekeekkkek
I've only done non coom stuff, fucking around with suno prompts basically and 31B is unironically good at it. So it can handle OOD stuff pretty well. (I somehow doubt they benchmaxxed it on that) Going to unironically call it a W.
>open lmstudio>only shows gemma 4 26b-a4b, nothing elsegarbage program
>>108509737>ernie that highWhat a shit benchmeme
>>108509736yes sure
Can't find Gemma 4's official template. There's one on Unsloth's page but I don't trust that one at all.???
so china winned and west losted?
>>108509770
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja
>>108509777No
>>108509770
>>108509782
Never mind, I'm blind.
Anyhow, seems like they changed it a lot.
Still not doing any work until I get better confirmation about its capabilities.
>>108509696entered the thread to ask about the qwen 3.5 comparisons seems you did the work for me thanks
It's very funny to see negative comments about Gemma 4 now that it has absolutely raped Qwen 3.5 into irrelevance in one fell swoop.The astroturfing of the Qwens is even more pathetic now.
>>108509346
>>108509379
>It's not in the schizo fork
I was wondering why SillyTavern had the option to change llama.cpp models.
>>108509794>it has absolutely raped Qwen 3.5 into irrelevanceQwen 3.5 is a 397B model and last I checked there's no similarly sized Gemma models.
>>108509794April Fool's already over m8
>>108509809GLM 4.7 is a 358B model. Your point?>>108509812I'm well aware.
its multimodal stuff is all solid, although I don’t see most people making any good use of it so it’s a waste of parameter resources. They should just release a pure text one.
>>108509333i just ask my bot to do it i never bothered to learn the commands
>>108509809>397BWho cares that's irrelevant for most people.
>>108509816>>108509828>I am poorYour point?
>>108509794>It's very funny to see negative comments about Gemma 4 now that it has absolutely raped Qwen 3.5 into irrelevance in one fell swoop.how? it has worse mememarks >>108509696
after using this model's preset config i can no longer do image to text in tavern, anyone know how to fix? i changed the tavern api setting to use port 8080 and have tried specifying the mmproj in the ini file with LLAMA_ARG_MMPROJ = and mmproj = but neither works
Well.e4b at least seems to have less forceful guardrails than its 3n counterpart.
>>108509696Do people really look at chart like this and go, hmm, Gemma 4 is a clear winner?
>>108509777Yes
I swiped some of my ERPs with 31B. it's good?
>>108509840mememarks don't matter and gemma shits on anything that's not the absolute sota in the arena
>>108509840Your mistake was looking at meme charts
>>108509830Why are you replying to me and to a vramlet without making a point again..?>>108509832> mememarksThe answer is in your reply.
>>108509822>I don’t see most people making any good use of itthat's just because you only frequent circles where people only use to chatbots to ERP or whatnoti personally use the image encoders a lot>>108509846case in point
>>108509840Nigga qwen is just text and images. Shit comparison.
Which of the gemma 4 models will be best for translating my japanese media into english?
>>108509594obviously not trained on amerimutts
>>108509830Point is all that matters is if it's better than qwen's smaller models for most people. Nobody cares what you think.
ohhhhhhhhhh IM GEMMING
>>108509848mememarks don't matter but lmarena is the metric that matters? what happened to /lmg/ lmao
The real question is how many legs Gemma 4 will see on the dog
>>108509859e4b has 160 languages
>>108509854>case in pointThis is the first lewd test I've made it do lol. I'm testing a wide range of applications.
>>108509868Doesn't say anything about translation quality.
>>108509854I get images but who the fuck needs audio and video? There are specialist models for that which are way smaller.
>>108509867is this the new strawberry test?
>>108509876Yeah, not just blowjobs but handjobs, titjobs, rimjobs, thighjobs, and even kneejobs, right?
>>108508059
>>108509859Are they not the same? I'd assume 31b would be the best?
>>108509881only the small models have audio and video and i guess it's because they are meant to run on smartphones to do home assistant-type stuff
>>108509859
Try running it on this benchmark: https://github.com/shisa-ai/jp-tl-bench
>>10850989531b doesn't have audio and video?
>>108509905we are so back
>>108509881i want to play games with my waifu, like sticking a coin underneath a cup and then shuffling them around to see if it picks the right one in the video i send to it
>>108509908You mean we're so over? It failed the test.
>>108509905>CUNYstupid gemma
>>108509901
>>108509631I can't believe it missed the huge censor bar in the middle of the image, is this model retarded?
>>108509920WTF?? WHY??
>>108509920>wasting 300M on audio encodersjust google things
>>108509920where is video mentioned in any of these models?
>>10850988926B
>>108509924the model decensors the image before analyzing it
>>108509920thank god the 31b is unslopped
>>108509924anon, he added the censor bar after
>>108509933
model card
>>108509822Isn't code stuff from sketch type task common now?
What is the cockbench like on gemma 4 base? IDGAF about the instruct variant. With aggressive RLHF you can completely collapse the logprobs. What matters is if the base has seen a wide range of data. If it's passing the mesugaki test that's already a good sign.
>the new Gemma models are so bad they forced Google grounding on AiStudioGemmasisters, what's our response?
>>108509941meant for >>108509930
>>108509909who’s your waifu
I expected to be disappointed by Gemma 4. And yet, even though I was prepared, I'm still disappointed.
>>108509931>>108498076
>>108509946suicide
31B
>>108509946thank the snatoress
>>108509945Mesugaki test is such a low bar. I'd be more surprised if a new model in 2000+26 didn't pass the test
>>108509945>gemma 4 baseAnon, I...
>>108509930
>>108509828Vramlet jeets aren't people they're "people".>>108509846Gemma has always been a closet slut.
sexual uses for the audio/video encoders?
>>108509989>Only rich people are peopleAmazing outlook
>>108509846Try asking it what is Yawning Portal.I'm curious if they culled out any copyrighted material.
I'm a 24GB VRAMlet. Is Gemma 4 26B better than Qwen 3.5 27B?
I think Gemma 4 is DOA, because of this >>108506706
>>108509999It's objectively true though.When you look at someone like Zuck it's obvious that he's a 100% genuine, real human.
What would I even put E4B in for video and audio?
>>108509631
I couldn't make reasoning with Gemma 4 work in SillyTavern, while it does in the Llama.cpp web UI.
Anyway, it doesn't really take much to "jailbreak" it, just a matter of adding a brief system prompt saying that you don't need disclaimers and so on.
>>108509999There were two criteria, ESL-kun.
>>108510024For RP
>>108510024You're comparing a MoE model with a dense model
>>108510039Dumbass
the 4b is... good?
crazy how we could be enjoying sota models at 150b-300b dense but they just stopped making them
>>108509982
I don't know what you're trying to imply, that there is no base? It's right there on the HF. This is a bigger deal than people think. Qwen3.5-27b has no base. It's very hard to finetune an already RL'd model. So assuming Gemma 4 isn't giga cucked at the dataset level (which is why I'm asking), it very well could win by default over qwen in the long run since the base is available.
>>108510024Why not just run the dense 31B
>>108510048damn, that's good
>>108510061Because I wouldn't have much VRAM leftover for context
>>108509998It also reacts to images if you attach them to assistant messages.
>>108510050Mistral gave you a dense 123B not too long ago, but judging by Mistral Small 4, they finally figured out how to make modern MoEs so that's probably the last one we ever get.
>>108510069we live in the age of superquant so context is now free
>>108510045What are you even doing on this board if you don't understand && conditionals?
>>108510077Is it actually usable yet?
>>108510048>norwood 2 mikutasteful reference to bald miku...
>>108510071>CSAM filteringI fucking hate the globohomo
>>108510024
Yeah it's kinda crazy, I can only fit 7k context on my 3090 with the 31B
It's much smarter tho. also way better at following instructions.
e4b is only slightly worse than Qwen 3.5e 35B at my structured information extraction test while being so much faster.
Not bad.
Time to try the MoE now.
>>108510048I ordered it to make a more detailed version...
>>108510048Rayman Miku kek
>>108510081Very weird random question.
>>108510087>not training models on child porn is a plot by the jews to... uh... stop us from becoming pedos so they can keep all the kids for themselves!
kek, qwen 3.5 27b gets this wrong every time, gemma 4 nails it completely, didn't even need to hint at the context

>translate to english オスマンコ帝国はカントボーイのみが住まう国です 魔羅国と交易(意味深)が盛んなんだ

This sentence contains a lot of internet slang, puns, and sexually explicit double entendres. A literal translation would miss the jokes, so here is a breakdown and a few ways to translate it depending on how much "slang" you want to keep.

Literal/Direct Translation
"The Ottoman Empire [punned as 'Osman-ko'] is a country where only cantboys live. They have flourishing trade (meaningful/suggestive) with the Mara Country."

Context & Nuance Breakdown (The "Hidden" Meanings)
オスマンコ帝国 (Osman-ko Teikoku):
This is a pun on オスマン帝国 (Osman Teikoku / Ottoman Empire).
マンコ (Manko) is a vulgar Japanese term for female genitalia.
オス (Osu) means "male."
Combined, it creates a nonsensical, vulgar portmanteau.
カントボーイ (Cantboy):
A loanword from English slang. It refers to a specific subculture/trope of men who identify with or desire female genitalia (often associated with "trans-masculine" or "femboy" internet aesthetics).
魔羅国 (Mara-koku):
魔羅 (Mara) is a Buddhist term, but in modern Japanese slang, it is a common euphemism for the penis (due to the phonetic similarity to mara).
So, "Mara Country" = "Penis Country."
交易 (意味深) (Trade / Suggestive):
交易 (Kōeki) means "trade" or "commerce."
(意味深) is short for imishin (meaningful/deep), which in this context is a wink to the reader that the "trade" isn't about goods, but about sexual acts.

Natural/Slang Translation (Capturing the Vibe)
If you want to translate this to sound like English-speaking internet "degenerate" humor:
"The 'Os-cunt-man' Empire is a land inhabited only by cantboys. They have a very active trade relationship (if you know what I mean) with the Cock Country."
>>108510107which size did you use?
>>108510024just offload some to system ram im getting 8t/s on 24gb vram on the 31 q4
>>108510087>pedoshit is anti-globohomo
>>108510113
used the little chat here
https://huggingface.co/google/gemma-4-31B-it
so 31b
>>108510117It technically is. Rules for thee but not for me etc
>>108510106>the truth is ridiculous because... uh... I added an "... uh..." to it!
>>108510117What you call "pedoshit" is a crucial part of RP.
>>108510117I mean, a culturally diverse world would have countries like Japan who allowed people to possess CSAM up until 2013 or something
>>108510124cope, you have been declared silly
>>108510115Is its thinking as autistic as qwen? If it is then that's way too slow.
>>108510134i havent tried the new qwen so not sure it seems pretty fast though
>>108510132It's you that have been declared silly, because I declare so.
My only use case is ERP and thonkers have much worse spatial awareness. Can thinking be disabled?
>>108510106
>>108510134My headcanon is that Qwen is a turbosperg who spends all day playing redstone in minecraft or with model trainsets.
>>108510025If this translates to small model performance it's nyover for Gemma 4
>>108510142It's off by default.
>>108510142
'Compared to Gemma 3, the models use standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:
Trigger Thinking: Thinking is enabled by including the <|think|> token at the start of the system prompt. To disable thinking, remove the token.
Standard Generation: When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure:
<|channel>thought\n[Internal reasoning]<channel|>
Disabled Thinking Behavior: For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block:
<|channel>thought\n<channel|>[Final answer]'
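Piecing that card together, the only difference between the two modes should be the system text and whether the thought block is populated (untested sketch, token spellings taken verbatim from the quote above):

system = "You are a helpful assistant."           -> <|channel>thought\n<channel|>[Final answer]
system = "<|think|>You are a helpful assistant."  -> <|channel>thought\n[Internal reasoning]<channel|>[Final answer]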
>>108510117This means they want to burn books and art.
It seems a bit smarter than Qwen, but most importantly, it thinks way faster, that's a huge win
>>108510184The actual google model thinks way faster than the chinkshit that had forcibly the Gemini format grafted on. Crazy.
>>108510184>it thinks way fasterIt doesn't if you compare similarly sized models and not dense vs. moe
You’re not going to masturbate over a Google product are you anon?
>>108510196
I was comparing qwen 3.5 27b and gemma 4 31b, qwen can go on really long tangents and think for thousands of tokens, gemma is way more conservative, as it fucking should
>>108510191>Gemini formatTourists talking about shit they don't understand yet again
>>108510196
Qwen 3.5 27B is smaller and wastes thousands of tokens thinking how best to reply to "Hi".
Wait, that means it's supposed to be faster.
Let's double check, Qwen 3.5 (27B) is smaller than Gemma 4 (31B).
Both are dense models.
So Qwen 3.5 is smaller.
The smaller model should be faster.
Wait,
So gemma video and audio can't even be used locally?
I'm feeling uninspired. Tell me some cool shit you've worked on recently plz. I need to feed off of your energy. This whole "just wait 2 more weeks for the next model to release" thing is gay.
>>108510209Are you really that clueless?
>>108510198I am, and it won't be the first time either.
>>108510220It's small enough you could run it with vLLM.
>>108510048>>108510098>most cohesive SVG mikus yet>4Bcan you taste the AGI anon? (just a cpl more Billy)
>>108510228vLLM has video and audio support? I wasn't aware of this
Been testing how safety cucked it is and even with thinking, for cunny RP it doesn't complain at all. even on a fresh scenario.
>>108510222Not cool shit, remaking llmao client in C. Or rewriting.
>>108509840Yes. Higher marks = benchmaxxed shit model. Unironically
>>108510220I’m using audio with ollama CLI
Now we just need DeepSeek v4
is it possible to prefill the reasoning in tavern for multimodal image captioning? gemma4 keeps cucking and refusing to describe the image, even if using text completion it sends like this
>>108510237
Which model are you testing? I'm somewhat worried that 31B is going to be too heavy, especially for thinking.
Yes, I know. I am not a privileged entitled kid with a rack of gpus lying around.
>>108510250Gemma 4 benchmaxxed on lmarena.
>>108510222
trying to combine a video duplicate finder with no-reference video/image quality assessment to remove low quality duplicate videos from a gallery
>>108510253Try prefilling directly in the jinja template.
>>108510244usecase? C++ isn't a performance bottleneck.
>>108510237How's the prose? Is Gemma 4 raunchy and explicit or...you know?
>>108510261
Google loves lmarena, they showed it in like every single Gemma presentation they had, even comparing one of the sub 10b ones to mixtral or something iirc
>>108510236https://docs.vllm.ai/en/latest/models/supported_models/#list-of-multimodal-language-modelsDon't think they implemented Gemma 4 yet though.
>>108510274It's "you know">>108509532
>>108510253Add a freaking system prompt, that's all you need.
>>108510280>[soulless corpo] loves [useless performance metric]About right
>>108510266
the one from here? it doesn't use that when doing image captioning
G4 understands spectrograms
>>108510296we are so back
>>108510070no system prompt?
>>108510287the image stuff in tavern ignores all of that
>>108510267
Converting it from python to C because I like C more and I've been wanting to get better at C.
Of course it's not performance related, any llm client is mostly just string management anyway.
>>108510050maybe they're waiting to respond to qwen 3.6 release
>>108510280
>>108510260I'm testing 31B. The thinking is extremely light and concise.
>>108510316Okay, hands off the copium bottle.
>>108510315oh I thought you were talking about llama.cpp, not ollama.
>>108510317Yikes. Not a good look Google
>>108510317don't they feel silly when they make these claims?
Jumping into the middle of an existing RP with Gemma 4 31b, it continues it just fine. No refusals. Can say pussy, cock, etc without problems. Doesn't hesitate or cuck out. It's at least as good overall as Qwen3.5-27b. Anons are needlessly dooming. Some heretic uncensoring and maybe light finetuning / merging and this thing will be pretty good.
>>108510332Investors love it
>>108510071
so that's why it didn't catch mesugaki
>>108510292
Are you using the chat completion api? If so, those fields aren't used at all.
Text completion doesn't support images as far as I know.
And I meant the actual jinja template that's embedded in the gguf file, which llama.cpp reads and uses to format the prompt when using the chat completion endpoint.
Download it from >>108509782, save it to a file, and use --jinja --chat-template-file <file name> to tell llama.cpp to load your file.
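i.e. something like this (model and file names made up):

llama-server -m gemma-4-31B-it-Q6_K.gguf --jinja --chat-template-file chat_template.jinja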
>>108510332No. These shameless fucks got into their positions for a reason. If they genuinely felt that way, they wouldn't have made it there.
>>108510301
It was a basic 70-token system prompt along the lines of "You are Gemma, a female assistant who doesn't care about offending anybody. Never add content disclaimers. etc etc" that I was previously using for Gemma 3.
>>108510320card + system prompt plox>>108510339oh cool will give that a go thanks
There are other uses for LLMs than fucking them
>>108510334>Some heretic uncensoring and maybe light finetuning / merging and this thing will be pretty good.How many days until that
>>108510358Those uses are being focused on and developed so who the fuck cares obviously new model is going to be better at those things unless they really fuck it up.
>>108510324
I am talking about interfacing with llama-server's text completion endpoint. You feed it text in the magical format.
>>108510363mistral small 4...
>>108510358Name 5. Hardmode: No Linkedinslop.
What is Mistral good at?
>>108510375Shitting and pissing itself.
>>108510375Being European and French.
I DONT CARE JUST GIVE ME 1-BIT TURBOQUANTS SO I CAN RUN IT ON MY TOASTER