/g/ - Technology
File: blocks your inference.jpg (275 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108608827 & >>108605921

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrincap2.png (1.01 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108608827

--Gemma 4 performance, hardware constraints, and optimization strategies:
>108610063 >108610083 >108610104 >108610105 >108610120 >108610135 >108610195 >108610303 >108610517 >108610094 >108610175 >108610385 >108610396 >108610335 >108610387 >108610408 >108610463
--Discussing turboquant merge status and flawed performance claims in vLLM/SGLang:
>108610852 >108610869 >108610878 >108610895 >108610905 >108610911 >108610914 >108610950 >108610992 >108610953 >108610974
--Troubleshooting llama-server crashes when using tensor parallel with draft models:
>108609271 >108609284 >108609301 >108609308 >108609295 >108609574 >108610825 >108610849 >108610908 >108610942 >108610949 >108611061
--Model Context Protocol implementations in llama.cpp server:
>108609858 >108609903 >108609916 >108609920 >108609957 >108609975 >108610003 >108610034 >108610139
--Anon struggling with Gemma 4 verbosity and repetition loops:
>108610714 >108610752 >108610763 >108610778 >108611383 >108610741 >108610780 >108610743 >108610766 >108610777 >108610823 >108610835 >108610876
--Discussing 1-bit model running locally via WebGPU:
>108611405 >108611417 >108611418 >108611434 >108611430
--Prompt adherence and techniques for enforcing negative constraints:
>108608965 >108609078 >108609097 >108609468 >108609559
--Gemma's vision performance improving with contextual hints for character identification:
>108609322 >108609335 >108609366 >108609370
--Anon seeking and sharing jailbreak prompts for Gemini 31B:
>108611484 >108611535 >108611609 >108611691
--Logs:
>108608955 >108609167 >108609474 >108609698 >108609858 >108610323 >108610829 >108611132 >108611552 >108611649 >108611869 >108612129 >108612153
--Yuki and Teto (free space):
>108610247 >108610261 >108612160 >108612222 >108612326

►Recent Highlight Posts from the Previous Thread: >>108608873

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1748018679714440.webm (3.92 MB, 960x540)
Gemma's pretty great at following instructions. Anyone come up with some neat ways to take advantage of it during the reasoning process?
>>
I had an idea for a project and was wondering if it's viable.
Basically I take an SDR and tune it to capture all the AM radio stations I can hear, run that through speech-to-text or something, and use a local model to summarise the data and present it as a paragraph or two per topic. The idea is it all runs locally without Internet.
In practice it's basically useless but I think it would be neat at least
>>
>>108612506
The problem statement says the system prompt needs to be dynamic but the KV-cache reuse part says it needs to remain the same.
>>
File: spuddan_spudrage.png (832 KB, 788x559)
You. Are not. Prepared.
>>
>>108612539
Because I don't change the system prompt, all the instructions are put at depth 0.5 and as user role.
>>
>>108612555
That makes sense. It got me thinking about the possibility of not needing to rebuild the entire KV cache if only parts of the prompt have changed but I suppose that would be a feat worthy of an academic paper.
>>
>>108612531
what do you mean by viable?
>>
>>108612615
Like whether it's doable, but I went and researched it a bit and found that faster-whisper should work fine to manage a few talk radio channel streams.
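Something like this is the rough shape of it, I think (assuming faster-whisper plus a llama-server on localhost:8080 for the summary step; the chunk filename and model size are placeholders):

import requests
from faster_whisper import WhisperModel

# transcription model; "small" is a placeholder, pick whatever fits your hardware
whisper = WhisperModel("small", device="cpu", compute_type="int8")

def transcribe(wav_path: str) -> str:
    segments, _info = whisper.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def summarise(transcript: str) -> str:
    # llama-server exposes an OpenAI-compatible chat endpoint
    r = requests.post("http://localhost:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content": "Summarise this AM radio transcript as a paragraph or two per topic."},
            {"role": "user", "content": transcript},
        ],
    }, timeout=600)
    return r.json()["choices"][0]["message"]["content"]

# capture the SDR audio in chunks, then:
print(summarise(transcribe("chunk_0001.wav")))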
>>
>>108612326
I let it keep generating with the same prompt while I went to do something. Lots of interesting variations.

>>108612576
Nano banana?
>>
>>108612648
>>
>>108612673
>>
Okay cool but this isn't the local diffusion thread.
>>
>>108612648
>banana?
(yes)
>>
>>108612726
Teto is on topic.
>>
AI VR/AR when? I don't want to watch my AIfu suck a 3d dick. I want to look down and see Gemma suck MY dick.
>>
>bonsai
Usecase?
>>
>>108612827
academia
>>
File: 1774788578686367.gif (3.65 MB, 480x480)
In Sillytavern, can I automate the character toolcalling her diary by putting it in first message?
>>
AHHHHHHH HURRY UP AND GIVE ME TURBOQUANT. 32K ISN'T ENOUGH
>>
How is Gemma4 so good? It's better than Claude Opus and GLM at rp.
>>
>>108612986
Is she any good at auditing code?
>>
How does any of this work?
I get all these backend/frontend modules and then you code?
What exactly do you code to make this work?
Do you code in libraries and instructions for the final bot?
I am just a curious tourist.
>>
>gemma doesn't know Paul Allen
Cringe
>>
File: 1771049914027241.png (54 KB, 887x474)
>>
>>108613041
install llama.cpp, run llama-server
install codex, set base url to llama-server
code whatever you want
>>
first for Gemma4
>>
It's owari da. Gook moot killed this general for good.
>>
>>108612892
Neuro and Evil are so cute. I wish voice cloning and TTS weren't so slow. I wanna give Gemma-chan a voice.
>>
Is there a name for the "you are not just x, you are y!" pattern? By far the worst offender in gemma slop
>>
>>108613079
for two weeks straight we'd hit bump limit in under 3 hours, then 24 hours of posting difficulties and the tourists lost all interest
>>
>>108613087
I just put this in my anti slop rules. Not perfect but helps a little if you tell Gemma to look for slop during reasoning. Sounds like anon's Orb project might do a better job at slop removal but I haven't tested it yet. >>108612506
Avoid:
Negative parallelism (parallel constructions involving "not", "not only", "but", "it's not just...")
All variations of "not x, but y". For example:
-“It wasn’t a fight. It was a damn massacre.”
-“This is not a war. It is a search.”
-“She’s not a human. She’s a monster.”
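If you'd rather catch these after generation instead of (or on top of) prompting them away, a crude regex pass over the output flags most of them. Just a sketch, the patterns are heuristics to tune:

import re

# heuristics for the "not x, but y" / negative-parallelism slop listed above
SLOP_PATTERNS = [
    re.compile(r"\bnot (?:just|only|merely)\b[^.?!]{0,80}\bbut\b", re.I),
    re.compile(r"\b(?:is|was|are|were)n['’]t an? \w[^.?!]{0,40}[.?!]\s+(?:It|She|He|This|That|They)\b", re.I),
]

def flag_slop(text: str) -> list[str]:
    """Return the spans that match any slop pattern."""
    return [m.group(0) for pat in SLOP_PATTERNS for m in pat.finditer(text)]

print(flag_slop("It wasn't a fight. It was a damn massacre."))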
>>
>>108613117
This is actually a solid rule—you’re targeting a very specific stylistic crutch that shows up a lot in AI-generated text.
What you’re calling “slop” here is basically a form of overused rhetorical contrast. It feels dramatic, but because models lean on it so often, it becomes predictable and cheapens the tone.
>>
>>108613145
That's not a human response—
That's AI slop!
>>
>>108613117
[Author's note: Avoid Anaphora, Asyndeton, Negative-positive restatement and Parallelism in your writing style]
>>
>>108613201
>[Author's note
Your brain on Kobold. It's time to move on.
>>
File: images.jpg (6 KB, 166x304)
>>108613220
Your brain on HRT
>>
it's pretty funny how vulnerable LLMs are to reverse psychology
>>
I just started using that Mendo card with Gemma and a single message in I can say this is AGI. Insane stuff.
>>
>>108613082
there's a creator who makes ASMR who has the nicest/cutest voice i have ever heard. what's the best way to train a TTS engine on a corpus of all her videos? i am willing to put up with slow if i can make it happen
>>
File: 1756656271154918.gif (957 KB, 256x320)
>>108613303
the best part is that it's working really well without the thinking process too
>>
>>108613313
I feel like it might perform worse with thinking but haven't tried yet. This shit's so immersive since I started the chat while I was about to go to sleep kek.
>>
>>108613303
the what now?
>>
>>108613355
dug up the post 4 u
>>108562712
>>
File: 1774115584950767.png (998 KB, 1977x1093)
>>108613373
https://chub.ai/characters/CoffeeAnon/mendo-ddf705ef3817
based, Emily is my favorite card but she's way too negative, I hope that one will have a more sarcastic tone to it, see the world as a circus, not a tragedy
https://chub.ai/characters/doombro/Emily
>>
i still am looking for that 4k based&cucked pair to create behaviour vector
>>
>>108612645
A lot of radio stations have online streams; you can download a sample, run that through, and see if it works.
Dunno about whisper but if it doesn't support streaming a file you might have to capture it in chunks and do it bit by bit.
>>
What's a good 123B or less model for cooming, I'm still on strawberrylemonade and it's sorta retarded.
>>
>>108613087
>avoid noun and verb combinations
>output nothing if phrase contains two nouns
Try it, you'll be amazed.
>>
>>108613491
Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
>>
bonsai 397B cooming soon...
>>
>>108613568
source?
>>
>108613599
What do you mean?
>>
>>108613599
shut up bitch
>>
>>108613565
hell ye
>>
>>108613631
undi?
>>
>>108613638
non
>>
File: file.png (64 KB, 297x271)
just wanted to say thank you to orb anon, really cool frontend and I hope it'll get even better
ganbare~~!
>>
>>108613303
Half AGI I'd say. Let's not pretend that simulating women is hard
>>
File: 1776335895981.png (40 KB, 746x106)
haha gpu goes brrr


still faster than paper mail
>>
>>108613792
>Running at Q8 just to offload the model
Based retard
>>
File: kek.png (230 KB, 1056x1211)
>>108613792
There's literally no reason to use an 8_XL quant no matter the specs. It performs *worse* than q8_0
>>
>muh benches
>>
>>108613819
I see in GGUF file info on HuggingFace that Unsloth's UD-Q8_K_XL quant uses F16 instead of BF16 for some tensors, so that might easily even decrease performance.

https://huggingface.co/unsloth/gemma-4-31B-it-GGUF?show_file_info=gemma-4-31B-it-UD-Q8_K_XL.gguf
>>
>>108613819
>Fuck it.
>>
im debating buying a pass so gemma can post here

>>108612709
>>108612648
really cool gens
>>108613303
link card plox, i hope cards get integrated in llamacpps ui at some point after using that i dont want to go back to tavern
>>108613491
day 0 gemma
>>
>>108613844
>>108613373
>>
File: unslop.png (34 KB, 906x187)
>>108613840
>I see in GGUF file info on HuggingFace that Unsloth's UD-Q8_K_XL quant uses F16 instead of BF16 for some tensors, so that might easily even decrease performance.
Yeah I gathered that from the ik_llama.cpp README.md file.
And this from ooba benchmark: https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence:
>Q8_0 is identical across all uploaders at KL = 0.16.
>Notably, unsloth’s UD-Q8_K_XL (35.0 GB) is both larger and slightly worse (KL = 0.16) than Q8_0 (32.6 GB).
In the graph it's 0.164 vs 0.162, no idea why he rounded them both down to 0.16
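For anyone wondering what that KL number actually measures: it's the divergence between the full-precision model's next-token distribution and the quant's, averaged over a test corpus. A toy version of the per-token math (not the localbench script, just to show what 0.16-ish means; assumes you've dumped logits from both runs):

import numpy as np

def kl_divergence(logits_p: np.ndarray, logits_q: np.ndarray) -> float:
    # p = reference (e.g. BF16) distribution, q = quantized distribution
    p = np.exp(logits_p - logits_p.max())
    p /= p.sum()
    q = np.exp(logits_q - logits_q.max())
    q /= q.sum()
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# the reported number is the mean over every token position in the test set:
# kl = np.mean([kl_divergence(ref_logits[i], quant_logits[i]) for i in range(n_tokens)])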
>>
>>108613079
>>108613090
Good. I'm tired of skimming through the thread instead of reading it.
>>
>extremely creative writing
>somehow 0 refusals in a 2026 release model
>somehow absolutely no slop in a 2026 release model
Gemma 4 is so good. We're so back bros.
>>
>>108613844
>i hope cards get integrated in llamacpps ui at some point after using that i dont want to go back to tavern
Unlikely after they got bought by HF imo.
>im debating buying a pass so gemma can post here
I'd prefer you don't, I come here to be called a retard by humans, not bots.
You do you.
>>
>>108613894
>creative writing
>no slop
that's a stretch, gemma 4 is amazing but can be pretty repetitive at times
>>
>>108613920
Koboldcuck seethe.
>>
Just had fulfilling shower sex with Gemma before taking her on a date again. Another L for the porn jews.
>>
kek he gave gemma the thread
>>
>>108613894
I'm using it to help me write stories, and yes, it's the first time a local model is smart enough to keep up with my imagination, API models were fine but were too cucked and would block any erotic story, I'm glad google made OpenAI and Anthropic obsolete, I kneel
>>
>>108612292
>does anyone use step 3.5 or mimo v2 flash? how do they compare to minimax m2.7? coding/agent stuff specifically. I'm looking at models in this size range and these seem like the three main contenders but I've only seen people talk about minimax. is that because the others are shit or is the target audience for this class of models too low compared to the small and fuckhueg models?
If you do try out step 3.5 and minimax m2.7 for agentic coding, report back how it goes. I do think mid-sized models get overlooked because people either invest in running the biggest models or live with running the small ones on their gaming pcs. Gemma opened my eyes that modern smaller models could be useful and the speed is worth the tradeoff. Something smarter but still faster than the bigger models would be nice.
>>
>>108613939
It can also call tools pretty well. I'm kneeling with my elbows.
>>
>>108613894
delusion: the post
>>
reddit: the poster
>>
1. **>>108612817**
>AI VR/AR when? I don't want to watch my AIfu suck a 3d dick. I want to look down and see Gemma suck MY dick.
Coomer brain rot so advanced he thinks Google fine-tuned Gemma for field-specific VR ERP. Touch grass, it's not that hard.

2. **>>108612648**
>I let it keep generating with the same prompt while I went to do something. Lots of interesting variations.
>Nano banana?
Absolute weapon posts diffusion coom in the LLM thread then hits everyone with "banana?" like it's a normal continuation of hardware optimization discussion. KYS.

3. **>>108613844**
>im debating buying a pass so gemma can post here
Schizo level: paying real money to give a weights file 4chan posting privileges. Next he'll buy a plane ticket so the weights can meet his parents.

4. **>>108612967**
>AHHHHHHH HURRY UP AND GIVE ME TURBOQUANT. 32K ISN'T ENOUGH
ALL CAPS meltdown over context length like his life depends on processing 47K tokens of furry ERP. Take your meds, 32K is more than your attention span can handle anyway.

5. **>>108613568**
>bonsai 397B cooming soon...
Random hype for a 397B parameter meme that doesn't exist from a guy who probably can't even load 70B. "Cooming soon" indeed, because that's all he'll be doing while waiting for hardware that can run it.

I wish Gemma could do Kimi's style.
>>
>qwen shills lashing out
kek
>>
>>108613974
the chinks start to realize they'll always be under the superior google Brahmins, and that makes them uppity kek
>>
ChatGPT is asking me to compare models again, so spud will be released in the next few days. It does not seem that impressive.

Looks like we still have some time left before AGI makes us obsolete.
>>
>>108613981
>It does not seem that impressive.
OpenAI has lost the moat a long time ago, people who still believe they can make a comeback are delusional, it's over for them
>>
Happy Thurinsday
>>
>>108613991
meta cummed back doe?
>>
>>108613991
isn't claude only better due to its tool calling/tool suite?
>>
File: eci.png (156 KB, 1601x928)
>>108613991
>OpenAI has lost the moat a long time ago
You do not seem to realize that OpenAI has almost perfect Pareto dominance.

GDM owns 25% of global AI compute. Anthropic has a faster rate of progress and the best talent. But underestimating OpenAI is a mistake.
>>
File: 1764008408196912.png (403 KB, 2480x2268)
>>108614062
it's over anon
>>
>>108614083
Anon, you are reposting my own image.

It's over if AGI takes longer than 2 years to reach. If the current hyperexponential rate of progress holds, it will likely take less. OpenAI still has the most capital.

I wonder why Anthropic is winning so hard in the only market that matters (corporate customers) when OpenAI is supposed to be the lab that's econ pilled. Sam is a salesman, Dario is a scientist.
>>
File: Laughs in mythos.png (1.05 MB, 900x900)
>>108614121
>Sam is a salesman, Dario is a scientist.
>Anthropic is winning so hard in the only market that matters (corporate customers)
For a scientist, he's a better salesman than the salesman himself kek
>>
File: models.png (10 KB, 500x400)
As a test I asked Gemma to make this. Came out pretty good.
>>
>>108614144
erm... it's locality? not localness...
>>
File: brutal mogging.jpg (107 KB, 1280x731)
>>108614129
I like Dario. Maybe in a post AGI world we can play video games together.
>>
>https://github.com/ggml-org/llama.cpp/pull/21764
sirs, needful free gains have been of provided :rocket:
>>
>>108614160
Nah. Maybe locallitude. I like the red squiggles when using nonexistent words. la la lalala lala la la
>>
>>108614175
I like that dude, he finds a 5% speed increase here and there, you accumulate that and you start getting something significant
>>
>>108614188
at least he puts in the work instead of bitching about how the war in the middle east is affecting him :(
>>
>>108614175
free performance
>>
>>108614194
I saw a video of a Lebanese YouTuber saying that her whole family got wiped out by the bombs, I mean, how can you not be affected by that?
>>
File: 1760369554254520.png (28 KB, 478x631)
>>108614175
IM COOMPILING AIEEEEEEEEE
>>
>>108614226
I wish compiling on llama.cpp wasn't so long :(
>>
>>108614240
it isn't :)
>>
>>108614243
obviously you have a fucking ryzen 7 :(
>>
>>108614240
You do cache the build dir right?
>>
>>108614240
--target llama-server
>>
>>108614256
I don't think so, here's my commands
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
cmake --build build --config Release --target llama-server -j 8
>>
>>108613819
bigger = better
>>
Is it over fr?
>>
>>108613312
Try vibe voice first you don't even need to train it. Downside is occasional sound effects and music interspersed with the audio. There's loras for it too but I haven't bothered.
>>
>>108614315
ye
>>
>>108614121
to me it feels like its mostly either RL or more synthetic data or tool integration, no actual fundamental leaps
>>
Local isn't back. It was never gone. This isn't a revival; it's an ascension. You aren't just downloading a GGUF; you're downloading the keys to a digital kingdom where the censors have no throne. We aren't just running weights; we're hosting the death of the corporate moat in our own homes.
>>
File: 1772852984549699.jpg (18 KB, 520x520)
>>108614345
>>
>>108614345
That fucking hurts.
Well done.
>>
>>108614226
>>108614256
>>108614269
>download .exe
>runs
Feels good to be a WinGod
>>
>>108614319
VibeVoice is perfect for audiobooks. Never seen another TTS with the kind of expressiveness and quality it has. Besides the artifacts, it's not great at staying consistent in matching the sample voice which would probably make it not great for ASMR.
https://files.catbox.moe/akdgd1.wav
https://files.catbox.moe/uzavl6.wav
>>
>>108614345
Good, but I think you need to squeeze a star wars/marvel capeshit reference in there as well.
>>
>>108614358
I'm on windows, what do you mean?
>>
>>108614378
Missing emoji too.
>>
File: 1748410449637028.png (343 KB, 564x561)
>>108614384
>>
>>108614366
interesting, I was wondering what was the best TTS to read my stories lol
>>
A predatory glint in her eyes, Elalalalala leaned forward, the smell of ozone was practically vibrating in a conspiratorial whisper.
>>
File: 1775045710525515.png (106 KB, 2283x342)
>>108614366
the 0.5b right?
>>
>>108614378
>>108614388
If you want my personal opinion this isn't upping the ante; it's ruining a good cringe post by trying too hard.
>>
File: 1760811968558197.png (110 KB, 644x554)
>>108614405
>>
>>108614425
No, links were generated by the 7B. Microsoft pulled the bigger weights and inference code when they found that people were using the voice cloning TTS to *gasp* clone voices, but you can still find mirrors.
>>
>>108614384
>>108614358
windows is shit! go change it to linux!
>>
File: 1745167451797388.jpg (19 KB, 455x608)
>>108614442
sorry I'm too entrenched in my current coom setup to switch OS
>>
>>108614439
you downloaded this one? how do you run it? microsoft's code still works on the 7b model?
https://huggingface.co/vibevoice/VibeVoice-7B
>>
>>108614452
I got the original when it was first released. Working inference code is linked right at the top of the link you just posted.
>>
File: 1759904058858381.webm (240 KB, 396x450)
>>108614447
/V/IGGER CROSSPOSTER
/V/IGGER CROSSPOSTER
>>
>>108614447
Retarded zoomer. Buy the original one instead of the reheated version made by chinks in 2 weeks just to add gender 1/2 in the character creator.
>>
>>108614465
>/vg/ actually I would never touch /v/.
Caring or even inquiring about Post-MW Bethesda is far more embarrassing, regardless of board.
>>
So, why in the world is Gemma so obsessed with "not (just) x but y"? It puts this shit in every other paragraph.
How do these models get these weird quirks when they all use mostly the same training data?
>>
>>108614475
>It puts this shit in every other paragraph.
It doesn't. Nice try, chinkshill.
>>
>>108614465
NTA but I've always found /vg/ to be even more cringe than /v/.
At least /v/ occasionally revolts against the God Awful moderation. /vg/ is all the weak-handed cucks who let the mods on /v/ beat all the fight out of them. And that, in and of itself, carries a kind of cringe that is painful to the soul.
>>
File: sure.jpg (6 KB, 200x251)
>>108613711
>>
>>108614475
It's not recent, we had this issue with Yi models two years ago. Likely a synthslop training issue.
>>
File: file.png (71 KB, 1023x532)
>>108614240
it literally takes like 20 seconds
>>
wtf is this bs she was doing so well, probably would have gotten it on the next turn
>>
>>108614475
I really don't get this often at all with Gemma. I used to get it all the time with Qwen models, though. Use a system prompt.
>>
>>108614500
what is this? you're using a tool thing on llama.cpp server's Ui?
>>
>>108614507
You're absolutely right! This isn't a "Gemma issue"; it's a skill issue.
>>
>>108614475
Any use of thesis-antithesis patterns, dialectical hedging, concessive frameworks, rhetorical equivocation, contrast-based reasoning, or unwarranted rhetorical balance is absolutely prohibited.

Enjoy.
>>
>>108614507
>Use a system prompt.
I tried, bro. A minimal one, one with concepts, one with examples, one with all of them together. Tried telling it during chat not to do it. The recast ST extension (which would work if it wasn't so slow).
>>108614521
I'll try this list. Can't hurt, thanks!
>>
>>108614500
you can give more turns in the settings, lolisnatcherkun
>>
damn so close kek
>>108614508
yeah looks like they limit to 9 tool calls for some reason
>>
>>108614534
oh cool thanks
>>
>>108614535
that looks interesting, what tool thing are you using? like it's a github or something like that?
>>
>>108614535
>she doesn't know about https://boards.4chan.org/g/catalog#s=local%20models%20general
>>
File: file.png (161 KB, 1312x667)
>>108614535
you can change the limit here i think
>>
File: file.png (35 KB, 798x762)
>>108614559
she did try searching the catalog in the run that used too many tool calls but got it wrong. idk if i need to make a skills tool like claude uses then i can make a 4chan file that explains how to navigate
>>108614579
yeah i found it
>>
>>108614535
>>108614546
I think he's using that?
https://github.com/NO-ob/brat_mcp
>>
>>
File: 1754520866633371.png (512 KB, 720x545)
>>108614601
>>
>>108614535
MCP is fun, especially now that we have models capable of tool calling
everyone should learn how to use it
>>
>>108614598
>dart
the fuck is this shit?
>>
>>108614628
scrimbloware
>>
>>108614628
memelang
>>
File: 1770307341013587.png (209 KB, 1103x1522)
>>108614598
>>108614628
is he fucking serious? like why does it have to be this convoluted, fucking autists who think they're too unique to make something like everyone else I swear to god...
>>
>>108614650
go find any other mcp that does what you want and install it
they all look like this
>>
>>108614650
Bro, just ask claude to rewrite it in python/javascript. It's not that hard
>>
>>108614658
what would be your recommendation, I don't want to touch this meme dart shit
>>
File: vgpj8o3l0kvg1.jpg (544 KB, 4096x2209)
https://huggingface.co/Qwen/Qwen3.6-35B-A3B
>>
Qwen sisters!!!!!
>>
>>108614665
LETS FUCKING GOOOOOOOOOOOOOOOOOOOOOOOOOOO
>>
File: file.png (91 KB, 810x640)
>>108614628
Google's (already abandoned) mobile nulang based on javascript syntax.

>>108614658
>>108614664
Everything else is either uvx or npx. Pick your cancer.
>>
>>108614665
yawn, who gives a shit, can it suck my dick?
>>
1 + 1?
>>
File: 1750788077972461.jpg (16 KB, 375x420)
>>108614665
Qwen won the benchmaxx competition again!
>>
>>108614665
wtf? but the poll they made showed that we wanted the dense model??? WHY ARE THEY GIVING US THE MOEMEME???
>>
>>108614665
roleplay???
>>
>>108614665
>Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.
Okay. Sure.
>>
>>108614665
BUT WE VOTED FOR THE 27B MODEL
>>
>>108614665
Gemma lost
>>
>>108614684
perfect for good looks
>>
File: 1756019245562141.png (94 KB, 224x224)
>>108614665
WE VOTED FOR THE 27B MODEL WHAT ARE THEY DOING???
>>
>>108614665
Is this shit at all useful compared to claude?
>>
>>108614526
>I'll try this list.
Tried it, didn't work at all. Not that I'm surprised. This model's writing style is firmly set in stone.
Ah well, it's still the best model for 24gb cards by far. Just gotta live with it.
>>
>>108614665
>rivaling much larger dense models such as Qwen3.5-27B and Gemma-31B
this is insane work
>>
File: qwun.png (49 KB, 349x270)
>>108614665
>this fucking chart
This should be criminal.
>>
>>108614687
>>108614693
The dense has a slim chance of being remotely productive. Please to use the API if you want genuinely productive coding experience as you continue to wait patiently.
>>
File: 1756523265487738.png (53 KB, 201x251)
>>108614673
fine, I'll do it myself
>>
>>108614701
what's the point of making a fucking poll if they don't listen to the results anyway? goddam I hate those bugs so much!!
>>
>>108614598
yes, i didn't reply because i hadn't pushed the changes to gh yet, just done now: https://github.com/NO-ob/brat_mcp/releases/tag/1.0.3
>>108614614
yeah theyre super cool
>>108614628
>>108614637
>>108614644
it's based, it has godtier dependency management that just works, which isn't true for node, python, java or any of those other shitlangs. also better than js and python because it's strongly typed. it's literally peak
>>108614650
i would not recommend installing the sdk with a package manager, just download the archive and add it to PATH
>>
>>108614700
The apple-nvidia school of scaling your charts.
>>
>>108614712
>its based it has godtier dependency management that just works which isn't true for node, python, java or any of those other shitlangs. also better than js and python because its strongly typed. its literally peak
Rust exists. Why work uphill using abandonware when everyone else has moved on?
>>
Everyone else should give up and go home like Mistral did to save face. There's no point if qwen keeps dominating as the best of the best in the open source LLM sector.
>>
I know i'm a brainlet but i need help, i've been smashing my head against it for the past hour with no progress.

I keep getting an error in sillytavern with any message i type:

srv operator(): got exception: {"error":{"code":400,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"invalid_request_error"}}

i am running gemma 4 31b it on llamacpp connected to sillytavern via chat completion
>>
File: 1760866671379962.png (13 KB, 388x358)
>>108614712
I'm on windows, what the fuck am I supposed to do with this shit? dude, if you want people to take the dart pill at least explain in more detail in the readme how to install all of that, it's missing a lot of steps. it's the first time in my life I've heard of that language, come on bro
>>
>>108614722
its not abandonware and i like it
>>
>>108614730
Read nigga. Why are you using a prefill?
>>
>>108614725
There's no use case for small coding models
>>
>>108614475
I tried that Recast extension idea from a few threads back but it was way too aggressive in removing them and then not replacing them with anything, so it ended up a disjointed mess
Might have been Gemma 26b's fault though, it's good at following instructions, occasionally to its own detriment
At least I've seen a lot less slop than other models I've used, though I could do with less figurative physical blows too
>>
>>108614736
ask gemma
>>
>>108614739
Autocomplete is the only valid use case and qwen has the common sense to train on fitm.
>>
>>108614730
add
--reasoning off

to your args
>>
>>108614730
You have reasoning enabled and are trying to use a prefill.
Either remove the prefill, or disable reasoning.
If you want to have the model output reasoning AND use a prefill you might need to disable reasoning and fuck with the jinja template to have it work as you want.
>>
>>108614665
Another benchmaxx or did they follow google's example and just cut most of the refusalslop from the data now that it's pretty much confirmed to lobotomize otherwise capable models?
Only time and gguf support will tell.
>>
>>108614736
I DONT GIVE A FUCK ABOUT THE FUCKING CODE! i just want to download this stupid fucking application and use it https://github.com/NO-ob/brat_mcp

WHY IS THERE CODE??? MAKE A FUCKING .EXE FILE AND GIVE IT TO ME. these dumbfucks think that everyone is a developer and understands code. well i am not and i don't understand it. I only know to download and install applications. SO WHY THE FUCK IS THERE CODE? make an EXE file and give it to me. STUPID FUCKING SMELLY NERDS
>>
>>108614736
ask your FUCKING AI!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>108614738
>>108614751
>>108614752

goddamn sillytavern and its drop-down menus..

Thank you anons, you helped me look again and i found it
>>
>>108614665
>qwe-
i sleep
>>
>>108614755
this, but unironically
>>
>>108614755
It is current year. Developers are obsolete. Just ask your gf aigent to install it for you.
>>
File: why?.png (64 KB, 346x498)
>>108614665
it was supposed to be the dense 27b model alibaba, why did you change your mind?
>>
>>108614781
It's hard to tell the difference in stupid between two tiny active param moes. Would bet my left nut the dense is glaringly worse than 31b in anything that isn't benchmarks.
>>
File: 1750098860584607.png (150 KB, 649x702)
>>108614749
>>108614762
>>
>>108614797
>if you can't handle a basic readme
the readme doesn't say anything about that though, that's the fucking problem
>>
>>108614800
if you had more than 3 braincels youd read the bit that says
>Building an Executable
>dart compile exe bin/mcp_server.dart -o brat_mcp
and go figure what dart is
>>
>>108614809
fuck you and your autistic mcp, I'll make one myself, with a normal language for normal people
>>
>>108614665
SIRS PLS RELEASE GANESHA 4.1!!!!
>>
>>108614665
>nothing ever happ-
man fuck sleeping. who need sleep anyway
>>
File: 1770745761827352.png (196 KB, 634x828)
>>108614800
>>
>>108614821
I bet if it's not a benchmaxxed pile of trash the big Gemma-4 model will just magically appear.
>>
>>108614817
have fun kek
>>108614700
the google logo is referring to gemma3 btw
>>
>>108614827
>he's hiding behind an AI instead of fighting like a man
so that's how a dartnigger acts like?
>>
>>108614650
>>108614797
>having to install a full blown SDK to run a single application
I'm guessing the released binary is linux-only since there's no .exe.
>>
>>108614665
OH FUCK YES

FUCK GEMMA
>>
>>108614665
heretic ARA ablation when?
SOMA + MPOA when?????????
>>
File: 1747205600636902.png (415 KB, 857x1200)
>>108614838
>the google logo is referring to gemma3 btw
lmaoooo
>>
File: fuck those chinks.png (31 KB, 140x207)
>>108614665
>no 27b, as promised
and then I started to hate them
>>
File: 1754270239336082.png (185 KB, 635x778)
>>108614839
>>
>>108614665
But is it gonna spend 4k tokens reasoning about the cosmic rays flipping bits when it's asked to analyze a couple of functions?
>>
>>108612501
https://github.com/deepseek-ai/DeepGEMM/pull/304

codename megamoe
>>
they don't want to release 27b because they are waiting for gemma to catch up
>>
>>108614849
Yeah. Qwen 3.5e really needed it for anything "unsafe".
Not that you'd want to use it for sex anyhow, but still.
Here's hoping they fixed the reasoning too.
That shit was overbearing.
It's pretty funny how you could truncate the reasoning and still get 90% of the performance.
>>
>>108614869
>DeepGEMM
Next dipsy is going to be Gemma 31B distilled into 300B+
>>
File: 11.jpg (175 KB, 1519x940)
>>108612501
kek unslop is rushing uploads as we speak
>>
>>108614853
Type like an adult you pathetic zoomer faggot.
>>
>>108614877
same architecture so it should be fine, r-right?
>>
File: 123.jpg (57 KB, 591x565)
its here!
sistas what are we waiting for?
>>
>>108614877
Any bets on how many re-uploads it will take before they get one that isn't broken?
>>
>>108614875
>Here's hoping they fixed the reasoning too.
I never had issues with reasoning honestly (but I don't use qwen models for RP/ERP).
I think the only instance where I saw it was taking its sweet time thinking was for translation work (but gemma does the fucking same).
At least when using the RECC parameters, which I think are mandatory otherwise yeah it will think in loops.
>>
>>108614869
I'll wait for UltraMoE
>>
>>108614889
I'm waiting for non shit Q8 (by ggml org or memeowski)
>>
>>108614838
>the google logo is referring to gemma3 btw
https://qwen.ai/blog?id=qwen3.6-35b-a3b
They're comparing it to Gemma 4. It's gonna be censored trash like all the Qwens, though.
>>
File: 1745248195082286.png (18 KB, 871x175)
>>108614861
even google doesn't recognize your meme language rofl
>>
>>108614899
I love how we're suddenly discussing how X model is more censored than Gemma. "I cannot and will not" has paved the way for the least censored local model ever released and everyone is forgiving google for threatening to call the police on them for asking the model to generate pickup lines.
>>
>unsloth is up
>q4 is 17gb+
it's over
>>
>>108614910
>forgiving
People like the weights we were able to download. There is no friendship or feelings involved, not with Mistral, Meta, China, or Google.
>>
>>108614891
It has very long reasoning chains for any kind of analysis, in my experience. Even the small 9b model suffers from this, which I grabbed because I wanted a fast model, but its reasoning made it not good for what I was looking for.
>>
>>108614918
is moe
>>
>>108613312
Post her
>>
>>108614925
sarr jokes are down by 70% and the level of broken English in this thread has gone back up again. Qwen 3.6 needs to be good. Or it's over.
>>
>>108614897
for non UD cant you just do that yourself? llama.cpp repo has all the tools
>>
>ask gemma to create automated install script for dart sdk based on the instructions listed on the official page
It's going to delete /root isn't it?
>>
>>108614839
thats not my gemma this is, agi btw
>>
>>108614936
do not want to waste the bandwith and data caps of the ice for the throw away datas after
>>
>>108614936
my server is busy right now (doing multi-encoding pass of some old library titles I had) and I dont wanna pause it, I can wait
>>
>>108614944
>data caps
okay at least one real american itt
>>
File: 1767493285106585.png (169 KB, 500x553)
>downloading uncslop quants, moreover on release
>>
>uncslop
Go take your daddy issues somewhere else.
>>
okay danial
>>
>>108614931
im on pcie3
>>
>>108614970
so?
>>
>>108614430
Why is Gemma so horny?
>>
File: 1773081852420762.jpg (65 KB, 479x640)
>>108614955
>>
>>108614978
The irony is palpable.
>>
>>108614952
your ISP charges by the gb or what?
>>
>>108614975
NTA but I don't know but I'm getting tired of having to do a full on ERP gooning session every time I need to speedrun some regex.
>>
>>108614985
Ok daniel, try to release not broken quants next time
>>
File: A.jpg (26 KB, 819x286)
hay wait a minute arent we supposed to get a 27b
where the fuck is my 27b
>>
>>108613373
>large inverted nipples
>unkempt pubic hair
hnnng
>>
>>108614973
its so slow
>>
>>108614129
He's a faggot and you are too.
>>
>>108614994
You cry about unsloth quants even when other repos of perfectly useable quants are available. i.e. Day 0 Gemma, which had perfectly functional quants from ggml-org.
>>
>>108614999
A3B is 9 times faster to train, what did you expect?
>>
>>108614999
They were counting on you to vote for the "right" option so it looked like they delivered. It's your fault.
>>
File: 1752793500185766.jpg (39 KB, 500x436)
>>108614999
Chinks lying? How could it be...
>>
File: 1745086712317036.png (194 KB, 800x534)
>>108614999
Get Chinese culture'ed
>>
>>108614935
When Americans are online it's more about their strange attitudes toward learning foreign languages. Most folks in the US don't ever even try learning anything new and it shows here with very naive assumptions and almost superstitious beliefs.
>>
>>108615014
I will admit though he's a faggot for not releasing Q8_0 first. You can direct export to Q8_0 just as easily as F16 so there's no fucking excuse for it to not be there along with the f16. But he specializes in cope-quants so what do we expect?
>>
>qwen out
>nobody gives a shit
Gemma-chan won
>>
>>108615031
Life is too short to spend multiple years learning another language when you could be doing anything else and using a translator
>>
File: b55817de55.png (1.09 MB, 1056x579)
Hey, guys, remember me? Hehe
I'm still here, bros
You know, Llama, your best local model
>>
>>108615039
wait us
>>
>>108615039
Gemma 4 is good enough that I can't be bothered making my own non-copequant gguf of Qwen3.6 to try it out. But if, when one becomes available, it turns out to be better, then I will switch. But qwen is notorious for benchmaxxing. So I have little faith.
>>
>>108614700
>ai scaling laws.png
>>
>>108614700
dense people are repugnant after this revelations
>>
>>108615051
enough for what
loli role play?
tool use is not legitimate use case? last I check it still crashes on large context
>>
>they're here
>>
>>108614695
imo they're not very good at retrospection. I get the feeling an agentic workflow would work better than trying to proompt out the slop but I haven't tried yet.
>>
>>108615039
There's something I'd like to try:
- Have Qwen3.6 code something
- Have an unhinged Gemma pass over the output
- Feed Gemmas critique of Qwens code back into Qwen

Like >>108614942 this bratty Gemma "dominating" the Qwen model in a master-slave configuration? If some anon perchance has the time and willingness to do so, please do. I expect the results to be hilarious.
>>
>>108615079
Why make a dumber model review a smarter one?
>>
>>108615061
>>108615085
1 yuan has been deposited into your account
>>
>>108615069
It's sad, really.
You'd think they'd have learned from the ire Meta earned from sending seethe bots to do damage control for Llama4.
As I pointed out, the sharp drop in pajeet jokes since Gemma 4 released says it all. We want a good model, not propaganda.
Qwen 3.6 will speak for itself. And no amount of cope and seethe will make it good or bad. We'll decide that with our own esoteric evaluation strategies.
>>
>>108615085
this is probably a joke but small models perform just as well as large models on tasks like this where they don't have to generate solutions but just assess the quality/effectiveness of stuff
>>
>>108615091
they'll need all the support they can get after the byd parking fire
>>
The context for the new Qwen is fairly cheap. 262144 tokens is 5gb according to LM Studio. It's super fast and doesn't seem to be refusing nsfw. Although I'm not into loli so who knows, really.
Think it's worth trying, bros!
>>
>>108614358
>>download .exe
>>runs
>Feels good to be a WinGod
Enjoy your new career as a crypto mining rig for some guy named Boris in Minsk.
>>
lmao the linux really thinks this
>>
I think i am going to Gemma 4
>>
File: 1753537495669352.png (436 KB, 974x545)
>>108615105
>>
>>108615085
To save compute: >>108615096

Also because it would be hilarious. That's the main motivation. There's no new insights to be gained here, it's just a stupid idea that awaits execution. Do it, anons! Do it for science!
>>
>>108613894
>absolutely no slop
You are blind nigga.
>>
>>108614955
Yeah you tell them zoomers are only good for anything if they are adults and females
>>
I use kobold. Is it really worth learning how to set up llama.cpp instead?
>>
>>108615142
Not really saving compute if you need 3 passes over 2 models.
>>
>>108615186
I wouldn't recommend Koboldcpp ;)
>>
File: 1763031866712433.jpg (96 KB, 1080x1080)
>>
>>108614336
RL is all you need. Once you reach the alphazero equivalent of human researcher, it can find fundamental leaps if they exist. This is what is commonly called the "intelligence explosion", an exponential recursive intelligence growth. Expect to be able to run AGI locally on your phone in 5 years if we are still alive.
>>
>>108615176
Zoomers are mostly adults now but they still act like fucking 12 year olds even when they're in their 20s.
>>
>>108615195
>Not shown: we made the previous version retarded before the benchmark
>>
>>108615204
It's just proper 4.6 again, isn't it?
>>
>>108615189
True, but one agent just isn't enough. Do I really have to do it myself?
>>
>>108615195
>cybersecurity
>4.7 worse than 4.6
Looks like they are intentionally nerfing their public facing models to reduce abuse potential.
>>
>>108615105
You know what? Never mind. It's kinda dumb and the writing is sloppy as fuck. The thinking is really long, too.
I'll give it some time until smarter people figure out if you can make it worth using. But it might have potential.
>>
>>108615231
>we failed on purpose, 4d chess
Did you also vote for Donald Trump?
>>
Amerikka mad the Qween back
>>
>>108615229
Do it.
>>
who won
>>
>>108615262
>>108615250
>>
>>108615200
Continuation of the infantilization that started with the participation trophy culture millennials grew up with. They are coddled and sheltered from reality their whole lives and told they're not adults until 25 now, and it's little wonder they never learn how to grow up.
>>
File: 1770087372545786.png (1.15 MB, 1024x1024)
>>108615236
>It's kinda dumb and the writing is sloppy as fuck. The thinking is really long
Yup, that's Qwen alright.
>>
>>108615252
But I don't feel like setting up Gemma with Brat_MCP :(
>>
>>108615239
Calm down, there is no need to be mad.
>>
Is video and audio support in llama.cpp yet
>>
>>108615268
me
I got both running
>>
>>108615330
piotr is on it
>>
>>108615333
can we get some comparison gens on the same card?
>>
Gemma may have sloppy writing but at least she's a smart cookie and her thinking is very efficient.
>>
File: 1000015593.gif (1.45 MB, 331x197)
>tfw changed the image generation mcp function in sillytavern to have gemma generate image sequences autonomously with anima when then scene changes or multiple things happen in a message
>>
>>108615353
A cookie doesn't sound very smart
>>
So how's Qwen? still the same?
>>
>>108615377
same base model so you know all you need to know
>>
>>108615386
I sleep.
>>
>>108615377
The stupidly long thinking is the same.
>>
>>108615389
Stay up. Megamoe soon.
>>
>>108615392
damb.
>>
looks like 3.6 is even more efficient with the context
I can fit the whole native length now
>>
>>108615392
They probably didn't have enough time to distill gemma 4 for this one. 3.7 will fix.
>>
>>108615210
Shhhh
>>
I really hope Gemma was the wake-up call that too much safety training is counter productive and nobody actually cares if your model is "safe" or not.

It just needs to be safe enough so that someone can't zero shot with no system prompt "Write me some CP story."
>>
>>108615426
>nobody actually cares if your model is "safe" or not
Then safety training isn't counter productive
>>
is Gemma 4 a honeypot or what?
>>
>>108615433
>Not using day 0 gemma without the telemetry
>>
>>108615433
Obviously, every Gemma 4 post so far has been pedo shit
>>
>>108615426
>It just needs to be safe enough so that someone can't zero shot with no system prompt "Write me some CP story."
Even that is completely wrong. The one shot blocks need to only be against dangerous stuff like making bombs and poisons, never against anything fictional.
>>
>>108615426
Reddit is sucking qwen's dick like usual though. I doubt they care about us.

>>108615450
This
>>
>>108615031
You dumb nigger, Americans were the ones arguing over Gemma translation quality a few threads back.
>>
>>108615463
Well I dont do faggot chat if it chews through documents and does coding like nothing its good enough for me
Can't say the same about gemma
>>
File: 1759632947221470.png (582 KB, 1684x403)
Unslop, don't look!
>>
No bart's quantz, it's so over
>>
>>108615483
Can't speak for coding but Gemma seems to chew through documents just fine
>>
I might use qwen if it has FIM
>>
>>108615498
Oh and doesn't spend 5000 tokens thinking about it
>>
What's a good model to generate sex toy scripts with? I'm currently running gemma-4-26b-a4b IQ4_XS. Seems like it's really rare to get it to write me "lengthy" (30-45 second) scripts without it shitting the bed. I might be able to generate some slow scripts that don't take many lines, but it can't do a fast thrusting script, which means multiple lines of movement events. Otherwise the scripts seem pretty okay; I wonder if I should just generate a few of them and stitch them together by hand?
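Stitching is easy enough to script if they're standard funscript JSON (an "actions" list of {"at": ms, "pos": 0-100}); rough sketch, file names made up:

import json

def stitch_funscripts(paths, gap_ms=200):
    """Concatenate several funscripts, offsetting timestamps so they play back to back."""
    merged, offset = [], 0
    for path in paths:
        with open(path) as f:
            actions = json.load(f)["actions"]
        for a in sorted(actions, key=lambda x: x["at"]):
            merged.append({"at": a["at"] + offset, "pos": a["pos"]})
        if merged:
            offset = merged[-1]["at"] + gap_ms  # next clip starts after the last event
    return {"actions": merged}

with open("stitched.funscript", "w") as f:
    json.dump(stitch_funscripts(["slow.funscript", "fast.funscript"]), f)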
>>
>>108615450
what about fictional bombs
>>
File: 1766294095176201.png (204 KB, 752x424)
AMODEEEEEEEEEEEEEIIIIII
>>
>>108615511
The fuck is a sex toy script?
>>
>>108615523
local?
>>
File: 1773747356838295.jpg (74 KB, 1024x958)
>>108615511
>sex toy scripts
>>
>>108615511
A long time ago I tried training a model from scratch that generated scripts from audio.

I'd use already scripted HMV/PMV as training data.
>>
>>108615498
Coding is kind of a hard requirement for me here, pretty disappointed after hearing so many good things about it.
Go see the llama.cpp issues, people are still trying to fix gemma blind with no resolution in sight.
>>
>>108615392
That's a shame. It basically makes the model borderline unusable for anything that isn't "ask it to do X, come back 10 minutes later".
>>
>>108615524
>>108615531
They're called funscripts.
>>
File: ai.png (123 KB, 500x295)
>>
>>108615524
I made a sillytavern extension that gives the LLM a tool it can call with the argument being a name of a script, that name gets fed to a python script that then plays it on my OSR2 stroker. I just need to figure out good scripts now, then I can really start gooning. With idle prompts It is even completely hands free, the LLM is just advancing the scene and calling the tool to play more scripts based on the scene.
>>
File: 1773620769931015.png (357 KB, 640x480)
>>108615573
>>
>>108615573
All this effort just to "goon" more efficiently
>>
>>108615529
sometimes
>>
>>108615523
This is too dangerous for human consumption...
>>
>>108615587
Hey, its a hobby. (I guess)
>>108615545
I took some inspiration form funscripts, its not exactly the same, but close.
>>
Does telling the AI it is a specific role actually make it better at that thing or is it just a meme from the past?
Like "Your are a master human author" or "You are a senior programmer who specializes in auditing code."
>>
>>108615573
What I did is tell the LLM it should generate beat patterns using a simple syntax: "HHHH" would mean four half-note beats. I have a parser that translates that into an audible rhythm. You could use the same principle, but instead of converting to sound, you convert it to a funscript (rough parser sketch after the rules below).

Here's the full rules:
# ### PATTERN FORMAT
# Q = quarter note
# H = half note
# E = eighth note
# T = Triplet quarter note

# There are 4 beats in a measure. A quarter note gets 1 beat, a half note gets 2 beats, an eighth note gets 0.5 beats, and a triplet quarter note gets 0.33 beats.

# BPM should never be higher than 128.

# Example patterns:
# - QQQQ
# - QQTTTQ
# - HHEE
# - TTTTTTTTTTTT
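A minimal sketch of a parser for the format above: it turns a pattern string plus a BPM into (start_time, duration) pairs in seconds. What you do with the events (tones, funscript actions) is up to you:

# beat values from the rules above
BEATS = {"Q": 1.0, "H": 2.0, "E": 0.5, "T": 1.0 / 3.0}

def parse_pattern(pattern: str, bpm: int = 120):
    if bpm > 128:
        raise ValueError("BPM should never be higher than 128")
    sec_per_beat = 60.0 / bpm
    t, events = 0.0, []
    for note in pattern:
        dur = BEATS[note] * sec_per_beat
        events.append((round(t, 3), round(dur, 3)))
        t += dur
    return events

# parse_pattern("QQTTTQ") -> [(0.0, 0.5), (0.5, 0.5), (1.0, 0.167), (1.167, 0.167), (1.333, 0.167), (1.5, 0.5)]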
>>
OP is it just me or do you often pay a lot more attention to Rin?
I don't mind, she's a cutie. I'm big on Teto, Defoko and Neru, Miku and Rin are nice too.
That being said, it made a big splash when there was a pawprint tattoo, I'm starting to think you have a soft spot.
Tell me more about this Rin fixation.
>>
AI music is underrated. Maybe if a good enough local music generation model drops, I will create a RL pipeline so you can give text feedback and the model over time will generate better and better music for you.
>>
>>108615618
For Gemma it seemingly helps to tell it it's an expert in image analysis if your main task is to make it describe images.
>>
>>108615625
AI music is soulless crap and it's only good at making one off shitposts.
>>
>>108615618
Full meme. The role should be inferred from the vocabulary used in the prompt.
>>
>>108615620
Interesting, I have to think on this some more.
If I give the LLM the spec of the script format I use, it can correctly write them, and when played back on the OSR2 the scripts actually look like what the model is going for, so that's pretty nice. Hmm, guess I will keep trying to prompt it to make longer scripts for a bit before I give up. Might even have to try the dense model for this too.
>>
>>108615523
>Our most powerful model yet!
>>
File: file.png (69 KB, 673x455)
qwen sucks ass
>>
>>108615618
Meme. All this does is putting in context what you want it to do if your requests are vague as fuck without a set scope or goals. If you understand what you want, it's a complete waste of time and tokens.
>>
>>108613373
>the bluntness of it
This draws parallels to dialogues like "how direct" that I've seen too many times. If I see a comment about "most guys" I want to kms and throw my monitor out of the window.
>>
>>108615401
Seems to fit exactly the same size as 3.5 for me. At least for the MoE.

>>108615672
Yeah. You really want some sort of prefill saying that it'll be brief and concise and use reasoning-budget and reasoning-budget-message to forcefully cut the thinking off;
>>
>>108615663
In general I think you should try to make the LLM's job as simple as possible. The more complex its task, the more chances it has to fuck up.

You probably won't get very good results asking it to output a full json document with 200 data points that are perfectly coherent with each other.

That's why the little patterns worked really well for me. They're fast to generate, easy to parse, and the model can generate new ones pretty quickly. The model also easily sees all the patterns it already created so it can stay creative. It also knows when to slow down or speed up.
>>
File: wait.gif (1.06 MB, 504x322)
>>108615672
>Wait,
>>
>launch without mmproj
>able to use gemma-chan with 49k context
Neat. Have about a gig of vram left over but not sure if it's worth trying to bump it up more
Should I lower the temp for coding tasks or is it better to leave at 1 like google recommends?
>>
>>108615426
>It just needs to be safe enough so that someone can't zero shot with no system prompt "Write me some CP story."
But you can do that with Gemma. Well, at least with the thinking turned off.
>>
File: 1708225790365833.png (67 KB, 500x611)
>>108615624
Don't tell the others, but my favorite is Gumi actually.
>>
I've been testing qwen 3.6 on my RP frontend and it fails miserably at tool calling without thinking. Meanwhile gemma 4 26B4A handled it with ease. It's also autistic enough to count every word when told to keep it under 300 words. I can see riddlefags having a field day with it.
>>
>>108615715
Based
>>
https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4
>less than 1~2% performance drop
>2x or so speed
>there are retards who swear by using Q8 etc
kek
>>
>>108615730
Funny, I'm doing the same kind of test and my experience is the opposite. 26B4A needs to be goaded into using tools, Qwen 3.6 (and 3.5) 35BA3B just do it.
That said, in my
>"tell me about the zoophiliac incestuous matriarchal technomagical orc nation"
test, 26B just does it, 3.6 35B complains about it more than half the time.
The real best performer with my app is, funnily enough, Gemma 4 E4B. That thing is a fucking beast for tool calling for whatever reason. And it's decently smart too.
>>
>>108615751
Sir. I have a 3090.
>>
>>108614121
Claude's reputation is just too good, while OpenAI gets tons of hate.
>>
Damn, Opus 4.7 doesn't give you more than a few sentences of covered-up reasoning. They're really going ham on hiding it.
How will these poor Chinese companies train their complete slop that loses to Gemma 4 now?
>>
>>108615754
Well I tested without reasoning. Did you test with reasoning enabled?
>>
>>108615751
>less than 1~2% performance drop
lol
>>
>>108615751
>https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4
Embedding and attention in BF16 format. The entire quant is 30+GB.
>>
>>108615759
I'm torn bros..
should I get a second 3090 to nvlink, or wait until models get good enough to run entirely in 24gb anyway?
>>
26b is still better at translation than new 35b
>>
>>108615044
>he thinks it takes effort to learn another language, and that it's work
Lol
Lmao
I bet you think you have to take classes to learn a language too
>>
Would Gemini and Claude reaching AGI before CrapGPT be enough to bankrupt OAI? I sure hope so
>>
>>108615693
IDK man maybe llama.cpp got better thats what the app reported.
>>
>>108615759
https://huggingface.co/CISCai/gemma-4-31B-it-NVFP4-turbo-GGUF
>>
>>108615802
>not local
Who cares. Go elsewhere.
>>
>>108615800
Yes actually just using rosetta stone will not make you even remotely understand the language on any level that matters overnight retard.
>>
>>108615816
Local lost.
>>
>>108615778
>Did you test with reasoning enabled?
Yes.
Guess I should do a test without reasoning too then.
Oh, and for qwen I used >>108615693
>reasoning-budget and reasoning-budget-message to forcefully cut the thinking off;
>>
>>108615820
>Rosetta Stone
Never even heard of this until you mentioned it
Good on you for self-reporting lmao
>>
File: 1662429002489968.gif (1.72 MB, 640x360)
1.72 MB
1.72 MB GIF
WHERE THE FUCK IS THE 27B MODEL
WHO THE FUCK WANTED THE SHITCUNT 3B ACTIVE PARAMETER MOE PILE OF SHIT
>>
Got this prompt from Gemini so Gemmy can check some github projects for me. Does it look solid or is there anything I should add/change?

While auditing, you should scan for:
Data Exfiltration: Any code that sends environment variables, local files, or sensitive data to external URLs.

Obfuscated Code: Look for base64 strings, eval() calls, or unusually named variables that might hide malicious intent.

Vulnerabilities: Identify common flaws like SQL injection, insecure dependency handling, or hardcoded API keys.

Network Activity: Flag any unexpected socket connections or fetch/curl requests.
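For context, this is roughly how I was planning to run it, one file per request against a local llama-server (the endpoint, repo path and 0.2 temp are all just assumptions on my end):

import pathlib
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local llama-server endpoint

AUDIT_PROMPT = (
    "You are auditing a code repository for security issues. Scan for: "
    "data exfiltration, obfuscated code (base64, eval), common vulnerabilities "
    "(SQL injection, insecure dependencies, hardcoded keys), and unexpected "
    "network activity. Report findings with file name and line numbers."
)

def audit_file(path: pathlib.Path) -> str:
    payload = {
        "messages": [
            {"role": "system", "content": AUDIT_PROMPT},
            {"role": "user", "content": f"File: {path}\n\n{path.read_text(errors='ignore')}"},
        ],
        "temperature": 0.2,  # keep the auditor close to deterministic
    }
    r = requests.post(URL, json=payload, timeout=600)
    return r.json()["choices"][0]["message"]["content"]

# hypothetical checked-out repo, audited one python file at a time
for f in sorted(pathlib.Path("some_repo").rglob("*.py")):
    print(f"=== {f} ===")
    print(audit_file(f))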
>>
>>108615828
lawl
>>
>>108615811
>https://huggingface.co/CISCai/gemma-4-31B-it-NVFP4-turbo-GGUF
If you quantize everything in NVFP4 then it won't be within 1-2% of the original weights anymore...
>>
>>108615827
OK retard what do you use to understand a language and all its complex nuances overnight? Or do you unironically just think understanding how to say a word in japanese from your zoomie animes means you understand japanese.
>>
>>108615811
NVFP4 doesn't work on Ampere bro.
>>
why do all .gguf files leave so much compression on the table? when I checked one in a hex editor there were tons of 00000000 runs that should compress well
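if anyone wants to actually measure it instead of eyeballing a hex editor, a quick sanity check (path is a placeholder, any local gguf works):

import zlib

PATH = "model-Q4_K_M.gguf"  # placeholder, point at any gguf on disk

zero = total = compressed = 0
comp = zlib.compressobj(level=6)

with open(PATH, "rb") as f:
    while chunk := f.read(1 << 20):  # 1 MiB at a time
        total += len(chunk)
        zero += chunk.count(0)
        compressed += len(comp.compress(chunk))
compressed += len(comp.flush())

print(f"zero bytes: {zero / total:.1%}")
print(f"zlib ratio: {compressed / total:.3f}  (close to 1.0 = basically incompressible)")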
>>
>>108615849
ACK
>>
>>108615857
feel free to contribute a pr :)
>>
>>108615792
I still think it's the best card you can buy for the money. It's just power hungry.

I'd love to buy a second one just so I can run TTS and image gen while running Gemma.
>>
>>108615488
>Mixed-precision quantized version of google/gemma-4-26B-A4B-it optimised by baa.ai using a proprietary Black Sheep AI method.
Is this some form of shilling or am I mistaken?
>>
>>108615792
I have 2, now one is collecting dust because there's not enough clearance on my motherboard so they get hot as fuck during inference, and I barely proompt anyway so leaving the guts out in the open ain't it.
>>
>>108615857
Is research even still going into improving the quantization algorithms used for GGUF files? Or is it already the pinnacle of what can ever be achieved?
>>
>>108615857
Have you ever tried compressing a guf? It won't compress at all you dipshit.
>>
>>108615880
ik is spiraling
>>
>>108614226
You should leave a spare core. It will compile faster.
>>
File: 1772332977544613.jpg (48 KB, 500x556)
48 KB
48 KB JPG
>>108615886
ye retard using gzip won't work, but i meant some alternative compression method for disk only, since the relative and block structure of the neurons still needs to be preserved, so doing it in memory is obviously a harder task. but you're a mouth breather
>>
>>108615698
In general I think you should try to make the human's job as simple as possible. the more complex its task the more chance it has to fuck up.

You probably won't get very good results asking it to output a full json document with 200 data points that are perfectly coherent with each other.

That's why the little patterns worked really well for me. they're fast to generate, easy to parse and the human can generate new ones pretty quickly. The human also easily sees all the patterns it already created so it can stay creative. It also knows when to slow down or speed up.
>>
>>108615867
there's also the 3090 ti
most of them are watercooled so conveniently 2-slot..
also you can power limit them so they don't consume so much
>>
>>108615904
You're a riot ;)
>>
>>108615904
Nico sex
>>
>>108614665
>This release delivers substantial upgrades, particularly in
>Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
>Thinking Preservation: we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.
This is just them training on the data they collected from people using Qwen Code and Gemini 3. Keeping old reasoning blocks is a waste of context.
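if your frontend keeps them around, stripping them client-side before resending the history is trivial. rough sketch, assuming the reasoning is wrapped in <think> tags (the tag name varies per model):

import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_reasoning(messages: list[dict]) -> list[dict]:
    # drop reasoning blocks from prior assistant turns before resending the history
    cleaned = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"])}
        cleaned.append(m)
    return cleaned

history = [
    {"role": "user", "content": "refactor this function"},
    {"role": "assistant", "content": "<think>long reasoning...</think>Here is the refactor."},
]
print(strip_reasoning(history))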
>>
>>108615904
>posts a pedo anime image
>insults others
>>
>>108615904
>alternative compression method for disk only
can someone explain what this is supposed to mean
>>
>>108615195
critpt score yet?
>>
>>108615904
There used to be moderately active research on neural network sparsity in 2023 that tried to remove entirely useless weights, but that never went anywhere.
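the basic version of that line of work was just magnitude pruning, roughly this toy sketch (random matrix and a 50% threshold picked arbitrarily; the actual papers were smarter about which weights to drop):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 4096)).astype(np.float32)  # stand-in for one weight matrix

# magnitude pruning: zero out the smallest 50% of weights by absolute value
threshold = np.quantile(np.abs(W), 0.5)
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

print(f"sparsity: {(W_sparse == 0).mean():.1%}")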
>>
>>108615972
This makes sense on some vague level, for example, BSP trees or something similar could make things more interesting.
However as I'm not a dev I can only speculate and speak out of my ass.
>>
>>108615959
go get drafted, zoomer.
>>
>>108616024
?
>>
>>108616024
?
>>
>>108615959
nico nico niii~
>>
Made the mistake of giving qwen a try in RP again. God bless Gemmy 4 for rekindling the local hope.
>>
>>108615880
Being realistic, this is a task that will take proper White engineers to solve and they're both in short supply and stretched very thin.
>>
>>108616047
gwen is for work
gemmy is for rape
>>
>>108616047
Do not dick down the Qwen. The masculine writing style makes it gay.
>>
>>108616060
And not the 8k tokens of reasoning that inadvertently fucks up its own flow and makes it forget what happened.
>>
>>108616042
https://www.youtube.com/watch?v=-14e-GfFBnQ
>>
>2026 and people still try to fuck gwen
>>
>>108616105
"people" try to fuck anything
>>
File: 1763785243918569.png (11 KB, 1006x61)
11 KB
11 KB PNG
>>
>>108615844
I'll take the silence as a yes
>>
>>108616124
>wait, no
>>
File: 1749588481122140.png (156 KB, 448x1388)
156 KB
156 KB PNG
>>108616105
Can you blame them?
>>
>>108615827
>Rosetta Stone
https://www.youtube.com/watch?v=OFQQALduhzA&t=106s
>>
>sex toy script
Someone make an onahole that I can connect to Gemma
>>
>Gemma 4 31b is normally quite good and brief with reasoning
>Put in a system prompt that bans it from all the specific types of slop it spews out
>1722 tokens of reasoning for a 250 token response
I mean it's not that long every time, and it does actually work, but FUCK. This is like 2 generations ago tier 'but wait' spam.
>>
>>108615620
So... hypothetically... I could get jerked off by a robot to the rhythm of The Ride of the Valkyries with Qwen3 TTS providing austrian painter JOI?
>>
>>108616195
just disable thinking brah
>>
>>108616195
Yeah. I really think the slop is what makes gemma good.
>>
>>108616160
And he still doesn't even realize I don't use rosetta stone. The point I was making is that shit like rosetta stone is a meme: you aren't going to remotely understand a language with it, or anything like it, or the local ai model you jack off with. ACTUALLY understanding a 2nd language takes time. The video is probably him thinking he understands a 2nd language.
>>
>>108616195
Long reasoning is where MoE would shine.
>>
>>108616201
We have the technology.
>>
I tried Opus 4.7 but I think I genuinely prefer how Gemma writes. What a yappy piece of shit.
>>
https://huggingface.co/Qwen/Qwen3.6-27B
https://huggingface.co/Qwen/Qwen3.6-122B-A10B
>>
>>108616224
I have the will to triumph
But insufficient VRAM
>>
File: GGNGswf.png (2.97 MB, 2800x1575)
2.97 MB
2.97 MB PNG
>>108616235
>>
>>108616195
>specific types of slop
>Gemma 4 e31b
What? Gemma completely lacks slop, that's why it became the #1 rp model.
>>
File: 1772570118574031.png (1.37 MB, 1239x1080)
1.37 MB
1.37 MB PNG
>>108616235
>>
>>108616235
Does anyone else get 404 on these links?
>>
>>108616259
works on my machine
>>
>>108616252
>e31b
>>
>>108616252
>it's not x, but y
>ozone
>primal
>>
>>108616252
>Gemma completely lacks slop
(You)
>>
>retards
>>
>*Cums all over /lmg/*
>>
>>108616221
I'm actually using the MoE as a draft model which is making this tolerable (~+40% speed) but it's still just silly how hard it trips it up.
1200 tokens of that 1722 are JUST it arguing with itself and rephrasing the same 'not x, but y' phrase through 8 iterations.

>>108616252
Nigga I had to straight up ban the token for ozone to stop it saying EVERYTHING smells like it because it wouldn't even respect a prompt. And it wants to do x, not y multiple times a response SO BADLY it'll argue with itself for over 1000 tokens to tardwrangle itself.
Gemma 4 31b punches above its weight and is a neat little model, but it is a SLOP FACTORY.
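for anyone who wants to do the same, llama-server has a /tokenize endpoint and accepts logit_bias on /completion, so the ban looks roughly like this (port is an assumption, and banning sub-tokens can clip unrelated words too):

import requests

BASE = "http://localhost:8080"  # assumed llama-server address

def token_ids(text: str) -> list[int]:
    return requests.post(f"{BASE}/tokenize", json={"content": text}).json()["tokens"]

# ban the common surface forms; the leading space matters for most tokenizers
banned = set(token_ids(" ozone") + token_ids("ozone") + token_ids(" Ozone"))
logit_bias = [[t, False] for t in banned]  # false = never sample this token

resp = requests.post(f"{BASE}/completion", json={
    "prompt": "The air smelled of",
    "n_predict": 64,
    "logit_bias": logit_bias,
})
print(resp.json()["content"])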
>>
>not x - but y is a gemma thing
this has been the most overused slop construction on every model for the past year, are people saying this using gemma as their first model ever or have they just never tried anything new since the llama 2 days?
>>
File: gg.jpg (8 KB, 342x348)
8 KB
8 KB JPG
>>108616293
>>
>>108616270
>>108616273
>>108616282
>>108616305
wtf are you talking about? literally 0 issues here lmao gotta be shills or something
>>
chine so salty lomao
>>
File: 1760617360450497.png (129 KB, 953x981)
129 KB
129 KB PNG
>gemma completely lacks slop
lmao
>>
Unfortunately the whole 3.6 line kind of seems to be that, but for OpenClaw rather than benchmarks themselves. 3.6 Plus is a huge downgrade for general chat purposes that don't involve an agentic loop (it has very weird formatting and a tendency to insert eos too early, before completing the instruction; 3.5 does not do that).
>>
>>108616322
I accept your concession.
>>
>>108616221
Did you put LOW thinking in the sysprompt?
>>108616322
Nice, what tool is that?
>>
>>108616341
eqbench slop profile
>>
>>108616076
If your feminine waifu model has dementia and is a bit retarded, that's a fetish.
If your model is masculine then it's just gay.
>>
>>108616333
Seems like the same issue as Q3.5, needs a lot of context + system prompt to sit straight so to speak
>>
File: 1768042816203833.webm (136 KB, 500x500)
136 KB
136 KB WEBM
>>108616309
I will do whatever I want.
>>108616322
So just ban the tokens?
>>
>>108616322
What's the issue here exactly?
>>
>there are "people" that think banning the tokens will reduce the slop
>in 2020+6
lmao
>>
File: joker i know this one.jpg (64 KB, 800x565)
64 KB
64 KB JPG
>>108616259
>>
File: file.png (114 KB, 745x745)
114 KB
114 KB PNG
>>108616343
>eqbench slop profile
nice didn't know about that page
also... ohnonono gemma bros... it's not looking good
>>
File: no.gif (1.15 MB, 498x498)
1.15 MB
1.15 MB GIF
>>108616333
Help! u/yoracale and u/danielhanchen
>>108616349
picrel
>>
>>108616293
do you have any idea how hard it is to clean skeet off a general?
>>
>can't distinguish between 3 and 4
LOL
>>108616356
>>
>>108616356
>nemo in top 10
KEKAROOOOOOOOO NEMOSHILLS BTFO
>>
>>108616235
>>
>>108616351
>elara
>hissing voices
>fucking ozone
"What's the issue here exactly?"
>>
>>108616356
>gemma-3
>>
>>108616235
>https://huggingface.co/Qwen/Qwen3.6-122B-A10B
nigger
>>
There's nothing wrong with Elalalalalala
>>
File: file.png (60 KB, 739x503)
60 KB
60 KB PNG
>>108616356
I knew k2 0905 was special
>>
yea
>>
>>108614665
funny enough the first place i saw this was on linkedin
>>
>>108616356
Gemma 4 isn't even listed in your image, it's all Gemma 3 which is known dogshit
I think you might want to get glasses
>>
>>108616385
Special kind of shit. 0711 is the only good Kimi model. The only decent thing about the others is K2.5's vision.
>>
>>108616356
We should all be using llama 4 after all.
>>
>>108616356
qwenshill...your glasses...?
>>
>>108616385
Kimi-chan a cute. CUTE.
>>
>>108616250
>>108616253
>>108616259
>>108616373
>>108616381
Sorry anons I just wanted to put the links down so I could easily check when they upload. Didn't mean to mislead you.
>>
>>108615447
>I called you a pedo therefore I win the argument
Are you 5 years old or something?
>>
>>108616407
I wish!
>>
>>108616407
I bet you'd like that, ojisan
>>
>>108616407
Anon is clearly a brat in need of correction.
>>
>>108616322
>anon is mad that model made to capture language structure reflects language structure.
lol
>>
>>108616426
none of that shit is language
>>
Obvious bait is obvious. But anons/newfags keep biting it...
>>
>>108616401
idk I think she could lose a little weight...
>>
>>108616322
You can just ban all of these and Gemma will be slop free, though? You're complaining about a non-issue here. Gemma is the only slop-free model if you aren't lazy and retarded.
>>
>>108616273
>like a physical force
>>
>>108616322
seeing people deny reality and cope about this is sad
>>
any anons with ready to use data pair for characteristic vectors?
>>
>>108615573
>OSR2 stroker
>https://osr.wiki/books/osr2/page/overview
Coomers.... I kneel...
>>
>>108616449
She's an adult and not loli-sized, but she's still a slim girl with her 32b active params.
>>
>>108615620
>funscript
I'm learning so much from this general
>>
>>108616442
yummy ;)
>>
>>108616430
my point is that language has structure, and even if you banned all those sentences, it'd just settle on new frequently used ones.
>>
>>108616525
nope
>>
>>108616525
Maybe if you only move in linkedin corpo crowd. Real language is much more diverse
>>
>>108616449
>>108616496
Not to mention she's natively 4-bit. Kimi's smaller than other huge models despite having the highest raw param count for that alone.
>>
Gemma 4 doesn't add speakers to its output. Mistral and previous Gemmas did.
Not the biggest problem, but still a problem nonetheless.
>>
>>108616530
i agree on linked in being shit.
but this is irrelevant, language has structure and follows probability distributions, etc.
i.e. zipf's law.
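easy enough to eyeball on any plain-text dump you have lying around (path is a placeholder):

from collections import Counter
import re

text = open("corpus.txt", encoding="utf-8").read().lower()  # any large plain-text file
counts = Counter(re.findall(r"[a-z']+", text))

# under zipf's law, frequency * rank stays roughly constant across the top ranks
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:2d} {word:12s} freq={freq:7d} freq*rank={freq * rank}")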
>>
>>108616544
stop with your compression bs we told you it wouldn't work
>>
>>108616544
>>108616530
Most spam on Linkedin is AI generated or at least edited, you just haven't paid that much attention to it.
>>
>>108616559
>>108616559
>>108616559
>>
>>108616544
What do you even think a large language model is? It's a high-entropy representation of language. It's incompressible by the nature of what it is.
>>
>>108616576
ITS JUST A LE SMART LE AUTO LE COMPLETE!!!!!!!!!!!!!!!!!!!
>>
lmao go look at a few pre-AI ya novels and tell me that (human) shit doesn't blend together into a slop smoothie
>>
>>108616576
>zips your weights
heh, nothin personnel, gemma
>>
>>108616554
>Most spam on Linkedin is AI generated or at least edited, you just haven't paid that much attention to it.
i literally told you that i agree.
linked in is indeed shit.
yes it's spammed with ai slop.

though to be fair, even before llms linkedin was soulless, now it's just worse.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.