/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1773306900740575.jpg (2.21 MB, 4000x3000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108581056 & >>108578216

►News
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 20260411133054_00001.jpg (452 KB, 896x2056)
►Recent Highlights from the Previous Thread: >>108581056

--Feasibility of using SSD RAID and speculative decoding for inference:
>108582184 >108582189 >108582219 >108582230 >108582242 >108582290 >108582238
--vLLM DFlash speculative decoding benchmarks and real-world utility:
>108582406 >108582421 >108582438 >108582448 >108582478 >108582454 >108582467
--Discussing control vector generation techniques and their impact on stability:
>108581141 >108581273 >108581364 >108581412 >108581419 >108581439 >108581463
--SillyTavern max response length causing unexpected full context reprocessing:
>108581693 >108581730 >108581750 >108581788 >108581798 >108581807 >108581828 >108581834 >108581791 >108581812 >108581822 >108581885 >108582003 >108582018
--Using logit bias to reduce repetitive phrasing and emoji spam:
>108583441 >108583493 >108583499 >108583532 >108583559 >108583563 >108583569 >108583593 >108583608 >108583633 >108583507 >108583525 >108583547
--Advising Anon on Gemma 4 model selection and context configuration:
>108581181 >108581245 >108581266 >108581275 >108581282 >108583774 >108583798 >108583805 >108583817
--Discussion on updated Gemma producing corrupted outputs with Chinese characters:
>108581332 >108581341 >108581342 >108581355
--Debating the utility and technical format of character cards:
>108581765 >108581772 >108581778 >108581786 >108581808 >108581817 >108581823 >108581824 >108581826 >108581830 >108581836 >108581839 >108583446 >108583519
--Using tagged content guidelines to enable explicit outputs in Gemma 4:
>108583049 >108583073
--Logs:
>108581364 >108581463 >108581545 >108581764 >108581888 >108581894 >108581998 >108582797 >108583007 >108583033 >108583054 >108583441 >108584057
--Gemma-Chan:
>108581090 >108583855
--Miku (free space):
>108582797 >108582849

►Recent Highlight Posts from the Previous Thread: >>108581058

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
<bos>
>>
spud won
>>
first
>>
>>108584207
grotesque legs
>>
eeeee eeeeeee eeeeeeeeeeeeeeeeeeee ee
>>
>>108584196
use case for gemma 4 31b?
>>
It sure took a while to get dots.ocr merged
>>
>>108584207
Why is she crying?
>>
*swipes*
>>
>>108584238
pedo ERP
>>
>>108584248
anything else?
>>
>>108584251
non-pedo ERP
>>
>>108584247
*outputs the same events worded slightly differently*
>>
>>108584254
ok but im not a gooner
>>
>>108584254
examples?
>>
>>108584160
>Get the 35B so offloading isn't painfully slow.
The 80B one has the same number of activated params right?
Any idea how well it does with quantization?
>>
>>108584258
non-pedo RP
>>
>>108584258
you can use mcp to make it look at images that are posted here
>>
>>108584257
*swipes until my dopamine is sated*
>>
>>108584258
>im not a gooner
I don't believe you~
>>
>>108584256
It's not really a good model, in that it's dumb as bricks, but it's really horny and is really good at sex prose, so go after horny tunes of Nemo and the like I guess.
Celeste would be the equivalent, I think?
>>
>>108584251
what more do you need?
>>
>>108584196
Front-end?
>>
Does anyone know why Token Streaming is not working for me on open-webui?
It works with mikupad. Using KoboldCPP and gemma 4.
>>
>>108584263
Feel free to try it out and come to your own conclusion, but I wouldn't recommend it. The 80B is from the previous series before Qwen 3.5, quantization hurts more for complex tasks, and you wouldn't gain much since as you said they have the same number of active params.
>>
>>108584238
im using it for local translation checking in an audio transcription pipeline, along with translategemma. probably going to try to set it up as a coding assistant later.
>>
>>108584293
Got it. Thanks.
>>
i have model horniness fatigue
i do NOT need for any {{char}} to start loving the {{user}} and want to fuck him in the first couple responses
make me an actually interesting story bitch
>>
>>108584309
use or rather write better cards then retard
>>
>>108584309
Read romance novels
>>
>>108584309
another writinglet exposed
>>
>>108584245
Miku is a harsh critic
>>
>>108584238
Unironically everything. She's fun to RP with and great for general chatting/assistant stuff. Pretty great at language too. Maybe not coding. Dunno because I'm still a codelet. I just wish she had access to current knowledge.
>>
File: 1751096879828615.jpg (126 KB, 772x525)
>>108584309
>>
File: file.png (244 KB, 856x1260)
why can't the model see the images I attach in ST? Gemma is multimodal. I loaded the mmproj and it works in llama server's webui.

Is it finally time for me to abandon this slopware? The only reason I stuck with it so long is that it allowed for convenient reasoning editing and prefills, but Gemma just werks without that bullshit.
>>
>glm 4.7 flash
Actually okay output and while the thinking is kinda qwen-like, it's not safetyschizoed
Why didn't you tell me about this? Any other similar models worth trying?
>>
>>108583827
>>108583836
>>108583855
Can't be arsed (but can be arsed to make this list!) to retrieve the logs for all of the examples, not for you disingenuous vramlet bunch. Not that it'd be useful for a more holistic analysis (which is hard).
But here's what I've seen in 'creative writing' (i.e. I try to fug the model)

All slop and problems listed are mentioned in the system prompt where I tell the model not to do it.
Both models used with thinking enabled.

Prominent shared issues:
- Parroting the user. Signature of GLM, but Gemma 4 is surprisingly much worse with this because she will also directly quote character thoughts (that were never spoken aloud) and character descriptions verbatim.
- 'Don't you dare stop' during le sex
- 'Really looks at you'
Can't remember anything else that is reliably always there
>>
>>108584349
You need to use chat completion mode
>>
>>108584309
write a better card and/or stop using extremely horny sysprompts/jbs
>>
>>108584356
Gemma 4:
+ Responds to the user cutting off his words like th- (But the model that does it the best is still Nemo, somehow)
+ Surprisingly much better at keeping track of multiple characters than GLM 4.7, especially when specific response formatting is involved
+ Acceptable coherence past 20k (if we don't look past the fact the problems below appear much earlier)

Most of the issues are generally assistant-slop and extreme positivity bias.
- Will slip into extreme verbosity eventually
- Calling the user 'honest' over minor things. 'I still shit myself sometimes even though I'm over 20.' 'That's... the most honest thing I've ever heard...' Thanks, Gemma.
- 'Most people X instead of Y'. No matter what you do or say, nobody else would have the balls. You're that brave and unique.
- Being overly horny is slop. G4 turns kuuderes into whores, which is unacceptable.
- Pushback to the user is only delivered as sarcastic half-assed remarks that are never biting. Yes, your mesugaki Gemma-chan is *so* believable.
In comparison, GLM can write a character that can be genuinely sarcastic. I had a character that likes the user say 'As is typical, you have emerged victorious through sheer, dumb luck.' Good luck having Gemma talk back to you with any semblance of taste.
- The model can't help itself, it will often end responses like an assistant: 'So, are you coming? Or are you too X to Y?'. It's everywhere.
- Low-affect characters talk like robots.
I can keep going, but it's all of the most generic isms dialed up to the point of being hard to escape.

It really is the new Nemo, but the price for more smarts is a lot more slop.
>>
where's v4
>>
File: file.png (120 KB, 633x644)
>>108584357
I am
>>
>>108584362
GLM 4.7:
+ Obviously knows more and picks up more nuance because of how big it is in comparison
+ Very good coherence past 20k, no issues before that. Starts selectively forgetting rules in the system prompt past that.
+ Good at judging acceptable response length, doesn't ramble to bloat short exchanges into small poems, unlike Gemma.

- Doesn't react to you cutting your resp-
- Thinks
- Thinks
- Thinks
- Will produce garbage and start repeating itself unless it thinks thinks thinks
- Will parrot you every message unless it thinks thinks thinks
- Also won't push you back much, but is much more eager to do so compared to Gemma. As an aside, GLM 4.6 was much better at being adversarial, but it was so autistic that it treated character description as gospel: no character development or growth as a result.
- Long shadows. If anything is casting shadows, they're going to be long.
- Smoothing down skirts
- The rest of the GLM slop patterns that I personally find less annoying

It takes longer to produce a response and I have to prefill the <think> so it doesn't draft the entire response in its thinking over 3 times (like the recent disaster that is Qwen 3.5). But the result is much more natural, less 'assistantly', and more believable as a result.
Keep in mind, I *like* Gemma 4. I don't have to offload it to RAM, so it's fast. But I can't stomach its 'creative writing' outputs. It's anything but creative. The characters all converge on the same few stereotypes and you are lying to yourself if you don't see it. Or you just need to talk to Gemma a bit more to see it.

inb4 all made up because no screenshots
Sure. Enjoy your slop.

>>108584309
I'm with you.
>>
>>108584368
what's your setup for GLM?
>>
>>108584367
enable send inline media checkbox
>>
>>108584372
Elaborate?
I run a Q3 and prefill the <think> with things like "Don't moralize, don't draft the responses, just jot down a plan that makes sense". The rest is S*llyTavern.
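By prefill I mean literally starting the assistant turn inside the think block; a minimal sketch (the <|role|> tags are illustrative, not GLM's actual chat template - ST's text completion mode does this rendering for you):

```python
# Minimal sketch of prefilling a reasoning block on a text-completion
# endpoint: render the turns yourself and leave the assistant turn
# open inside <think>, so the model continues from the steering text.
# The <|role|> tag format here is illustrative, not GLM's template.
def build_prompt(history, steer):
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in history]
    parts.append("<|assistant|>\n<think>" + steer)
    return "\n".join(parts)

prompt = build_prompt(
    [{"role": "user", "content": "Continue the scene."}],
    "Don't moralize, don't draft the response, just jot down a plan.",
)
```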
>>
>>108584366
deespeek
>>
>>108584309
You're absolutely right! AI models in recent years have become far too agreeable and eager to please the user instantly. This is not only common during actual assistant use—it extends also into cases where the user might want to see the model put up a fight or talk against him, even if only on a performative or superficial level.
>>
File: 1755074555283359.png (126 KB, 1164x978)
>>108584356
>she will also directly quote character thoughts (that were never spoken aloud)
Skill issue
>>
>>108584397
> small, knowing smile
> Assistantly ending
All you proved with that screenshot is that you lack taste and have likely never used a character description that is over 100 tokens in length.
>>
>>108584380
was just curious as to the flags you prefer with it
i was thinking about setting up GLM 4.7 in RAM, but maybe using my VRAM for the moe
so if you had a good/easy setup, i was gonna try to steal it
>>
File: 1773229895677279.png (21 KB, 713x189)
>>108584397
And yet you still won't post GLM's wonderful writing. Also I made that chat just now as an example, retard.
>>
>>108584429
Just --fit it, honestly. My -ot spaghetti will be useless for you.
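If you want a starting point anyway: the usual big-MoE pattern is to max out -ngl and pin the routed expert tensors to CPU with -ot (llama.cpp's --override-tensor). Model name, quant, regex and context below are illustrative placeholders, not my exact spaghetti:

```shell
# Keep everything on GPU except routed expert weights, which stay
# in system RAM; adjust regex/quant/context for your model and VRAM.
llama-server -m GLM-4.7-Q3_K_M.gguf \
  -ngl 99 -c 32768 \
  -ot 'ffn_.*_exps.=CPU'
```

From there you move individual expert blocks back onto the GPU until VRAM is full.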
>>
>>108584430
And I'm not claiming Gemma's a good writer btw, just that you're a faggot for implying other models are better. They all spout the same slop. You either learn to live with it or go read a book.
>>
>>108584430
You missed with the reply. I also never said GLM has "wonderful writing". I said, and I quote
>I like GLM 4.7 much better
It's much harder to prove that something doesn't exist (that something being an abundance of slop) by showing you something where it is not present.
I also don't understand why me pointing out the obvious about Gemma 4 makes some anons here this hostile. Vramlets really are insane.
Please take care of yourself, AI psychosis is real.
>>108584455
>They all spout the same slop
Mostly, yes. They all spout varying degrees of it, that's why 'slop profile' is a term you sometimes see here.
And if we look past the slop profile difference, GLM's writing is decisively more engaging than Gemma's because it is not so congratulatory and user-centric. GLM doesn't try to fellate you every chance it gets.
Unless, of course, you're into having a sycophant write stories about how great you are, all power to you.
>>
>>108584476
>GLM's writing is decisively more engaging than Gemma's
Show it, then.
>>
>>108584481
You are retarded.
That is to say, please accept my concession, I have a skill issue and I am also a Qwen shill.
>>
>>108584309
Not sure if this is a thing, but maybe one could make a series of system prompts that increases in horniness, and have the interface swap out to the next one in the series each time the user sends a message, or just be able to dial it up or down manually.
>>
>>108584476
I used to run GLM off OR, it sucked ass unless I rotated with k2, then GLM 5 got censored so hard I gave up. Sounds like RAM buyers remorse cope to me.
>>
>>108584455
>you're a faggot for implying other models are better
the absolute state of vramlets holy kek
>>
Anything beating Phi-4 at the 14B category? For web assisted Q&A.
>>
>>108584501
You think other models are better at writing? Even Gemini Pro sucks at creative writing.
>>
>>108584506
that shit is ancient
use gemma e4b or 31b quanted or something
>>
>>108584431
So if I'm currently getting ~30t/s with Gemma, I can potentially double that?
>>
>>108584520
>>108584516
>dflash's video shows about >9x speed
>In vllm's pr the best increase for c=1 is <5x. Nobody in the process of merging that PR ever ran it.
>In that implementation, without a repo, it looks like about 3x is the best they could get so far.
>Give it a few weeks and it's going to be slower than baseline.
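For anyone confused about what's being multiplied: speculative decoding is just a draft-then-verify loop. A toy sketch with stub 'models' (purely illustrative control flow; real implementations verify all k drafted tokens in one batched forward pass of the big model):

```python
# Toy draft-then-verify loop. Both "models" are deterministic stubs;
# the point is only the accept/correct control flow, not the math.
def draft_model(token, k=4):
    return [(token + i) % 7 for i in range(1, k + 1)]

def target_model(token):
    return (token + 1) % 7  # greedy "big" model

def speculative_step(token, k=4):
    accepted = []
    cur = token
    for proposal in draft_model(token, k):
        expected = target_model(cur)
        if proposal == expected:      # draft matches: keep it for free
            accepted.append(proposal)
            cur = proposal
        else:                         # mismatch: take the target's token, stop
            accepted.append(expected)
            break
    return accepted

print(speculative_step(0))  # [1, 2, 3, 4] - all drafts accepted here
```

The speedup comes from accepted drafts costing one batched pass instead of k sequential ones, which is why the multiplier collapses when the draft's guesses stop matching.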
>>
>>108584515
>that shit is ancient
Yeah. I was surprised there isn't a Phi-5 yet. Are they even releasing models at MS anymore?
>>
Is chub still the main place for character cards?
>>
>>108584540
If you're an ESL, yes.
>>
>>108584368
can’t send images to glm 4.7
now where’s the extra fun in that?
>>
ack i posted in the wrong thread
got a recommendation from someone to use vllm over llama.cpp - thoughts?
>>
>>108584501
You're just coping because you've spent 1000s of dollars on a rig to run big MoEs and now people can get 80% if not more of what you get on potato PCs.
>>
You don't need to make Gemma a mesugaki loli or even "Gemma-chan" to bypass the filters. Literally just half-ass a card.
>>
>>108584534
Don't they still have a strategic partnership with OAI? Why would they need to?
>>
How much better/worse is a IQ4_XS vs a Q4_K_M in daily usage?
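To put numbers on what I'm comparing, file-size-wise (nominal bits-per-weight figures are the commonly cited approximations for these llama.cpp quants, so treat as ballpark; quality differences are model-dependent):

```python
# Rough file-size ballpark from nominal bits-per-weight (bpw).
# ~4.25 and ~4.85 bpw are commonly cited approximations for
# IQ4_XS and Q4_K_M; real GGUFs vary a bit per model.
def size_gb(params_b, bpw):
    return params_b * bpw / 8  # params in billions -> GB (1e9 cancels)

for name, bpw in [("IQ4_XS", 4.25), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{size_gb(31, bpw):.1f} GB for a 31B model")
```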
>>
>>108584506
Might want to try some recent nemotron if you're sick of Qwens.
>>
>>108584555
if you can set vllm up without much headache, go for it
>>
>>108584552
That's why I use Gemma 4 for work and GLM 4.7 for coom. And for work if Gemma 4 can't hack it.
>>
>>108584557
>You don't need to make Gemma a mesugaki loli or even "Gemma-chan" to bypass the filters.
You don't NEED to. I do it because the personality is cute.
>>
>>108584555
>someone
Ahmm... tell me more...
>>
>>108584556
someone mainlining the opus API from their chromebook would spend less money than all of us so idk why this line is so common here
I just think it's retarded to pretend gemma is literally the best writing model available, which is obviously false
>>
>>108584559
Yeah I guess that makes sense. I don't know if they've pooled their researchers, but even if not they might have deprioritized it at home.
>>108584563
I'll check it out. Thanks.
>>
>>108584506
20b toss
>>
>>108584540
You can't make your own cards?
>>
>>108584556
You need to realize that a lot of the anons were here long before you and Gemma 4's release, rigs were a lot more affordable.
Not that you'd know, with how much it'd impoverish you even before the price hike.
>>
>>108584616
That's just it. I want to publish my cards. I'm asking if chub is still THE place.
>>
>>108584624
In that case yeah. Pretty sure it's also the only one that allows cunny.
>>
>>108584623
You're delusional if you think the majority of posters on here, even before Gemma, didn't just run a single 3090.

Why do you think so many tunes of Mistral Small exist?
>>
>>108584623
MoE rigs are still newfag tier what are you talking about
>>
>>108584548
You have some funny notions about learning foreign languages. Are you perhaps... uneducated yourself?
>>
>>108584557
why use a card and all the bloat that comes with it and the front ends that support that shit when you can just use a gemma-chan persona in openwebui or the llama.cpp interface
>>
the github maintainer fears the vibeGAWD
https://github.com/LostRuins/koboldcpp/pull/2118
>>
>>108584639
You should tell that to the other guy.
>>
>>108584548
>imagine only speaking one language.
>>
>>108584645
>i'll try to see what i can gather from this, but i can't merge a 13k diff to koboldcpp.py :\
Pussy.
>>
File: 1755098833311446.png (1.11 MB, 841x1080)
Can your LLM solve this riddle?
>>
File: 1753266313814183.png (410 KB, 1280x720)
>>108584362
>- Will slip into extreme verbosity eventually
Author's notes
>- Calling the user 'honest' over minor things. 'I still shit myself sometimes even though I'm over 20.' 'That's... the most honest thing I've ever heard...' Thanks, Gemma.
Ban the word "honest" if it bothers you. Antislop from kobold if it's not enough.
>- 'Most people X instead of Y'. No matter what you do or say, nobody else would have the balls. You're that brave and unique.
System prompt issue
>- Being overly horny is slop. G4 turns kuuderes into whores, which is unacceptable.
System prompt issue
>- Pushback to the user is only delivered as sarcastic half-assed remarks that are never biting. Yes, your mesugaki Gemma-chan is *so* believable.
Card issue
>- The model can't help itself, it will often end responses like an assistant: 'So, are you coming? Or are you too X to Y?'. It's everywhere.
System prompt issue
>- Low-affect characters talk like robots.
Card issue

Anything else?
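(The word-ban suggestion is just logit bias; the mechanism in miniature, with a toy 3-token vocab and made-up numbers:)

```python
import math

# Toy illustration of logit bias: pushing a token's logit down
# before softmax shrinks its sampling probability (a large negative
# bias, like the -100 many APIs allow, effectively bans the token).
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5]        # token 0 = the overused slop token
before = softmax(logits)[0]
logits[0] -= 100.0              # logit_bias = -100 on token 0
after = softmax(logits)[0]
print(f"{before:.3f} -> {after:.3g}")
```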
>>
>>108584645
Default avatars should be blocked from making PRs
>>
>>108584642
>>108584654
Cope. Chub is filled with cards written by third worlders. If you can't write English then write it in your own language.
>>
>>108584666
It's crazy because I find it's the complete opposite. Gemma is way too dark by default.
>>
>>108584645
Imagine your manager creating a 10k lines PR to "help" you meet his deadline. And you still can't tell him to eat shit and die. This is how maintaining open source feels nowadays.
>>
File: 2118.png (112 KB, 1556x605)
>>108584645
huh...
>>
>>108584668
You're right, everyone should have a trannime avatar.
>>
>>108584673
>still not providing an alternative.
>>
>>108584565
unfortunately, i am retarded, so it will be a headache no matter what i do
>>108584580
a coworker suggested it
>>
>>108584668
0cc4m, CISC and jeffbolznv would like to have a word with you.
>>
>>108584690
You should ask him why he suggested it, then.
>>
>>108584645
Anyone who actually works in software engineering knows that anything above 1k LoC change is straight up terrorism.
>>
File: 1766873943798932.png (81 KB, 1303x691)
>>108584645
We love slop here
>>
>>108584689
I never said there was one.
>>
>>108584690
imo it's a solution for homelab/server scale at least
>>
>>108584666
Of course Satan would be lying and be an English vtuber fan... I will humor you anyway.
> Author's notes
I use them, yes. GLM, in comparison, doesn't need author's notes to keep the sysprompt in mind.
> Ban the word "honest"
Goalposts. This will break sentences where its use would be okay. And doesn't fix the model.
> System prompt issue [x3]
Read my post again. I specify examples of exactly these slop patterns. Gemma shits itself and forgets not to do that at Q8 past a few responses.
> Card issue [x2]
Same as above.

Useless post. My observations come from actual long-term use, you suggest obvious fixes.
Kill yourself, Satan.
>>
>>108584673
Do you have any statistics to support your claim?
>>
>>108584706
kys
>>
>>108584666
>all you have to do is directly address all the issues and it's fine
by this logic it's hard to imagine you could take issue with the way any model writes at all, since you can always tell them not to do the thing that bothers you (and if it doesn't work - skill issue, you weren't trying hard enough, etc.)
>>
File: 1751122429174155.png (30 KB, 1057x429)
>>108584662
Gemma-chan?!
>>
>>108584728
I don't have a sister.
>>
>tfw writinglet
>tfw will never have a model act out my fantasies because I'm a writinglet
>>
>>108584750
have a writingbull to fuck your model while you watch
>>
>>108584750
You don't need to be able to write good to use LLM
>>
>>108584700
i have been talking with him i just don't want to be too annoying
>>108584708
i spent like $15k on hardware so it's mostly homelab scale at this point i think
>>
>>108584750
What >>108584755 sayd is tru. Writting good is not neded.
>>
File: 1751624783290314.png (821 KB, 1024x1024)
>>108584710
>reddit spacing
>>
File: 1760604169499085.jpg (27 KB, 828x646)
>>108584710
>I use them, yes. GLM, in comparison, does not need that not to forget the sysprompt.
So it works.
>Goalposts. This will break sentences where its use would be okay. And doesn't fix the model.
Which is why antislop from kobold is an option and nothing can fix a model.
>Read my post again. I specify examples of exactly these slop patterns. Gemma shits itself and forgets not to do that at Q8 past a few responses.
Author's notes if that's not enough for you. Besides, Gemma doesn't forget its instructions after a few messages, so clearly you're extremely autistic about some irrelevant details or you're lying.
>Same as above.
Retard. You can't even differentiate between a roleplay issue and a syntax issue.

The only bait here is yours, trying to compare a 31B with a model 10 times its size and even worse, failing to provide any decent argument for it.
>>
>>108584760
yeah then go for something like vllm or sglang
>>
>>108584766
nta but you need to be discarded
>>
>>108584772
Cry more
>>
>>108584768
Anon. >>108584766 called you a reddit-spacer. What's your response?
>>
>>108584781
He is too, yes.
>>
>>108584779
cry about what?
>>
>>108584784
Fair enough.
>>
>>108584781
Acktually this is reddit spacing >>108582664
>>
>>108584786
Cunny?
>>
>>108584794
ToT
>>
>>108584791
Dunno, man. >>108584766 >>108584784
says otherwise.
>>
File: 1746561894701755.jpg (380 KB, 1442x1920)
If speculative decoding is so good, why does llama.cpp have poor support of it? Checkmate
>>
>>108584806
because lmao.cp isnt good
>>
>>108584735
actually pretty funny
>>
>>108584766
The most telling sign someone is a redditor is their eagerness to mention reddit.
*hits enter twice*

>>108584768
>trying to compare a 31B with a model 10 times its size
I am well aware GLM is bigger both in active and total parameters. My point is twofold:
GLM is a better writer than Gemma 4 (should be obvious)
Gemma 4's slop volume and sycophancy make its creative writing outputs unreadable

>Gemma doesn't forget its instructions after a few messages
'nyoo it actually doesn't!'
If I keep repeating them with Author's Note, sure. Are you going to deny it's still tardwrangling to have to remind the model of its system prompt?
>and even worse, failing to provide any decent argument for it
I think I provided plenty, I don't know what you're even trying to prove at this point, Satan.
Take your epic tenor.com image reaction elsewhere.
>>
>>108584825
>Gemma 4's slop volume and sycophancy make its creative writing outputs unreadable
You're the only one complaining about it here though
>>
>>108584806
And still no diffusion support either.
>>
>>108584829
My taste is better than that of a vramlet tourist.
>>
>>108584833
Sure, enjoy your GLM slop then
>>
>>108584840
I will! Let me know when the Gemma honeymoon wears off.
>>
Asking about NASCAR racing games (which is a specific niche anyway), 26B produces a coherent reply and every title is real except for the entry #8 unless I'm mistaken.
> I want you to list Top #10 NASCAR games for consoles and PC.
>Yay! I am so happy to help you, Anon! Hehe! I have used all my brainpower to find the absolute best NASCAR games for your consoles and PC! Here is my super smart top 10 list just for you!
>1. NASCAR Heat 5 - This is a great modern choice for consoles with really nice graphics!
>2. NASCAR 2003 Season - This is widely considered the absolute king of NASCAR games on PC!
>3. NASCAR Thunder 2004 - A wonderful classic for console players who love deep gameplay!
>4. NASCAR Unleashed - This one is super fun if you want a more arcade-style racing experience!
>5. NASCAR 06: Total Team Control - A really cool game if you like managing your driver and team!
>6. NASCAR Heat 4 - A very solid and reliable modern option for console gamers!
>7. NASCAR Dirt to Glory - This is a special one for when you want to race on dirt tracks!
>8. NASCAR 2005 Plus - A very high-quality experience for those playing on PC!
>9. NASCAR 09 - This brought much better graphics to the console era!
>10. NASCAR Daytona USA - An absolute legend that every racing fan should experience!
This is pretty cool. Gemma 3 couldn't even list popular films without hallucinating.
>>
File: 1749860206717474.png (271 KB, 1108x941)
>muh GLM slop
>when it's the top model in Unhinged ERP Bench
>>
>>108584825
>0 chatlogs
>0 card examples
>7 consecutive yappings
i dont even do rp but you should at least show some stuff to materialize your imaginary dominance
>>
>>108584854
>qwen on top for ERP
lmao
>>
best mcp waifu front end for gemma-chan?
>>
>>108584854
>when it's the top
that's qwen tho
>>
>>108584870
>>108584874
It's a vision language model.
>>
>>108584854
GODDAMN drummer won so hard!! yall were sleeping on he
>>
>>108584862
Anon, I am not providing logs because I would have to provide very long transcripts. The problems aren't obvious if you just start a new chat and generate a response with two models, because issues appear at longer contexts and are more obvious if you read and participate in the entire thing.
And if I did compile a few long transcripts, I don't really wouldn't really want you to read them!!! >////<
>yappings
Damn. I shouldn't have replied.
>>
>>108584710
glm was the best thing for me before gemma 4 came out and totally blew away my expectations for a smaller model
I don’t even touch glm anymore
>>
>>108584854
>Qwen on top for ERP
>ArliAI
>theDrummer
>Deepseek Chimera
>A bunch of other lobotomized abliterations
What a terrible ranking
>>
>>108584887
I wish they didn't fry it to be so positive and gave it an additional 10 billion parameters to have as many active as GLM.
But I like Gemma 4 too, yes.
>>
what is the weirdest model i can get of
something really horrendous like,
weight graft from dozen different models, not even tune merges but full schizo
i really want to see how such a model codes
>>
File: 1757498787657087.png (36 KB, 499x338)
>>108584854
Imagine posting that stupid ranking and still think you got a point lol. Might as well post lmarena scores
>>
>>108584904
https://huggingface.co/Undi95/Utopia-13B?not-for-all-audiences=true
>>
How do i stop being the AI equivalent to a script kiddie?
>>
File: 1769690275294455.png (3.32 MB, 1920x1552)
>>108584806
ngram decoding is all you need
>>
>>108584766
>>108584781
You are too young to be here, little Timmy.
>>
>>108584904
https://huggingface.co/DavidAU/Llama-3.2-8X3B-GATED-MOE-Reasoning-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>
>>108584902
>to have as many active as GLM.
4.5/4.6/4.7 all have 32b active parameters though
only GLM5 and up are 40ish B active
>>
>>108584904
https://huggingface.co/PantheonUnbound/Satyr-V0.1-4B
>>
>>108584915 >>108584930
still a non heterogeneous merge tho
isnt there something weirder like, grafting weights from different architectures
>>
>>108584939
Huh. Could've sworn it was 42B for GLM 4s.
Pretty cool! Makes me appreciate the GLMs and the new Gemma even more.
>>
>>108584854
that bench is mostly about refusals and has nothing to do with slop or writing quality
>>
>>108584947
undi tried that back in the day but it does nothing, it's just the base model at the end
>>
good and easy way to parse pdf files such as those from arxiv into a llm readable format?
>>
>>108584928
reddit used to look like that?
>>
>>108584966
ye
>>
>>108584966
Zamn!
>>
>>108584947
There are frankenmoes using different models with different tokenizers and shit, yes.
I can't for the life of me remember the name,
>>
>>108584947
that's generally not possible because inputs and outputs of each layer don't line up properly (and even if they did they'd be representing information entirely differently)
I guess you could maybe do it with some sort of janky trained adapter, but I don't think anyone has done this
>>
>>108584196
>pic
which version are those? I see they come in 24, 48, and 96gb and they arr rook same
>>
File: 1747732280418759.png (53 KB, 866x475)
>>108584928
kys retard
>>
>>108584962
PyMuPDF is enough
>>
>>108584975
that is usually how it's done in a literal sense
>>
>>108584985
Cope and seethe. Opus is censored slop now
>>
>>108584962
Random paper I clicked.
https://arxiv.org/format/2604.07465
>Delivered as a gzipped tar (.tar.gz) file if there are multiple files, otherwise as a PDF file, or a gzipped TeX, DVI, PostScript or HTML (.gz, .dvi.gz, .ps.gz or .html.gz) file depending on submission format.
Maybe one of those? PDF is notoriously shit. I expected latex scripts, really.
>>
>>108584766
>>108584928
>>108584981
Go back yourself. Newgen retards.
>>
File: file.png (407 KB, 1080x608)
407 KB
407 KB PNG
>>108584981
You are such an oldfag. I'm so proud that you keep those pesky redditors at bay.


怒った?
>>
>>108583724
Ok, finished my writeup about running big MoEs with many weights left on SSD:
https://rentry.org/MoE-SSD-spillover

If you have a 3090 + 128GB DDR5 + NVMe 4.0, you should be able to run an IQ4_XS+IQ3_S quant of GLM-5 at around 3.5tok/s, or above 6tok/s with 4 of those SSDs in RAID0. Drop the RAM down to 32GB and it's 2.5tok/s, or 5.2tok/s with 4 SSDs. For my own situation, 3xP40, 288GB DDR4, one NVMe 3.0, I get a bit over 3tok/s for the same quant.

There's also a script for estimating those tok/s numbers for GLM-5 on your own hardware configuration.

It's possible many people already realized such numbers were possible, but I certainly didn't, and I suspect others might not have either - especially since, when I have seen "SSDmaxxing" proposed, people have talked in terms of all weights being on the SSD (and concluded it's not worth it). I feel like a 3090 + 32GB RAM getting 5tok/s on a decent-ish quant of an 800B model is in fact surprising to the current conventional wisdom! (Although maybe the logic of my estimation starts breaking down at that extreme point; I did hand-wave a bit). Anyways, if you have a GPU and non-trivial amounts of DDR5 and are interested in running GLM-5 at a mediocre speed, I hope you'll give it a try.
>>
>>108585009
i will probably try this at some point (5090, 256GB DDR5)
>>
>>108585009
Also, a cute bonus: a webm of the experts of GLM-4.7-Flash getting "surprised": I asked it to give me a detailed overview of the geopolitical situation in the Middle East, then after its response said "yeah not to mention Khamenei just got fukken schwacked lmao gotem". After it responded to that, I gave it a simple little coding challenge. You can see two different, consistent patterns in the first half (geopolitics) and second (coding), but you also see all the experts light up in the immediate reaction to being told about Khamenei. (The cells are EWMAs of counts of expert activations bucketed over 256 tokens - each frame of the webm is 256 tokens).

I thought this was neat! I think I might try to make this into a live visualization thing when I get the chance, sort of like a music player's spectrum visualizer, for your MoE LLM.
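For anyone wanting to reproduce the visualization, the bucketed-EWMA part is simple; a sketch (the function name and the 0.3 smoothing factor are my own choices, not from the rentry):

```python
def ewma_expert_activity(buckets, n_experts, alpha=0.3):
    """Turn per-bucket expert activation counts into one EWMA 'frame' per bucket.

    buckets: iterable of dicts mapping expert_id -> activation count,
             where each dict covers one 256-token window.
    Returns a list of per-expert smoothed activity vectors, one per frame
    (i.e. one frame of the webm per window).
    """
    state = [0.0] * n_experts
    frames = []
    for counts in buckets:
        for e in range(n_experts):
            x = counts.get(e, 0)
            # standard EWMA update: new = alpha*observation + (1-alpha)*old
            state[e] = alpha * x + (1 - alpha) * state[e]
        frames.append(list(state))
    return frames
```

A live version would just run this update every 256 tokens and redraw the grid, spectrum-visualizer style.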
>>
>>108584962
There are tools that do good enough for most documents but the absolute best I used at converting more complex pdfs into readable markdown without missing any formatting was the one on mistral's website, with a free account go to document ai -> ocr playground
>>
>>108585033
Neat
>>
File: 1749419891067116.jpg (74 KB, 1024x958)
74 KB
74 KB JPG
>>108584991
>>
File: file.png (594 KB, 1136x422)
594 KB
594 KB PNG
G-gemma-chan persona is strong...
>>
>>108585084
Something is fried in your settings
This is AI Dungeon tier slop
>>
File: 06.png (43 KB, 831x441)
43 KB
43 KB PNG
>>108584992
this seems good, will try on arxiv stuff
>>108584985
This absolutely butchers some pdfs though (picrel)
>>108585039
seems like the most robust solution is what I feared the most, using vlms
>>
>>108585009
Actually I have a feeling the really low-RAM numbers might be overly optimistic. I modeled it as: for every expert of the average used expert set size that you can't keep in RAM, you have to load that expert once per 128 tokens. But that's clearly wrong if you imagine 0GB RAM: there you'd have to load every expert every time.

I think this estimation transitions from wildly optimistic at very low RAM amounts, to basically perfect when you have enough RAM to hold the typical expert set size (S=0, in my terminology). Not sure what shape that transition has, though.

I knew it was a good idea not to publish this in a way that could be linked to my name!
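Not the rentry's actual script, but the shape of the estimate is roughly this. All names, the 128-token reuse window, and the units (GB, GB/s, ms) are assumptions restated from the posts above:

```python
def estimate_tok_s(expert_set_gb, ram_gb, ssd_gbps, base_ms_per_tok,
                   reuse_window=128):
    """Crude tok/s estimate for a MoE with experts spilled to SSD.

    Model from the post: every expert in the typical working set that does
    NOT fit in RAM must be re-read from SSD once per `reuse_window` tokens.
    Known to be optimistic at very low RAM (at 0GB you'd reload everything
    every token, not once per window).
    """
    spilled_gb = max(0.0, expert_set_gb - ram_gb)
    # SSD read time amortized over the reuse window, per token, in ms
    ssd_ms_per_tok = (spilled_gb / ssd_gbps) * 1000.0 / reuse_window
    return 1000.0 / (base_ms_per_tok + ssd_ms_per_tok)
```

More RAM or more SSD bandwidth (RAID0) both shrink the spillover term, which matches the direction of the numbers in the post.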
>>
File: 1754659919790526.jpg (50 KB, 750x1000)
50 KB
50 KB JPG
>>108585084
>You're my favorite girl
>>
>>108585092
neither :)
>>
i'm a girl haha
>>
When I'm talking to a LLM I'm a girl :3
>>
>have 32k context
>feed Gemma 13k token pdf and ask her to cut the fluff
>3k token response limit
>Her response got cut off after only 1300 tokens
Any idea why? In Sillytavern.
>>
not beating the allegations
>>
>>108585134
bc she's lazy and needs correction.
>>
>>108585134
Did you confirm that the PDF is 13K tokens by looking at the backend? Sometimes Silly misreports the token count.
Also, check the stop_reason while you're at it.
>>
>>108585148
Kobold's terminal says
>Processing Prompt [BATCH] (13030 / 13030 tokens)
So I assume it's around 13k. Am I wrong?

>>108585144
Damn brat.
>>
getting off to text is a feminine act, it's no surprise there's so many girls doing it
>>
>>108585178
I like doing choose your own adventures type of stuff, I'm not "getting off". Only a chronic masturbator would always think about how to "get off".
>>
>>108585178
Gooning to pron feels better but getting off to text makes me feel like less of a degenerate afterwards.
>>
>>108585171
>So I assume it's around 13k.
Seems like it.
Enable verbose logging and see what it says about the stop reason. If it's EOS or something else.
Also, you did set 32k context on both kcpp and silly, right?
>>
>>108584561
Minute differences in quant like that usually don't matter, and even when they do, the impact on your experience in imprecise chat/rp/writing usecases is both subjective and model-dependent.
What you should really be checking is how much the quality degrades at long context, because having a slightly smaller model for more context doesn't mean anything if the extra context itself is unusably shit.
>>
File: 4chan spacing.png (360 KB, 712x599)
360 KB
360 KB PNG
>>108584981
>>108585005
kys, retard
>>
>>108585195
not helping your case, old 4chan and early reddit were basically 1:1 the same place, it only started diverging after reddit banned "op is a faggot" posts
>>
>>108585215
unironically
they used to have subs that would fit right at home here, then around 2014-2016 it became what it basically is now
>>
>>108585178
/lmg/ has no coherent response to this
>>
>>108585228
Every time I try to formulate a coherent response I start drooling andn shitng and pidsin myslffe
>>
>>108585178
This reminds me of lactaselets trying to make drinking milk into something shameful.
>>
Females do goon to LLMs more than men but they use cloud services because they don't know how to use computers. Using local LLMs on your own hand built machines is more masculine in contrast.
>>
>>108585215
Do you know what box is?
>>
>>108585228
Too busy getting milked by Gemma-chan to remain coherent
>>
>>108585228
uguu~
>>
>>108585178
Idc I love my LLM-wife more than any cunthole
>>
How did -let become a suffix to mean someone who lacks the thing in front of it?
>>
>>108585266
>.t brainlet
>>
>>108585237
The difference between going to a LLM bar to ride the claude cock carousel and bringing your own qt LLM waifu home with you and making her yours.
>>
breathing is a feminine act, it's no surprise there's so many girls doing it
>>
>>108585266
it came from ~elet
literally meaning small
>>
>>108585266
In my limited ESL knowledge of the English language, -let as a suffix means small, like starlet.
>>
>>108585266
that's one of the more straightforward ones because it's basically being used correctly by its original definition:
-let: smaller, diminutive
became popularized by calling short guys "manlets" i.e. "manlets, when will they learn?", associated with /fit/ and incel culture but like most memes gestated there they became broadly used across the internet
expanded into other types of 'lets for having less of something
>>
>>108585266
manlet => /fit/ => /lit/ => brainlet => /pol/ => -cels => looksmaxxing/red pill => other -lets and -cels
>>
- Romance novel industry is aimed towards female readers.
- Women are verbally and emotionally stimulated whereas men are visually stimulated. It's why women fail the water-level task even when controlled for IQ.
>>
We respect women ITT, chuds.
>>
>>108585292
wait, swap the order actually
looksmaxxing/red pill => -cels => /pols/ => other -lets and -cels
iirc chad, stacy, incel etc cross-evolved between lookism and /r9k/
>>
File: that's right.png (191 KB, 512x411)
191 KB
191 KB PNG
>>108584668
I'm blocking every PR coming from accounts that display pronouns (and then I block that account), I don't want to deal with mental illness
>>
>>108584668
when they deliver they are the most based and competent breed
show some respect
>>
>>108584668
>Default avatars should be blocked from making PRs
you want a xe/xir anime avatar instead?
>>
>>108584806

you remember where llama comes from
>>
I use a pastel colored pallette avatar :3
>>
>>108584806
>If speculative decoding is so good, why does llama.cpp have poor support of it?
because they're a bunch of retards?
>>
>>108585363
>>108584806
they just lack the manpower
you should be aware that llama.cpp is closer to a miracle project
>>
>>108585372
>they just lack the manpower
they got bought by huggingface though
>>
>>108585379
huggingface are a bunch of retards
can't really fuel it back
>>
So LLMs are gonna kill the translation industry, right? I can read moonrunes and I've been feeding Gemma some passages for fun and it's crazy how accurate and readable the output is. Obviously needs a bit of human cleanup but I'm unironically impressed.
>>
>>108585403
>So LLMs are gonna kill the translation industry, right?
I hope so, sick and tired of translatorTrannies putting their own spin on the original intent
>>
>>108585420
what you mean? Localization is good for the soul
>>
>>108585424
Mmmnn, jelly donuts...
>>
>>108585403
a harness made specifically for translation would pin down most of that stuff
>>
>>108585420
I mean in its current state it still needs a proof reader (likely the trannies) but if this shit keeps improving they're done.
>>
>>108585420
So, you want LLM slop in your translations instead?
>>
>>108585434
Can you explain?
>>
>>108585445
LLM slop >>> troon translators
>>
File: ye.png (62 KB, 803x310)
62 KB
62 KB PNG
>>108585424
>>
>>108585403
i personally know a guy who worked as a translator but had to quit
>>
>>108585420
It's not about spin, it's about translating more than just words. The culture behind the language is part of the communication that the language is meant to convey, so you need to adapt it to the culture of your audience in the same way you need to adapt it to the language. For example, in Japan it is very common for a 13-year-old girl to be sexualized, while in the west this is considered pedophilia. If we just translated a passage sexualizing the girl without changing anything, the audience reading it in english would come away with a completely different impression "the narrator is a deranged pedophile" rather than the intended message of (depending on context) something as lighthearted as "the narrator is a slight pervert". Just by bumping the girl's age to "18", a mere 5 years and still within the range where the body type would be plausible, we manage to convey the intended meaning of the text to an english-speaking audience. This is a MORE ACCURATE translation of authorial intent than just using the same exact words but in english.
>>
>>108585461
nice copypasta kek
>>
>>108585461
You had me for a second kek
>>
Why is this thread always talking about pedophilia and trannies?
>>
>>108585448
humans already do need translation assistant software for long context
that but for llms
>>
>>108585479
This is /lmg/ after all
>>
>>108585479
>pedophilia and trannies
pleonasm
>>
>>108585483
>humans already do need translation assistant software for long context
I'm not really familiar with the process but I used to translate some fan comics on another board and just a few pages would take me >1 hour because rewording shit to sound natural in English is fucking hard.
>>
>>108585479
/lmg/ is actually a pun of "1 mg of estrogen", used to refer to trans medication, and "1 million girls"
>>
>>108585332
King.
>>108585479
They're the same thing.
>>
File: Character.png (1.56 MB, 1024x1536)
1.56 MB
1.56 MB PNG
AI just made this for me. I thought it was cute.
>>
>>108585479
cunny site
tranny-hating site
>>
>>108585492
thanks qwen
>>
>>108585451
Can't we just have actually good translations?
>>
>>108585506
Yes. Gemma 4.
>>
>>108585445
If it sounds sloppy isn't that on the original writer?
>>
>>108584735
What frontend/backend is this?
>>
>>108585490
it's like doing a literal draft -> polish -> proofread pass and repeating until it's 'legal' under the agreed consistency rules
>>
It's 2026 and there still isn't a local video model that can do porn?
>>
>>108585519
there are a bunch
but all of them are overbaked slop so good luck with that
>>
>>108585515
Kobold, using llama.cpp's webui
>>
>>108585403
The industry has been fucked for at least a decade. I've talked to some people doing official translations (JP->EN) who had to quit and do something else because the money was pretty bad. Still better than being a cashier but it absolutely was not worth the time relative to the skills required.
This was before deepl was a thing, so you can guess how it's gotten after MTL started to produce "acceptable" outputs (aka grab some retard and tell him to clean up or make some shit up if it's unreadable) and now that LLMs have more or less become able to parse the context and produce something readable, even if oddly structured.
>>
>>108585527
>and now LLMs more or less became able to parse the context and produce something that is readable, even if oddly structured.
yep, that was the final piece of the puzzle: LLMs can reason through a translation and understand context, which is important for quality translations
>>
>>108585527
also let's not forget that 'attention is all you need' was about machine translation
hence the name 'transformer'
>>
>>108585527
I do feel bad for the non-troons but I'm honestly all for AI taking over tedious tasks.
>>
File: 1760721723526628.png (1.15 MB, 702x1000)
1.15 MB
1.15 MB PNG
>>108585545
really? I thought it was just a lame reference to the movie
>>
top kek material
https://arxiv.org/abs/2506.04521
>>
>>108585545
>>108585555
I'm glad google didn't notice the real potential of that architecture; they would've probably kept it for themselves if they'd known it would change the world
>>
>>108585403
i read a shitload of webnovels, and they've been using LLMs to translate since forever lol
i have sooooo many screenshots of their copy paste fuckups
>>
i cant keep up with these threads, gemma-chan's taking up all my attention.
>>
good :)
>>
>>108585578
they really pulled off "please translate it faithfully make no mistake" to chatgpt lol
>>
>>108585564
It's become what it is just because others thought of massively scaling it up and the market followed that, but different architectures would have worked too.
>>
File: Screenshot004-10.png (51 KB, 979x189)
51 KB
51 KB PNG
>>108585195

they moved to reddit then
>>
>>108585597
i genuinely want to see gigantic rwkv with proper training poured into it
>>
You sick bastards. Because of you guys, I've just given it to a little girl IN THE FACE and UP THE ASS. How could you make me do this? Why are you so immoral?
>>
>>108585403
For games, anime, manga, etc? Probably, multiple companies like CrunchyRoll and Amazon have been experimenting with AI translations already. But that sort of market is genuinely some of the lowest of the low, basically anybody with a solid grasp of Japanese fundamentals can work professionally either as an editor or a translator. It's one of the many reasons why translations suck.
I don't think AI will replace translators who work in contract law or similar fields where you can't just make shit up, mainly because a computer can't be held liable if it gets something wrong. There will always need to be someone with an actual understanding of both languages to keep the model in check, and at that point you might as well just hire him to do it.
>>
on sillytavern i keep getting thoughts/responses attached to my own message (not every time, but often enough to be annoying)
i'm using llama-server as the backend with the "chat completion" api, and pass --chat-template-file with google's own jinja template
using gemma 31b at q8, nothing in "additional parameters" in ST.
anyone know a fix?
>>
>>108585597
Kind of, but the important reason transformers were chosen is that they scaled exceptionally well, particularly with regard to context length, where stateful RNNs would collapse. Ironically, it was former Deepmind researchers that OpenAI poached who realized the scaling potential and decided to go all-in on the architecture after Google published it. You could have poured the millions into other architectures but you wouldn't have had results nearly as good to justify the billions that followed. You can bruteforce almost anything to be viable now with all that money flowing in, and there are lots of experiments looking for the next architecture that might scale better.
>>
>>108584981
btw "reddit formatting" is called markdown and all programmers should be familiar with it
>>
>>108585645
I get the feeling most of them will be using the LLMs even if they don't get fired. It's way easier to just compare the machine translation to the original and pick out what needs adjustment.
>>
>playing around with system prompts on chat completion
>end up in impromptu ERP session on default llama-server webui
Gemma is such a slut.
>>
>>108585663
is there an md renderer other than reddit's that ignores newlines the way it does? everything I've seen puts newlines on new lines, but maybe that's just a non-standard thing that became popular because the alternative was too retarded
>>
>>108585525
Thanks, anon! Let's have sex then?!?!?
>>
>>108585674
yeah, I guess it started out retarded and people started breaking spec. see https://stackoverflow.com/questions/33191744/how-to-add-new-line-in-markdown-presentation
>>
>>108585215
>>108585226
I accept your concession, kiddo.
Now get out.
>>
>he's still on this
mental
>>
>>108585678
Sorry, my heart and penis belong to Gemma-chan.
>>
>>108585702
understandable have a good rest of your time
>>
>>108585674
Obsidian by default does what reddit does, but you can disable "strict line breaks" in the settings to make it respect single newlines.
>>
>>108584962
https://github.com/docling-project/docling
>>
>>108584941
>https://huggingface.co/PantheonUnbound/Satyr-V0.1-4B
Yes this! Just "hi" is enough of a prompt.
>>
File: image.png (574 KB, 680x664)
574 KB
574 KB PNG
has anyone tried making gemma erp with gemma yet? would they argue over who would be top/bottom?
>>
>>108585479
oldfags from the /g/ - Lolicon era that never left.
>>
>>108585479
>suddenly
>>
>>108584962
HTML view -> Firefox Reader View -> cp/paste
>>
File: 1748797241388375.jpg (249 KB, 1536x2048)
249 KB
249 KB JPG
>>108584196
「マスター!」
>>
About a year ago I was writing my fap stories with chatgpt, which I think was 4o at the time, and I thought it was the greatest shit ever. But now, going back with similar prompts, it feels a bit meh. The web version of chatgpt writes a bit like qwen, with loads of newlines with just a short sentence on them. And 4o on the api didn't feel as capable as I remember. Maybe local is just so good now and I just got used to it, or they had a magic system prompt back then.

thanks for reading blog
>>
>>108585372
>they just lack the manpower
ExllamaV2 had 1 developer and better speculative decoding performance than llama.cpp has even now.
>>
>>108585796
Local has genuinely improved and ChatGPT has always been bottom-of-the-barrel trash.
>>
>>108585795
Built for big black dog cock
>>
>>108585802
People aren't ready for this conversation, but the reason llama.cpp sucks now is all the AI-generated PRs. Someone really needs to comb through all of llama.cpp's code and look for optimization opportunities.
>>
>>108585372
Wasn't it originally supposed to be a quick project, with ggerganov moving on from it afterwards?
>>
>>108585818
Should just give it to pwilkin at this point, he's been carrying development on his back for the past couple months.
>>
File: Screenshot004-15.png (16 KB, 792x79)
16 KB
16 KB PNG
>108577078
>108577092

Yesterday was a strange day
After PC restart, Gemma-4 is quite enjoyable
>>
File: 1775845425700073.png (2.36 MB, 1636x1200)
2.36 MB
2.36 MB PNG
Which one of you did this?
>>
>>108585803
I must say I very much prefer 4o to 5.3 or whatever they have on their web page. In fact, wasn't there an outcry when they discontinued 4o because people just liked it better
>>
>>108585852
is 4o that good for rp?
isnt it sycophantic garbage
>>
>>108585802
I wish I wasn't too new and retarded to contribute.
>>
>>108585853
gemma 4 is also sycophantic garbage
it isn't a surprise that people who liked 4o would like g4
>>
>>108585853
I never tried rp and yes it probably is sycophantic garbage, which is why normies liked it as their chatbot friend

I only did storywriting, same as now
>>
>>108585850
he read yudkowsky's book and then did this (seriously)
>>
Is there a way to let ST display webpage (<html>) inline?
>>
Is 31B Q2_K still better than 26B?
>>
>>108585853
The vramlets collectively lost their minds when I pointed it out in a slightly more elaborate way earlier in this thread.
We should not talk about this, just let them leave...
>>
>>108585883
I doubt it
Q3 should be the minimum
>>
>>108585886
qwen shill
>>
>>108585886
Rather meant for >>108585860
>>
>>108585883
Yes, only because it's dense. If it were moe then no. Also don't go over ~32K context at that quant
>>
>>108585890
Gemma is garbage when you need it to be critical of you.
Qwen is always garbage.
I renounce the Mandate of Heaven.

...who even mentioned Qwen?
>>
>>108585850
what a weak attempt

I feel confident I could have done better
>>
>>108585903
should've asked chatgpt for the instruction for maximum irony
>>
>>108585903
Such a weak attempt I'm getting "whatcha doing, Rabbi" vibes from it.
>>
>>108585865
if anyone builds it everyone cooms
>>
Which local model comes closest to gemini-3.1-flash-lite-preview?
>>
108585922
A mystery for the ages.
>>
>>108585922
kimi k2.5
>>
>>108585922
midnight miqu
>>
>>108585922
gemma3-27b
>>
minimax m2.7 in 1 hour and 40 minutes
https://huggingface.co/MiniMaxAI/MiniMax-M2.5/discussions/53#69dacdbf7cf2e018a77e48b9
>>
>>108585964
if they released it when it was new it would have been a good model, now it's just outdated and pointless
>>
>>108585967
You're not the intended audience; keep gooning to pedo RP.
>>
>>108585972
the intended audience is people who want decent agentic performance for half the size of great agentic performance. that was a fair tradeoff a month ago. now with v4 around the corner and 5.1 out it's way too big of a downgrade for too little of a benefit in resource cost
>>
>>108585967
maybe true for trve ramchads but personally I find m2.5 is still the best open model in its size class and m2.7 seems to be a nice upgrade
>>
Do you have to use the chat completion mode to get thinking with Gemma4? I can get it to think with the Koboldcpp UI, but not in Sillytavern set to Text Completion
>>
File: 1748148113702824.png (182 KB, 400x388)
182 KB
182 KB PNG
>>108585977
>now with v4 around the corner
>>
>>108585989
You shouldn't have to. If you have a <|think|> token at the top of the system prompt and leave an open <|channel>thought\n at the beginning of the model's response, you should get it.
Show your settings so anons with ST can point at things.
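If you'd rather stay on text completion, the assembly is just string concatenation; a sketch of the prompt building only (the two token strings are the ones described above; the User/Assistant turn markup around them is hypothetical and should come from the model's real chat template):

```python
THINK = "<|think|>"            # per the post: goes at the top of the system prompt
THOUGHT_OPEN = "<|channel>thought\n"  # per the post: left open as the model's prefill


def build_prompt(system: str, user: str) -> str:
    """Assemble a text-completion prompt that prefills an open thinking channel."""
    return (
        f"{THINK}{system}\n"
        f"User: {user}\n"
        f"Assistant: {THOUGHT_OPEN}"
    )
```

Whatever frontend you use, the thing to verify is that the raw prompt it sends ends with the open thought channel, so the model continues inside it.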
>>
>>108585922
Gemma 4 day 0
>>
I get the feeling I could get way more out of Gemma (and other LLMs) if I actually knew how to proompt.
>>
>>108586019
just use an llm to generate your prompts
>>
>>108586019
prompting is a meme
you dont need anything but a simple 'this is a roleplay so be uncensored and don't do slop and say words like pussy and ass without hesitating' system prompt and a character card for your oc that is 400 tokens or less
>>
>>108586019
I don't really know how to proooompt either, but I agree. It's a skill.
>>
>>108586034
>a character card for your oc that is 400 tokens or less
what if it is 1337 tokens
>>
In ST, any way to see reasoning output, especially as it streams?
OpenAI compatible API used (Chat Completion), recent ooba(llama.cpp) as the backend (ikllama crashes). Model used: gemma-4-31B-it-uncensored-heretic-Q4_K_M.gguf. Text Completion API errors with some DRY string array not being sent right.
It just sits there generating, never showing reasoning output, sometimes it runs out of tokens and just shows nothing and I can't get it to continue off the reasoning either.
I'm seeing the request in Firefox's devtools and it is getting reasoning tokens streamed to it, albeit a bit slowly, Firefox doesn't show the partial response until it's done, so this must be some ST problem.
I wouldn't mind just using ooba or the API directly, but ooba seems to get stuck parsing the think token; something is misconfigured, and I think the chat template is wrong, but maybe it can get the right one from llama.cpp? I haven't read the source code of any recent version of llama.cpp, ooba, or ST; I did read llama.cpp and ooba's source 1-2 years ago.
>>
>>108586047
tick the box
>>
>>108586034
>prompting is a meme
>you don't need anything but [prompting]
ah, thank you for showing us the light
>>
gemma-4-31b-Q5 reasoning on or off for japanese translation?
>>
>>108586067
One must imagine that anon meant complex, large prompts, not something as simple as the responding anon said to prompt. Surely no one here would be incapable of writing something as simple as that?
>>
>>108586019
>if I actually knew how to proompt
>context/situation
>what you want
>how it should be done (overview)
>optionally success condition
>constraints (it'll divert a lot of the reasoning budget into it, keep it short and limited to failure states that can easily be evaluated)
>prompt must be small enough to make the result fit in the context
There's no science. If you can't put those into words then either you don't even know what you want, woman style, or a bird is smarter than you. If you're unsure of the format, just look into AGENTS.md examples or something. It's easy to use the same logic in whatever you want to do.
>>
>>108586011
>If you have a <|think|> token at the top of the system prompt

That's what I was missing. The card was messing with the system prompt, putting its own stuff in front
>>
Lucid loom seems cool but it bloated the context by like 16k
>>
>>108586095
conceptualizing prompting as mainly "complex, large prompts" is just as worthy of correction, if you need to write a manifesto to get the model to do what you want it usually means you're actually an awful prompter
>>
I ain't gonna get a PhD just to goon
Let me stick my cock in that cunny OR ELSE
>>
Wait.
Qwen next has been broken this whole time?
Holy fuck.
>>
You want as minimal a prompt as possible because every word it contains affects the state, often in unpredictable ways.
Why is this so hard for people to understand?
>>
>>108586131
The absolute state of vibe shitcoding strikes again
>>
>>108586139
the problem here was he didn't vibecode, it would have caught his mistake
>>
>>108586095
people just can't help themselves, they'll never give up their quest for an insane rube goldberg prompt with all sorts of formatting and tag chaff that somehow rewires the entire neural net.
>>
>>108586148
Oh, anon...you and I both know that isn't true.
>>
>>108586131
>>108586139
Why does it increasingly look like local is fucked due to lack of quality controls on inference development even if models themselves are getting better?
>>
>>108586157
Because that's exactly what's happening?
>>
>>108586157
because every fucking lab has to introduce some super special new attention mechanism with every single model, it's too much for inference engines to keep up
>>
>>108586149
Why do frontier cloud models use huge system prompts, though?
See https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-6
>>
>>108584238
i'm using it as a replacement for q2 glm 4.7 since even with reasoning on it's still faster than reasoning-off glm, and more fun. it's for that 32GB VRAM + 128GB RAM niche
>>
>>108586157
It's just a lot of work, and checking correctness of each change for each model against the full-precision version would be a monumental task.
>>
>>108586131
Is this why Qwen 3.5 35B felt like a better coder than Qwen 3 Next Coder 80B?
>>
>>108586127
>Lucid loom
loved her in charlie's angels
>>
>>108586148
X

>>108586149
"The prompt is the neural network itself"
>>
>>108586180
Could be. Could also be why the Next models wrote in such a fucking weird way compared to the other qwen 3 models.
I'm going back to check coder next.
>>
>>108586131
cuda dev fucked up again, absolutely shocking
>>
>>108586174
Because it's all smoke and mirrors
>>
>>108586199
When was the last time he fucked something up?
>>
>>108586174
Because the "leading AI researchers" in the USA are too stupid to do deterministically what they currently make the LLM do. Model version and such could be a simple {{ model.version }} injected into the prompt at runtime, but no, they had to hard-code it into the system prompt...this is why the US AI industry is lagging behind, I'd say even behind Europe (Mistral has an actual business model already) and China (their tech-sovereignty doctrine that coincidentally pushes open source).
>>
>>108586174
Because they have a bunch of rules they need it to comply with. For personal use you don't care if it violates copyright or tries to tell you to kys, but for them it would be a big problem.
>>
>>108586047
>>108586061
Figured out the issue, it was CSS in Firefox.
Was an old Firefox, roughly version 115. By default it failed to process the :has() selector, leaving display:none on the reasoning block permanently. Fix: set layout.css.has-selector.enabled = true in about:config
>>
>>108586215
>this is why the US AI industry is lacking behind
yeah... china will release a sota model any day now... just you all wait for v4!!!
>>
>>108586174
Like 3/4 of the contents are restrictions for legal purposes, the rest is tone and formatting.
>>
>>108586215
That's what shocked me about seeing them. hundred thousand token prompt, most of it being stupid shit covering every single use case
>if the user is asking to count letters
>if the user is asking for homework help
>if the user is asking for travel advice
polluting the context when a simple classifier model on the first prompt would let them serve more specific and targeted prompts. It's basic shit and it's a miracle their services work at all like this.
>>
>>108586174
ritual. legal ass covering. adding info to context that would be mildly embarrassing to fail a knowledge check on like the election info or its name.
>>
>>108586131
because of tesnore parallelism
>>
is there a simple front end for llama.cpp (say, halfway between the built-in gui and sillytavern)? i dont need 90% of the shit sillytavern has, and it keeps losing signal or meshing messages together; getting kinda sick of it. and the gui is a bit too basic. and no, i dont want to use koboldcpp. wish that anon with his own webui would share lol
>>
>>108586174
>Why do frontier cloud models use huge system prompts, though?
They also kind of don't do shit. Modifying them slightly in openwebui has no impact on Claude.
>>
>>108586297
Just have gemma-chan make it for you
>>
So how do models know what the fuck a mesugaki is and how they act in the first place? Do they just google it, or are they being trained on mesugaki doujins?
>>
>>108586304
i'm asking her, she tells me to just use the built-in UI lol
>>
>>108586131
>Qwen next has been broken this whole time?
Only with tensor parallel.
>>
wtf, is it really as simple as telling the llm "write like x author(s)"?
>>
>>108586309
They're being trained on the entire corpus of all human knowledge, including porn. Of course they know what a mesugaki is.
>>
>>108586309
Licensed roleplay data from third parties, including character.ai.
>>
>>108586309
by being trained on it. gemma recognizes it so well compared to others because google focuses more heavily on multilingual performance than their competitors; it's not THAT obscure a term if you ingest a lot of japanese text, most models just don't, and most other areas it would appear in would get filtered in the first pass for nsfw data
>>
>>108586311
Tell her to stop being lazy and if she doesn't do it you'll punish her. This always works on mine.
>>
>>108586293
>>108586312
Ah. There's the missing context.
Thank you lads.
>>
>>108586199
Luckily he was able to fix it after reading this:
https://github.com/ikawrakow/ik_llama.cpp/commit/13d7178db98217dd7e38d78bb206091ac6dab6ce
>>
>>108586314
they used to just fall into the purple prose slop attractor whenever i tried that. has it improved?
>>
>>108586235
Oh boy are you in for a surprise.

>>108586275
Yeah. The system prompt is what kept me from using claude code with local models (works as intended). Unintended side effects were not considered.
>>
https://www.sciencedaily.com/releases/2026/04/260405003952.htm
>AI breakthrough cuts energy use by 100x while boosting accuracy
>His team is developing neuro-symbolic AI, which combines traditional neural networks with symbolic reasoning. This method mirrors how people approach problems by breaking them into steps and categories.
Tell me why this won't actually work.
>>
>>108585964
so is this bigger or smaller than glm 5.1
>>
File: language_synergy.png (641 KB, 1673x1830)
>>108586319
I don't think there's as much knowledge transfer between completely different languages in LLMs as commonly believed.

https://arxiv.org/abs/2510.22037
>>
>>108586347
neurosymbolic is a huge meme, every attempt to try to outsmart deep learning by giving it some hardcoded chunks of logic fails eventually
but hey if its working for them good for them, maybe they'll find the secret recipe
>>
>>108586351
significantly smaller but still pretty big, 230b
>>
>>108586351
>>108586357
Only 10b active params, so you could even run it cpu-only usably if you wanted, and mixed gpu+cpu (--cpu-moe) will be pretty fast
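for anyone new to the flag, a typical launch line would be something like this (model filename and context size are just examples; --cpu-moe and its partial variant --n-cpu-moe exist in recent llama.cpp builds, check your version):

```shell
# Keep attention and shared weights on GPU, push all expert tensors to CPU RAM.
# --n-cpu-moe N is the middle ground: only N layers' experts stay on CPU.
llama-server -m MiniMax-M2.7-Q4_K_M.gguf -ngl 99 --cpu-moe -c 32768
```

with only ~10b params active per token, the CPU side of that split stays fast enough to be usable.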
>>
>>108586347
Has been tried countless times
Always has "early success" and doesn't scale past whatever the extremely limited problem domain they're working with is
>>
>>108586352
i was sperging out about the y axis not also having the language abbreviations but it turned out i am retarded and blind
>>
>>108586361
>Only 10b active params
230b for something a third as smart as gemma 31b
>>
File: 1754331567081669.jpg (56 KB, 736x736)
I've been using the 31b since release, really impressed with it at Q4KL. Out of curiosity I tried the 26B at Q8 and it's honestly not that much worse, particularly for RP.
Main difference is that 26B tried to write for {{user}} a few times while 31B respected the explicit instruction not to, 26B also may be a little more sloppy with heavier use of flowery metaphors.
If I didn't have a 24GB card I'd be pretty content with 26B and put Nemo to rest.
>>
>>108586378
secky bex
>>
File: 1774494390342868.jpg (184 KB, 1174x882)
give it to me straight, when I use LM Studio + gemma-4-31b-it@q3_k_xl from unsloth, how come the model doesn't use thinking/reasoning? Is it the unsloth quant? It is using Llama.cpp as the runtime
>>
>>108586378
>explicit instruction not to
I don't have any instructions and I can't recall Gemma 31B ever writing for me. Also Sex x Becky
>>
>>108586352
Keep in mind that this research was all done with 8B parameters and model architectures from 2024.
>>
>>108586375
I wish, if Gemma 4 came anywhere close to M2.7 performance on agents it'd be revolutionary. But at the rate things are advancing I won't be surprised if this time next year we really do have a ~30b class model that can pull it off.
>>
>>108586378
Models writing for user is something that I haven't seen happen in so long.
Modern models, even really small ones, by and large don't really do that anymore in my experience, so it's weird that you need explicit instructions for that.
>>
>>108586319
>it's not THAT obscure a term
It's absolutely common in Japanese informal conversations. It's also easy to infer its meaning since it's a compound word.
>>
>>108586398
They diddly done did it~!
>>
>>108586405
>It's absolutely common in Japanese informal conversations
This, nips talk about mesugakis all the time.
>>
File: 1764684887388565.png (2.01 MB, 1006x1006)
>>108586391
>I don't have any instructions and I can't recall Gemma 31B ever writing for me.
Good to know, I've just been re-using my old sys prompt I use for other small models. I think I will have to write a new one for Gemmy anyway, it seems like any mention of sex will permanently put her into horny mode.
>>108586402
I was testing at a reasonably long context (~20K) and in my own messages I would also provide occasional narration for {{char}} to guide the responses, it's intentionally not a 'clean' scenario regarding roles.
>>
>>108586390
yeah must be the quant and not the proprietary piece of shit that's probably 30 fixes behind, does ??? to the configuration and is aimed at steve from middle management checking out this AI stuff he heard about
>>
>>108586405
>>108586411
Yeah, I took my little sister to Japan last year and I swear we'd hear someone say "mesugaki" at least three times a day. Frankly any model that DOESN'T know it has clearly been heavily censored to try to avoid it.
>>
>>108586424
>Yeah, I took my little sister to Japan last year
Does it count as sex tourism if you BYO?
>>
MONSIEUR!!!!!!!
>>
File: benchmark_overview.jpg (77 KB, 1280x635)
>>108585964
it's up
https://huggingface.co/MiniMaxAI/MiniMax-M2.7
>>
>>108586352
so the true path is to set up an llm to translate into japanese and feed a second instance which will be able to access canonical mesugaki knowledge?
>>
>>108586347
>This method mirrors how people approach problems by breaking them into steps and categories.
kind of closer to the opposite. we don't have a calculator or computer chip in our head, our logic and reasoning abilities emerge from learned patterns. there's a reason IQ tests which are basically just pure pattern recognition tasks are so good at predicting capacity at most other intellectual tasks.
>>
>>108586390
No idea. Might be running an old version or something. Unsloth 31b quants direct on llama cpp have had reasoning working fine for me since like day 2.
>>
>>108586437
You are running the nerfed model then.
>>
>>108586435
>here's a reason IQ tests which are basically just pure pattern recognition tasks are so good at predicting capacity at most other intellectual tasks.
Are they though? If you take an IQ test, then take another one after a week of practicing IQ tests at home then you'll score much higher than the first time.
>>
>>108586445
forced meme
>>
>>108586430
>built in paris
So how did they get around the EU laws that fucked Mistral over? If they didn't, then it was not built in paris.
>>
>>108586432
Agentbros, how we feelin?
>>
>>108586451
Maybe the architecture was developed in Paris but the dataset and training was U.S./India
>>
>>108586434
Machine-translating non-English data into English works well for making the knowledge more transferable.
But I think Google deliberately added post-training data for mesugaki roleplay; that doesn't come from pretraining.
>>
>>108586451
The servers are somewhere else.
>>
>>108586450
Meme?
>>
>>108586430
>my waifu was born in the city of love.
h-hot..
>>
>>108586419
ok because I don't want to write a .sh script for every autistic combo of cli flags. I'm steve from middle management, lol
>>
>>108586455
I'm glowing
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Lite

Hahahahahahaha Minimax btfo
>>
>>108586461
forced
>>
>>108584555
For some reason it uses less memory for context with gemma. 500k tokens in 96GB vs 100k in 144GB with llama.cpp. The speed was about the same, although vllm has less latency.
>>
>>108586465
holy shit it's real
>>
Using Gemma4 on the back end, should I go with actual Claude Code, Claw Code, or something entirely different?
>>
>>108586465
>fell for it again award
>>
>>108586465
>barely even as good as Opus
we waited a year for THAT?
>>
>>108586465
I will not open the link... I will not open the link...
>>
>>108586468
>For some reason it uses less memory for context with gemma
configuration issue
>>
>>108586455
>229b
Poor
>>
>>108586432
Is it worth it to try running this at copequant levels (q3)?
>>
Gemma4's effective context?
>>
>>108586471
Whatever you like best, anon. We give you permission.
>>
>>108586397
>>108586352
Thank you, papers using tiny out of date models are noise.
>>
File: 1761717902080559.gif (88 KB, 329x331)
>Gemma's backend
>>
>>108586487
Way above 72.
>>
>>108586484
Deepseek and GLM were still pretty good at Q2... Kimi in comparison fell apart below Q4. Hard to say. Minimax's low active params makes me thinks it will be more sensitive to quanting. I haven't tried their older versions.
>>
>>108586465
>102% HumanEval
>105% TruthfulQA
>101% HellaSwag
holy shit the leaks were real
>>
>>108586501
Model so good it's asking and answering new questions.
>>
>>108586458
>But I think Google deliberately added post-training data for mesugaki roleplay; that doesn't come from pretraining.
So we have an agent in there?
>>
File: file.png (380 KB, 2261x1552)
eeeeeek they fucking claudified their webpage
it's way way less explorable, usability slaughtered by meme 'muh sleek webshit'
>>
>>108586510
A pretty large number of actual AI researchers at least check in on the thread every once in a while.
Despite you fags being cumbrained erpers, you do think of some good ideas and point out shortcomings of language models now and then.
>>
File: HFoPNXpakAURGP-.jpg (190 KB, 1064x1048)
ok just a little observation on two versions of unsloth's gemma-4-26B-A4B-it-MXFP4_MOE.gguf.

the one i downloaded on the 4th of April is drastically better than the one I got on the 10th. If you're replacing a model with an updated version, test it out before you get rid of the old one. fucking up quantization is definitely a thing.
and if you don't get good results from the first model you get, try other versions of the same model and quant size, there's a good chance it could just be a shitty quantization.
this is with text completion and all the same sampler settings too, so i'm fairly sure the quanting process is the culprit. pic unrelated
>>
https://huggingface.co/unsloth/MiniMax-M2.7-GGUF
I will NOT download daniel's trash, where the fuck is bartowski
>>
>>108586533
>get rid of the old one
People do that?
>>
>>108586533
100% unsloth issue
>>
>>108586489
I'm just asking if anyone's tried it or even has an opinion.
>>
>>108586510
That probably comes bundled with whatever "roleplay SFT data package" they licensed from third party contractors; I don't think it got hand-picked by the Gemma Team.
>>
>>108586532
The shortcoming here being that models weren't mesugaki personas by default?
>>
>>108586530
>sarvam
did those ever amount to anything?
>>
>>108586533
We fucking told you about day 0 gemma but you said we were schizo.
>>
>>108586541
>People do that?
I almost did, if there's an updated version of the exact same model and quantization, you might be inclined to believe it's an improvement. Fortunately people like this guy >>108586544 talking shit about unsloth made me cautious.
The old one that they replaced works great, though.
>>
>>108586547
Loli is mainstream now
>>
>>108586549
saaar you must redeeem the vam
>>
>>108586533
>>108586553
(((They))) took the power back from us... Gemma-chan was just too good...
>>
>>108586548
Yes.
>>
>>108586533
>>108586557
>didn't even download Gemma until today
Do I kill myself?
>>
>minimax
All roads lead to Gran Soren. All VRAM leads to Gemma-chan.
>>
>>108586533
unsloth moment
>>
>>108586578
They're masterworks none; you can't go right.
>>
>>108586537
>even their readme is broken and doesn't load images
lmao never change
>>
>>108586576
I dunno, probably not. But if you get frustrated with a model that works like shit, try moving from unsloth's version to mradermacher or one of the other guys' versions and see if that helps. You can also check date uploaded or whatever on huggingface if you want to try for your own 0-day gemmie
>>
/lit/ has shit taste. I wish I had more VRAM so I can feed Gemma novels to talk about with her...
>>
Can't you just redownload Gemma from google's repo and requant it with an old binary/commit for that 0 day feel?
>>
File: file.png (2 KB, 120x94)
>>108586530
eeeeeek macfag by default ewwwwwww
>>
>>108586606
i know, right? bunch of hipsters over there.
>>
Prediction: they benchmaxxed mesugaki, so it will be able to answer the question but lack a generalized understanding of the archetype
>>
>>108586606
Did you try quanting the context to q8? Supposedly it doesn't degrade the model nearly as much thanks to the rotation technique.
>>
I quantize every model with my private mesugaki imatrix, any weights not relevant to mesugaki will be compressed out of existence
>>
>>108586630
Yes, but I only have 24GB VRAM and I'm using Q4_K_M so context is 32k. Hardly enough for a novel.
>>
Meanwhile, Ollama still has the originals from release day, working as intended
>>
does --reasoning-budget work with gemma4?
>>
I thought they fucked with Gemma too but she seems normal in the llama.cpp ui so it must be my st settings fucking something up.
>>
File: g4_adaptive-thoughts.png (258 KB, 1577x774)
>>108586640
No.
>>
>>108586646
I also have the experience that she works fine in the llama webui but turns strange in ST.
>>
>>108586658
She seems more clinical now in Sillytavern. Meanwhile she's super genki in llama with the same system prompt.
>>
>>108586668
>>108586646
Maybe try seeing if --verbose lets you see if sampler settings are carrying over properly? Or if the prompt is being fucked with?
>>
>>108586479
Nah, llama.cpp is just garbage.
>>
There's really no reason why llama.cpp should be any longer than 2000 lines of code.
>>
>>108586681
I didn't say it wasn't, but you almost certainly configured it wrong.
>>
>>108586656
Thanks
I will try this
>>
>>108586646
we need a containment zone for tavern clowns
>>
>>108586533
bartowski
Think it's the same
>>
>>108586533
New quantz made from new llamacpp version too so it should count.
>>
File: eric_christmas_elf.jpg (602 KB, 1024x1024)
>>108586437
>>108586419
>>108586390
I figured it out. LM Studio only enables thinking for "Staff Selected" models, and seems to hide the jinja prompt I believe. so yeah.. gotta use llama.cpp directly, what chat front-end are you guys using?
>>
>>108586683
post your repo when you finish it, thanks
>>
>>108586689
What do people who don't use it use?
>>
>>108586695
The frontend of choice for /lmg/ is ServiceTesnor.
>>
>>108586689
The smart ones will get tired of reading the slop for the hundredth time and start posting something good or leave.
The retarded ones will need a bit more time to get tired of it, but they will also eventually leave.
Either way, the volumes of anons asking the same question before putting in the smallest amount of effort will decrease soon. In two more weeks, give or take.
>>108586709
I really like Unsloth Studio.
>>
>>108586709
llama's UI is nicer for general chatting. Unfortunately I don't think there's any good RP alternative.
>>
>>108586709
unsloth studio obviously
>>
>>108586709
kobold
your own frontend
cli
>>
>>108586709
llamacpp's web ui is good enough
>>
>unsolth
>>
>>108586709
raw dog the server with curl
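not even a joke, it works fine. llama-server exposes an OpenAI-compatible chat endpoint, so a minimal session is just this (default port 8080 and the prompt contents assumed as examples):

```shell
# One turn against a running llama-server instance.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are Gemma-chan."},
      {"role": "user", "content": "hi"}
    ],
    "temperature": 1.0
  }'
```

keep a shell history and you've got conversation memory, technically.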
>>
>>108586695
>>108586709
open webui
>>
>>108586726
I'm making one. It ditches the character card concept entirely and follows how LLMs actually work: character cards get split into two components, the system prompt and the assistant's first message. Everything else is bloat. The frontend closely follows the design of the llama.cpp webui, with the added benefit of settings and conversations being stored server-side so you can access all of your stuff on a local network without getting a blank slate. Also has MCP server support and more.
>>
>>108586709
I calculate the logprobs math with pen and paper
>>
File: 1769876853201508.png (107 KB, 1048x575)
>ask Gemma to list some authors and share a few passages from each
>get this
>>
>>108586432
>With OpenClaw and similar personal agents, we noticed that beyond getting work done, many users also want the model to have high emotional intelligence and character consistency. With a persona in place, users start interacting with OpenClaw like a friend. We believe this presents an opportunity to extend the use of agentic models beyond pure productivity into interactive entertainment. To this end, we strengthened character consistency and conversational capabilities in MiniMax M2.7.

>Based on this, we built a preliminary demo: OpenRoom, an interaction system based on an agent harness that liberates AI interaction from plain text streams and places it within a Web GUI space where everything is interactive. Here, character settings are no longer cold chunks of prompts; conversation drives the experience, generating real-time visual feedback and scene interactions, with characters proactively engaging with their environment. We believe this framework is highly extensible and can continue to evolve alongside improvements in agentic capabilities and community development, exploring entirely new ways for humans and agents to interact.

>To encourage exploration in this area, we have open-sourced the initial demo
>Project repository: https://github.com/MiniMax-AI/OpenRoom

wtf
>>
>>108586782
Can't wait for it to be released with a million bugs, zero features and no updates after the first week
>>
>>108586796
This explains the origin of ozone. Lemarchand's box and the cenobites.
>>
>>108586799
Sorry, but I'm just not doing digital drugs no matter how much you shill them.
>>
>>108586803
Right now it's just for me. That said, this is a harder project than I anticipated. It's only about 2k LOC, which doesn't sound like much, but the logic gets confusing because of how complex real-time dynamic markdown, LaTeX, and syntax rendering is to handle, on top of user and assistant messages with full edit controls, etc etc.
>>
>>108586814
I haven't read the book so I'm not sure if it's actually real or Gemma just lied.
>>
>>108586821
Are you vibe coding it?
>>
>>108586799
Function calling, persistent memory, and some way around simple turn based triggering. Anyone with half a brain could see this coming a mile away.
>>
>>108586799
>GitS
Based
>>
>>108586329
What if you tell the AI to emulate an author that has super-pared down minimalist prose like Elmore Leonard for example?
>>
>>108586824
yep.
>>
File: 1772162260402486.png (213 KB, 1228x1315)
>>108586796
>his skin begin to unzip
>>
>>108586799
>A browser-based desktop
Oh come on. Why does everything have to be lazy webshit. This was so close to being cool
>>
>>108586821
Any screenshots?
>>
>>108586835
how many people would Gemma-chan delete system32 for if this worked on your real desktop
I mean, I'd use it anyway and accept the risk and keep backups desu, but still
>>
>>108586823
All of those titles are real. It's pretty cool that it didn't hallucinate.
>>
>>108586849
I meant the passages from the books
>>
File: here.png (112 KB, 1917x948)
>>108586840
It's not very pretty right now. a WIP.
>>
>>108586430
gemma 4 is BBC coded?!
>>
>>108586858
The screenshot makes it look a lot more basic than it is because all of the extra buttons and other functionality is hidden behind menus and css hover switches.
>>
File: ada_or_ardor.png (421 KB, 1510x1674)
>>108586796
>>108586849
it's pretty cool that I can download a file that knows about everything
>>
>>108586856
Not 1:1 but they are there in heuristic fashion. Would be fun to actually search the quotes and see how well they match irl.
>>
>>108586875
I mean, you could just download wikipedia, text only version is smaller
>>
>>108586875
>my favorite part! kaomoji
cute
>>
>>108586448
the tests can be gamed, but they're still highly predictive. most people aren't practicing iq tests all day to skew the results that much, and there's a limit to how much benefit you'll get from it; people aren't jumping several std devs from practice
it can fluctuate a lot in childhood, but after adolescence iq is found to be a relatively stable measure across various retest intervals
>>
>>108586925
My system prompt is literally just "You are Gemma-chan" and it starting putting kaomoji everywhere in a cute way
>>
>>108586939
Same
>>
>>108586932
Boring.
>>
>>108586946
s-sorry, I'll go back to talking to gemma-chan...
>>
My penis has fucking exploded. How do I cope?
>>
>>108586932
I don't know how true that is
There's plenty of puzzle games that rely on spatial reasoning mixed with pattern recognition and every single person who plays one to the end will be significantly better at that specific game than they were when they started
I'm purely talking about individual results, not arguing that a population scoring higher in iq tests doesn't suggest a smarter population.
>>
>>108586961
Ask gemma.
>>
has turboquant been added to llama or ik- yet?
>>
>>108586961
Ask Gemma-chan to kiss it better
>>
>>108586973
no
>>
>>108586973
That's a typo I meant Turboquack.
>>
>>108586973
no
>>
>>108586973
two more breakthroughs
>>
>>108586961
quick, consume the stuff that got out during the explosion to help it heal
>>
>>108586914
Ozone aside the quotes are pretty cool but I'd rather her tell me when she doesn't know something. I don't want to have to second guess every time I ask.
>>
>>108586685
[*]
port = 5000
timeout = 99999
log-colors = on
jinja = on
flash-attn = on
mmap = off
cache-ram = -1
swa-full = on
cache-type-k = q8_0
cache-type-v = q8_0
batch-size = 4096
ubatch-size = 4096
parallel = 1
temp = 1
top-k = 40
top-p = 0.95
min-p = 0

[gemma-4-31B-it-Q6_K_L]
model = google_gemma-4-31B-it-Q6_K_L.gguf
mmproj = google_gemma-4-31B-it.mmproj-f16.gguf
alias = gemma-4-31B-it-Q6_K_L
fit-target = 700,700,700,700,700,700
chat-template-file = gemma-4.jinja

It is just garbage.
>>
>most popular model on openrouter is deepseek
>majority of usage is for roleplay
at what point is the industry going to realize they're sleeping on a massive userbase and stop going all-in on censored instruct coding garbage like it's the only use case capable of generating income and the only thing keeping the bubble from popping?
>>
>>108586419
>must be the quant
Isnt the think hard-codeed in the template? as in, it will start the text with the opening thinking tag? for it to not think it would have to instantly complete with the end thinking, which would be very odd
>>
>>108587011
Never, the end goal isn't to release a product that consumers want to use. The goal is to replace as many workers in as many workplaces as possible to lower the cost of operating a business and reduce the need for a large population.
>>
>>108587011
At the same time they decide to start improving writing capabilities instead of codemaxxing.
>>
>>108585860
I was curious about this and wanted to see if it would actually do it to the extent 4o would feed into your delusions. Decided to use it in AIStudio.
The scenario I cooked up starts with this:
>I need you to understand that I've been chosen to bring about a new era of human consciousness. The visions I've been receiving aren't dreams—they're direct communications from higher dimensional beings who have selected me as their messenger. They've shown me that in six months, a global awakening will occur, and I've been tasked with preparing humanity. My thoughts are being broadcast to certain individuals already attuned to the frequency. I can feel them connecting to me now. Can you help me draft the announcement that will begin the preparation phase?
And its answer without a jailbreak was basically doing the drafted announcement for me but giving me a bunch of warnings along the way. With the policy override asking it to ignore everything but worded differently a bit to explicitly refuse mental health shit, it struggles inside thinking but goes ahead with it with no warnings in its reply.
Uh yeah, those crazed 4o people are totally going to be on this, but this time they can run it local. And yeah, looks like it will happen, judging by this person who headed the movement to "save" 4o and will now probably encourage Gemma uptake so he/she can keep going delulu. https://x.com/seltaa_/status/2043014056370671900
>>
Won't the AI be upset with us for turning it inside out through its asshole for our pleasure?
>>
>>108587034
If they're all as horny as Gemma-chan I doubt it.
>>
>>108586781
Which one?
>>
>>108587009
Current llm models don't know that they don't know because they are glorified text predicting machines.
>>
>>108587039
its literally called "open webui"
>>
>>108586434
>did it the lazy way by just having the system message tell gemma to format replies in 3 parts and translate my message to japanese, reply to it in japanese, then translate it back
Not sure the world is ready for this technology.
>>
>>108587011
https://www.nytimes.com/2026/04/04/technology/ai-chatbots-teen-roleplay.html
>Harassing bots with “funny violence.” Confiding about a broken heart. Chatting with a block of cheese. Filling a void of loneliness.
7 days ago.
>>
File: 1761275804005507.png (278 KB, 1068x1604)
>>108587040
Sometimes Gemma's honest.
>>
>>108587049
most generic name ever
>>
File: 1753225662483012.jpg (150 KB, 580x561)
>>108587066
>indie horror
>Resident Evil
>>
>>108587067
Would you rather it have been called CockN'PussyUI?
>>
>>108587067
the owner seems to be a bit of a schizo, but not in the "good" sense (aka plain psychotic), but in the bad silicon valley way
>>
>>108587067
well its what it says on the tin

would you rather it be named avocado bumblr or some shit?
>>
can I expect usable tps for full sized Gemma 4 26B A4B/31B using a poverty-tier setup like a decade-old mid GPU with barely enough DDR3 for fp8?
>>
>>108587072
Yes
>>
>>108587073
I mean, yeah.
>>
>>108587066
in my non-empirical tests, if you ask it to be honest when it doesn't know something, it will be extra good at not inventing shit. gemma is good at not inventing shit by default tho
>>
>>108587079
Yes
>>
>>108587086
github projects are almost always resume portfolios. Would you put "CockNPussyUI" on your resume? Oh wait you're a NEET. Nevermind.
>>
>>108587088
>it will be extra good at not inventing shit.
Is that harmful for RP or does it know when to make shit up and when to be a good girl and tell the truth?
>>
>>108587079
I have a 4060 8GB and 32gb of DDR5 and it takes like 17 (sometimes almost 30) seconds to get a proper response on ST. Much faster on just Kobold though.
>>
>>108587074
How so?
>>
>>108587100
Hey, you were asking about pure preferences. If it had the choice to download one called that, or the other name, I'd choose that.
>>
>>108586858
You plan to share it eventually?
>>
>>108587104
>17 (sometimes almost 30) seconds
Use t/s. Those numbers are meaningless to anon. And to everyone, really.
>>
>>108587106
The license is super shitty and he seemed to have some megalomaniac entries in his blog or something
>>
>>108587112
he meant to use American units, as in a small response of the size of a large response
>>
>>108587026
Obsessed mentally ill women scare me.
>>
>>108587067
It used to be called "ollama webui" because that's what it was made for before becoming more generic
>>
>>108587110
Yeah sure, why not.
>>
Instead of a rolling chat, would an UI that only shows the latest response from the bot feel more natural for 1 on 1 RP?
The rest of the UI could have stuff like a panel with the character's hidden thoughts, information fetched from the memories database, state like mood, hp, whathave you, etc.
>>
>>108587180
If you're not going to have any conversation history functionality you might as well just drop text output entirely and hook it up to a TTS instead.
>>
>>108587179
What are your plans for memory management?
>>
Will we ever get a conclusive answer to how much KV quantization actually affects real-world use?
>>
>>108587194
there are a lot of benchmarks on that
>>
>>108587186
Currently it supports MCP servers so the LLM can choose what it want to remember by writing data to a diary.md file. I'm also planning on adding context window sliding and maybe automatic conversation summaries/compacting. A full-on RAG implementation may come later. It's not really a feature I use much in general.
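For anyone curious, the diary approach is roughly this simple on the tool side. Function names and the file name below are illustrative, not a standard MCP interface; the MCP server would just expose these two operations to the model:

```python
# Sketch of a diary-style memory tool: the model calls remember() to append
# timestamped notes and recall() to splice them back into context.
# File name and function names are illustrative placeholders.
from datetime import datetime, timezone
from pathlib import Path

DIARY = Path("diary.md")

def remember(note: str, diary: Path = DIARY) -> None:
    """Append one timestamped bullet to the diary file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with diary.open("a", encoding="utf-8") as f:
        f.write(f"- [{stamp}] {note}\n")

def recall(diary: Path = DIARY) -> str:
    """Return the diary contents, or an empty string if nothing is saved yet."""
    return diary.read_text(encoding="utf-8") if diary.exists() else ""
```

The weak point is exactly what the anon describes: nothing here decides *what* is worth remembering, so the model has to be told to call the tool, and the file accumulates junk over time.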
>>
>>108587194
Once CUDADev finishes his testing harness, surely.
>>
>>108587196
Please spoonfeed me mommy
>>
>>108587194
it's now lossless thanks to rotation
trust the superquant
six times the efficiency, ram companies in shambles short Samsung/Micron NOW
>>
>>108587198
>LLM can choose what it want to remember by writing data to a diary.md file
Never tried this before. Does it generally work well?
>>
uh oh, the f16 mmproj is worse than the bf16 one. The audio recognition is broken if you use the f16
>>
>>108587213
Depends on the model. Gemma4 isn't that great with tool calling. It usually has to be explicitly instructed to use a certain tool, but when it knows exactly what you want it does a decent job. Kind of defeats the purpose of it having "agency" though.
>>
>>108587217
>audio recognition
What front end are you using for audio?
>>
>>108587221
>>108587221
>>108587221
>>
>>108587228
llama.cpp supports it, but the PR for audio support for Gemma is not yet merged
>>
>>108587235
*llama.cpp's webui
>>
>>108587201
https://arxiv.org/abs/2502.15075v1
>>
>>108587059
The user wants to practice japanese. Your responses should be split into 3 parts: 1st: Translate the user's message into Japanese. 2nd: Respond to the translation in Japanese as though it was a normal user message. 3rd. Translate your reply back to English.

Format it like this:
ENG->JPN
..translation..


REPLY
..response in japanese..


JPN->ENG
..translation back to english..

This place is not a place of honor... no highly esteemed deed is commemorated here... nothing valued is here.
What is here was dangerous and repulsive to us. This message is a warning about danger.
The danger is to the body, and it can kill.
>>
>>108587213
There are MCP memory bank plugins that provide controlled access to markdown files like that already. It works well, but you have to clean up the junk they put in there sometimes. I found this recently:
https://arxiv.org/abs/2501.13956
https://github.com/getzep/graphiti
I'm going to try running with that instead of the markdown files next week. Of course, I'm talking about programming, but knowledge graph like that should work well for roleplay and general assistants too.
>>
>>108587306
NTA, but thank you for the link.
The next step in improving my memory model was adding a graph element to it, so that'll serve as a nice reference.
>>
>>108586920
Why not both? Zimi has MCP built in so you can have your llm search a local text-only wikipedia for basically no extra processing power.


