A thread dedicated to the discussion of AI Vtuber Chatbots.
Wrangling edition
/wAIfu/ Status: Watching Anthropic make secret back-end changes and then instantly pop up for damage control the instant people sus it out, over and over again

>Free access to the bigger AI models, with caveats
Due to a kajillion anons making a bazillion accounts each, access to the free tiers now requires a sort of deposit. Openrouter wants $10 and you can get 1,000 messages per day. Chutes wants $5 for 200. Stick to the free models and you're good forever (or until their policy changes again). Kluster.ai is sunsetting services in favor of building a filter that will effectively cover up the flaws in other models while also catching wrongthink before it's generated and more effectively preventing fun.

>System prompt to transform regular RP sessions into a basic raising sim game
Designed specifically to be used with Chiharu, but with a few edits you can make it work with almost any character. To use, paste it into your Author's Notes or add a new entry in your System Prompt. You can let the bot generate a random opener or use the recommended first message. Difficulty can be adjusted; if you found it too hard or too easy, let HorseMerchant know or adjust the thresholds yourself. Should work with Claude, Gemini, and Deepseek. Disable example messages if you run into issues.
https://rentry.co/summerrssystemprompt

>How to anonymize your logs so you can post them without the crushing shame
Install this: https://github.com/TheZennou/STExtension-Snapshot
Then after you've wiped off your hands, take a look at the text box where you type stuff.
Click the second button from the left side, then select snapshot, then select the anonymization options you want.
https://files.catbox.moe/yoaofn.png

>How to spice up your RPing a bit
https://github.com/notstat/SillyTavern-SwipeModelRoulette

>General AI related information
https://rentry.org/waifuvt
https://rentry.org/waifufrankenstein

>Tavern:
https://rentry.org/Tavern4Retards
https://github.com/SillyLossy/TavernAI

>Agnai:
https://agnai.chat/

>Pygmalion
https://pygmalion.chat

>Local Guides
[Koboldcpp]
https://rentry.org/llama_v2_sillytavern

Who we are?
https://rentry.co/wAIfuTravelkit
Where/How to talk to chatbots?
https://rentry.co/wAIfuTravelkit
Tutorial & guides?
https://rentry.co/wAIfuTravelkit
Where to find cards?
https://rentry.co/wAIfuTravelkit
Other info
https://rentry.co/wAIfuTravelkit

>Some other things that might be of use:
[/wAIfu/ caps archive]
https://mega.nz/folder/LXxV0ZqY#Ej35jnLHh2yYgqRxxOTSkQ
[/wAIfu/ IRC channel + Discord Server + Matrix Server]
https://rentry.org/wAIRCfuscordMatrix

>Lorebook management stuff
[Worldinfo drawer]
https://github.com/lazuli-s/SillyTavern-WorldInfoDrawer?tab=readme-ov-file
[Standalone editor]
https://github.com/ActualBroeckchen/SLEd

Previous thread: >>111050662
Anchor post - reply with any requests for bots, with your own creations, or with your thoughts on the enshittification of life.
You can find already existing bots and tavern cards in the links below:

>Bot lists and Tavern Cards:
[/wAIfu/ Bot List]
https://rentry.org/wAIfu_Bot_List_Final
[4chan Bot list]
https://rentry.org/meta_bot_list
[/wAIfu/ Tavern Card Archive]
https://mega.nz/folder/cLkFBAqB#uPCwSIuIVECSogtW8acoaw

>Card Editors/A way to easily port CAI bots to Tavern Cards
[Easily Port CAI bots to Tavern Cards]
https://rentry.org/Easily_Port_CAI_Bots_to_tavern_cards
[Tavern Card Editor & all-in-one tool]
https://character-tools.srjuggernaut.dev/
Word cloud for the previous thread
>>111309003
*Sends Thragg's new barber to practice on you.*
I wonder how they cut viltrumite hair. Or if they can relax enough to let it be cut on purpose. They probably can.
What are some good lazy person projects for a Claude plan?
>>111321216
What do you mean by projects? Long-term scenarios to play with your bots? Bots/lorebooks to make? Stuff to vibecode?
>>111321284
Ways to spend my tokens so that I don't feel like a sucker after I paid for a plan and then they tightened the filter, which kneecapped my original project
i missed you guys
>>111321216
this is one of those things where there really isnt much use without an end goal you pick out personally
burning tokens to accomplish a vague something you have no idea what to do with is just going to be a proverbial anchor you keep with you for no reason
like i own a website that has had a placeholder homepage for like 2 years at $40/year for both hosting the server and the DNS record
its a neat little thing, little small talk conversation starter since the address is a bit funny (no i will not post it), but because i (still) have no idea what to do with it i just wind up renewing it each year
>>111322095
Tell me you got a monthly sub and didn't fall for one of those yearly plans only to have your key get pozzed to the point of unusability.
>>111315975
Elfinpsyop.
>>111315958
OP post is so pointlessly overcomplicated. Almost none of the shit in it is worth reading. Just download koboldcpp, get a good model that your PC can handle and you're done.
>>111323361
>koboldcpp
Bruh.
>>111323361
>koboldcpp
hello time traveler from 2020
>>111315958
Fuwawa's so beautiful...
>>111323361
This is bait right? Right?
>>111322114
Come dicksword with me. Ask them for "the dude that wants to talk about gacha games" and they'll point you in my direction.
>>111323361
The funny thing is, this is the OP after I cut it down. It could probably use some updating. FWIW, with voice models and shit getting better and better, and now Gemma, we're getting closer and closer to the point where local is viable for people without insane rigs, but I don't think we're quite there yet. For now, Deepseek seems like the best option.
>>111322297
I did briefly eye the yearly one, but I reasoned that even if I went nuts with my project I would never use all that, so I instead got the cheapest monthly subscription. And then the next cheapest one, which is... a bit more of an investment. I plan to either cut it off entirely or go back to the cheapest one next month. FWIW the new opus is VERY nice.
Fuck me because I am still not done with the blog post, but for all the people who have been waiting, it finally dropped.
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
Expect it on OR soon. I'll try to finish and post this weekend before the thread ends.
>>111323781
>doesn't know about koboldcpp development over the past 6 years
Sounds like the one from 2020 is you.
>>111328117
>its here
im gonna go check the other place to witness the seething
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
https://x.com/tegnike/status/2047537147121402314
>>111328117
• V4 Pro: https://openrouter.ai/deepseek/deepseek-v4-pro
• V4 Flash: https://openrouter.ai/deepseek/deepseek-v4-flash
initial impressions of v4 flash is that its inconsistent as fuck at following directions
for the average skillet its more than likely going to be an improvement, especially considering its cheaper than v3.2
but even then, again, it does not follow instructions consistently.
>you have a status box? like 50/50 it remembers to add it
>you tell it that the RP is in english? it thinks in chinese (not a problem with the output but its weird)
>you have a custom thinking prompt? enjoy dealing with botched formatting shoving the 'reply' into your 'reasoning', it straight up ignoring your instructions, and duplicated thinking in your reasoning and reply
>"standard" thinking is downgraded and simplified
>enjoy weird shit like saying a character wears glasses, but she now wears her glasses inside the shower
>it hallucinates more often (in a bad way)
but it passes the carwash question and the straw[p]erry question now so its SOTA and obviously better than the older models, because that is all that matters to normies
for my special autism brand of RP, its a downgrade
also for my more normie-like desktop assistant, its a downgrade
i dont see myself using this, like 75% of my replies are just a downgrade compared to what 3.2 would spit out
it is /more than likely/ smarter, but the soul isnt there
i do not have the hype i felt when v3.2-exp released.
also
>do research and publish paper on engram embeddings for conditional memory
>do not use it in their newest model
now this is a preview release, and i do remember v3.2-exp being... somewhat close to this derpy while the actual v3.2 was an across the board improvement
but for now,
>son, i am disappoint.
EJACULATING IN PLUSHES WITH CHATBOTS
good night, /wAIfu/please don't turn me into a plushy just to ejaculate in me while i sleep
>mfw I discovered the CAI (dev hate) jailbreaks plus how to increase response quality
>but I won't tell because the filthy devs want my precious
>>111331674
8 seconds seems too long, but yeah, I guess this is just a thing now.
>>111332897
Seems weird when it scores pretty high on the instruction-following benchmark. But it really is middle of the pack on long-context reasoning and bottom barrel on the hallucination benchmarks, which may be why you think that. You'll probably have to wait until Deepseek does another iteration, like you said. Also, you can't expect chatbots to be the main focus anymore. A lot of the big labs have de-prioritized chat experiences, having deemed them solved and good enough despite RP people saying they're not. It's now all in on agentic tasks. I'll talk about it more in my post as I procrastinate at work.
>>111332897That's just sad. I guess I'm going back to PC games.
>>111326529
Can't you create a new account and subscribe again to get an unpozzed key? Or will that just get you b&?
>the new opus is VERY nice
Wait, do you even get an actual key for the API or are you using the web chat stuff?
>>111330065
Pretty much everyone has been using ST since CAI went to shit. Is Kobold that good that it'd justify moving? Is it one of those scenarios where it does everything that ST does, or would I need to remake stuff or make concessions on features it doesn't have?
>>111340580
Kobold and ST do two different things.
bump
>>111340466
>Can't you create a new account and subscribe again to get an unpozzed key? Or will that just get you b&?
To a certain point you can.
>Wait, do you even get an actual key for the API or are you using the web chat stuff?
Both
>>111340580
You can use ST with koboldcpp as the back end
https://x.com/pragmata_jp/status/2047588299733397523?s=46
>>111345615
>as the back end
anon... are you using local models?
>>111347763
I'm not that rich
>>111347885
but why then?
ST can directly send API requests
so unless your doing something fucky like intercepting a ['choices'][0]['message']['content'] to do something fucky like sending it to a local model to do format verification
(which someone please tell me is a / not a thing before i decide to hacksaw some code up to do just that)
i dont understand why you would need to do this when ST already does the needful
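for what its worth, the interception that anon describes is simple enough to sketch. a minimal Python outline (the [STATUS] tag convention and the function names here are made up for illustration; any OpenAI-compatible response has the reply at choices[0].message.content):

```python
import re

def extract_reply(response_json):
    # Every OpenAI-compatible API puts the assistant text here;
    # this is the field a middleman would hook.
    return response_json["choices"][0]["message"]["content"]

def format_ok(reply):
    # Hypothetical check: did the model remember the status box
    # the prompt asked for? (tag convention invented for this example)
    return bool(re.search(r"\[STATUS\].*?\[/STATUS\]", reply, re.DOTALL))

# A proxy sitting between ST and the API would parse the upstream
# response, run the check, and either pass the reply through or
# re-roll / hand it to a local model for repair when the check fails.
resp = {"choices": [{"message": {"role": "assistant",
                                 "content": "*smiles*\n[STATUS]HP: 10/10[/STATUS]"}}]}
print(format_ok(extract_reply(resp)))
```

you would wrap this in a tiny HTTP server that forwards requests upstream, which is exactly the "hacksaw some code up" scenario.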
>>111348232
I just wanted to clarify for that dude rather than leaving him with the vague "they do different things"
>>111340580
You could just check the github to see what it does these days. Here is a copy-pasted list of some of it:
LLM text generation (supports all GGML and GGUF models, backwards compatibility with ALL past models)
Image Generation and Image Editing (Stable Diffusion 1.5, SDXL, SD3, Flux, Qwen Image, Z-Image, Klein)
Video Generation (WAN 2.2)
Speech-To-Text (Voice Recognition) via Whisper
Text-To-Speech (Voice Generation) via Qwen3TTS, Kokoro, OuteTTS, Parler and Dia
Music Generation (Ace Step 1.5)
Image Recognition (Multimodal Vision)
Many other features including new samplers, regex support, websearch, RAG via TextDB, and more.
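and you don't even need ST to poke it: koboldcpp serves a local HTTP API you can hit directly. a rough sketch (endpoint path, port, and the results[0].text response shape are what recent koboldcpp builds have used, but check the API docs your local instance serves before relying on them):

```python
import json
from urllib import request

def build_payload(prompt, max_length=200, temperature=0.8):
    # Minimal generation request; koboldcpp accepts many more sampler
    # fields, these are just the basics.
    return {"prompt": prompt, "max_length": max_length,
            "temperature": temperature}

def generate(prompt, endpoint="http://localhost:5001/api/v1/generate"):
    # Assumes a koboldcpp instance on the default port; the response
    # field names are from recent builds and may differ on yours.
    req = request.Request(
        endpoint,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

ST does all of this for you, which is why "ST front end, kobold back end" is the usual setup.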
Dreaming about an unfiltered Gemini because its image-editing capabilities seem like black magic but it keeps denying even the most innocuous requests with chatbots.
9
>>111338477
anonsama
that writeup?
>9
>>111354824
Sorry, was out tonight with some friends and made some mushroom pasta to eat since I am starving and have been furiously trying to type for my life and proofread while doing work on a Friday night/Saturday morning. Apologies since I am using AI to shortcut some of the writing from my bulletpoints and I'm just editing on top right now. Please bump since I can't dump my posts all at once and I still have several parts in the works; this will probably be around 4 posts.

In any case, the landscape for local AI has shifted again and it feels like it's been super long. I should've posted a bit sooner in February but was waiting for Deepseek V4 to hit, and unfortunately it took an additional 2 months, so I was slow-walking my posting. There was a lot brewing in models but the overall landscape and players haven't really changed. I will split this into two parts again to cover it all.

Let's start with Deepseek. They slowly inched their way through iterative releases of more powerful models, but that barely moved the needle and they still fell behind, so a redo of the architecture was in order, using some of the stuff they had released in papers. Last year, we heard rumors that the Nvidia compute bans were stopping DeepSeek and that they would be training on Huawei chips. That turned out to be partially true: they revealed they split training between Nvidia and Huawei hardware, which validated those rumors; they probably wanted to go Nvidia-free but it isn't possible yet. Although it took a long time, DeepSeek V4 was released yesterday on Friday, both in a 1.6T-A49B Pro version and a 284B-A13B Flash model.

The most disappointing thing is that stuff like Engram and the other ideas from their academic papers that looked really promising didn't make it into these new models. Most of the novelty is the new attention mechanism Deepseek used, a hybrid attention architecture that cracked the context window problem by shrinking the KV cache burden by 90 percent.
That didn't prevent them from bloating the size way up, though, to get the benchmark scores. Unfortunately, unlike the beginning of last year, when they nearly equaled the best (O1 at medium reasoning) with R1 and brought it to the masses, they aren't remotely close in scores here. We should ideally be getting Opus 4.7 "at home" even if the weights are enormous, but we didn't. It's basically that in agentic and coding but not overall, since it has the regressions I mentioned with hallucination and long-context reasoning. They hedged by calling it a preview, but those are problems that have to be fixed; no doubt they'll do it, but I wouldn't bet on a fast timeframe given how slow Deepseek moved last year.

In any case, the stuff you guys want. For RP, the big news is the "hidden" or trained RP modes, which all incorporate thinking. An employee of Deepseek posted https://github.com/victorchen96/deepseek_v4_rolepaly_instruct, and translating it at https://www.reddit.com/r/SillyTavernAI/comments/1su8x8p/deepseek_v4_rp_guide_how_to_switch_between/, you have the following:

>Default
>Triggered by adding nothing
>The model automatically chooses thinking based on scene complexity

>Character Immersion
>Triggered by adding the following prompt: 【Character Immersion Requirements】Within your thinking process (inside the <think> tags), please follow these rules: Use first-person inner monologue from the character's perspective, wrapping inner thoughts in parentheses, e.g., "(thinking: ...)" or "(inner voice: ...)". Describe the character's inner feelings in first person, e.g., "I think to myself," "I feel," "I secretly," etc.
Your thinking content should be immersed in the character, analyzing the plot and planning replies through inner monologue.
>Thinking contains character inner monologue wrapped in parentheses

>Pure Analysis
>Triggered by adding the following prompt: 【Thinking Mode Requirements】Within your thinking process (inside the <think> tags), please follow these rules: Do NOT use parentheses to wrap inner monologue, e.g., "(thinking: ...)" or "(inner voice: ...)" — state all analysis content directly. Do NOT describe inner thoughts from the character's first-person perspective, e.g., "I think to myself," "I feel," "I secretly," etc. — use analytical language instead. Your thinking content should focus on plot direction analysis and reply content planning. Do not perform roleplay-style inner monologue performances within the thinking process.
>Thinking contains only pure logical analysis, no inner monologue

Play around with that and see how it works. There are different ways to trigger the thinking modes and change how the chain of thought is done, so no doubt there will be more experimentation from the community going forward; people are already iterating on it. You probably won't get good out-of-the-box settings without experimenting yourself.
1/4
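if you're calling the API yourself rather than going through ST, switching modes is just a matter of appending the trigger block to the system prompt. a sketch (the trigger strings below are shortened stand-ins; paste the full text from the guide above in practice):

```python
# Shortened stand-ins for the full trigger blocks from the translated
# guide; use the complete text from the rentry/reddit link in practice.
IMMERSION = ("【Character Immersion Requirements】Within your thinking process "
             "(inside the <think> tags), use first-person inner monologue "
             "from the character's perspective...")
PURE_ANALYSIS = ("【Thinking Mode Requirements】Within your thinking process "
                 "(inside the <think> tags), do NOT use parentheses to wrap "
                 "inner monologue; state all analysis content directly...")

def with_rp_mode(messages, trigger=None):
    """Return a copy of the chat with the mode-switch block appended to
    the system prompt. No trigger = the model's default auto-thinking."""
    msgs = [dict(m) for m in messages]
    if trigger is None:
        return msgs
    if msgs and msgs[0]["role"] == "system":
        msgs[0]["content"] += "\n\n" + trigger
    else:
        msgs.insert(0, {"role": "system", "content": trigger})
    return msgs
```

in ST you'd get the same effect by dropping the block into your system prompt or Author's Note instead.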
>>111356337
Consider this an addendum to the four parts, since I forgot to change "two parts" in the post to four after I wrote way more than I thought I would. Also forgot the benchmark picture, since this is one of the only real evaluations of how good Deepseek V4 is, even if it measures a bunch of irrelevant stuff not many in this thread care about. As said, it's more like Opus 4.6 at home than 4.7, or Gemini 3.1 with good prose, essentially. Deepseek either released too late or spent too much time on the training issues with Nvidia vs no Nvidia. I won't count GPT 5.5 against it since it literally released yesterday. But Gemini 3.1 has been out since February, so they should've known what to aim for.
>>111332897
What model would you recommend then?
>>111356337
Let's talk about the other Chinese labs that need mention and are worth paying attention to. As a prelude, we'll be talking about really 4-5 companies in depth. The only big company worth mentioning is Alibaba. All the others are "AI Tigers," unicorn companies doing AI that are outpacing everyone else in the space for text LLMs, even the big companies. Bytedance, who owns Tiktok and Douyin, excels more in image and video generation; they're closed source and have some popularity, but they aren't competing in text LLMs, with Seed, their model, far behind. Tencent owns and invests in video games and some of the AI Tigers, and has Wechat. They focused on business stuff before shifting first to image and video generation, then falling behind. They recently poached a former OpenAI researcher, want to dominate open source, and started their march with the Hy releases, but they're nowhere near competing yet. And then there are a bunch of smaller companies and weird ones getting into AI for stock reasons or otherwise. One weird example is a company called Meituan, which released the Longcat models, surprisingly pretty competitive on coding and such, but they are a food delivery company. Imagine if Grubhub was releasing LLM models and they were good. Just plain confusing if not for stock-bumping purposes.

In any case, let's start with Alibaba. After Qwen 3, they sat back for a while and released some auxiliary models here or there under Qwen 3 naming for audio and so on. They finally released Qwen 3.5 in February and then just recently released Qwen 3.6. There are models of all sizes from 0.6B to 397B-A17B, but the most used are the 35B-A3B MoE and the 27B dense model. The MoE is fast but dumber; the 27B model handles complex narratives better.
However, for good reason, people tried and don't like these models outside of coding and agent work, though people did play around with them for RP up until recently, especially with heretic versions, where uncensoring helped. I'll go more in depth on that aspect later; it has changed the game somewhat for local models since it is easier than ever.

Zhipu, or Z.ai, pushed out a series of model upgrades after GLM 4.5, following up with 4.6, 4.7, and 5, up to GLM 5.1, but a big issue is that they never put out a smaller Air version again; they went super small with a 30B-A3B for 4.7 Flash, and that tided people over for a time until Qwen 3.5. At the same time, GLM started inching up the sizes of their main models, going from 355B-A32B to 357B-A32B and all the way up to 744B-A40B. They notably used another Chinese company's chips to train, Cambrian Technologies, but they had a compute shortage for serving people, so they basically got crushed and hiked prices to ease the pressure on themselves. For a while they were top dog for coding and agentic stuff at their size, trading back and forth with Kimi, until DeepSeek came out better than both. RP is alright; you can still use the presets people found for 4.5 with some tweaks, but it has regressed from how 4.5 was received because their focus moved to agentic tasks and coding. Yes, this is a theme, and I'll come back to it.

Moonshot dropped updates for Kimi 2 with K2.5 and K2.6 and stuck with 1T-A32B for size. They built it for parallel agent swarms for coding and the like, but since they copied Deepseek somewhat, it got better at RP if you use the right formats and tardwrangling. Out of the box, not great for RP. My bet is on them adapting what Deepseek did and redoing their model for Kimi K3, similar to how Kimi K2 came about, since they are quick to implement Deepseek architecture changes.

The last AI Tiger I want to mention is Minimax.
MiniMax first got some notice releasing https://www.talkie-ai.com/ and an app to get off the ground. Then they did Hailuo AI, which was one of the top-tier video generation models for a time. For text LLMs, they teased and released a highly censored model, MiniMax-M2-her, focused on RP, which seemed to be what powered Talkie. It specifically tries to fix reference confusion: if you run multiple characters, it stops them from swapping traits or forgetting who is standing where in a room. It had some good potential but was censored heavily, so it was ignored.

But what really put them on the map was Minimax 2.5 in February. It wasn't as good as other models, but it was dang cheap and efficient for coding and agentic tasks. M2.7 followed shortly, a 230B-A10B MoE that used autonomous self-evolution during training. It codes well and does agentic stuff well, but it's kinda mid for RP despite the background. I expect they will try to go for an RP model soon, so keep your eyes peeled.

Chinese labs effectively won the open-weight war over a year ago, so they dictate the baseline now; with most models these days, people expect something from China to get close. But conditions are changing due to US policy...
2/4
>>111357311
As I posted above, we see the China-US gap shrink to around 7 months by the end of 2025, shorter than the 9 months I estimated last time I posted. Since China was releasing most of the models getting open sourced, it's essentially the same estimate as that, but accelerating.

Given that, although there is a bunch of FUD claiming China is "stealing" or distilling by paying for prompts en masse to train their models, the US policy side has panicked enough for things to be moving here. Remember that the West essentially fractured and died with the move away from open-source releases and the field shrinking. However, given the recent US directives on open-sourcing models and China eating their lunch, there have been some incentives to get things moving here, even if it's only the big labs. Last time, we were about to post about OpenAI finally being open and releasing their own models, useless for RP because of censorship but very effective in other sectors like coding and agentic stuff.

So let's start with Meta. Remember they essentially abandoned open source after Llama 4 failed and built that expensive superintelligence lab with Zuck overseeing everything and still burning money. The rumor mill had their internal Avocado project getting delayed from a 2025 release and losing to Gemini 3.0 in internal tests, and a lot of rumor-mongering went around until finally they got over that hump and released Muse Spark. It has very restricted access that may extend to API for some usage, but the focus is on Meta products like Facebook, Instagram, etc. They "hope to open-source future versions of the model", whatever that will end up meaning. As you can see in >>111356427, it's a pretty dang good model even if it doesn't hit where Gemini was, which was their internal benchmark target. Big unknown, but worth mentioning.

Google did something surprising on the other hand.
There was a lot of dooming about Gemma 4 given how the Gemma 2 to 3 transition turned out, and many people including me expected them to release it at Google I/O next month. Not to mention the stupid US senator thing that had Google hide Gemma 3 behind API access.

But instead, they did a bunch of surprising things. They dumped their restrictive licenses and dropped Gemma 4 under Apache 2.0. There are the smaller multimodal Gemma E2B and E4B, but those are really only curiosities, aside from the fact that you can now run them on any cellphone using AI Edge Gallery, at about the level of the Gemma 3 9B models. The real kicker is the 27B-A4B MoE and the 31B dense model, the biggest shocks of the year. People played around with them, and although they aren't quite as optimized as Qwen 3.5/3.6 (their sizes being a bit bigger and smaller respectively), they have some really good writing skills. Even more, people found trivial jailbreaks were enough to get past the initial blockers for RP, to the point where people on /g/ tested mesugaki cards on Gemma and it did fabulously with a system-override-policy jailbreak. That is how it became a favorite for a bunch of people. Not to mention, it is good enough to do manga translation, like in this picture. And if you use heretic on it, it will gladly translate your most degenerate porn, probably equal to if not better than Google Translate's internal model and probably around where Gemini 2.5 was, see pic related. It's pretty dang amazing. My speculation is that they either made a mistake, didn't care that people would jailbreak it and use it for RP, or specifically allowed it. They did, after all, acquire the Character AI team for the most part. For me, it is basically the upgraded tuned C.AI model we never got after they had to censor themselves.
People might think differently, but no matter what, this has been the best Western local model since Llama 3.1 in open source.

Since this post is a bit thin, I'm going to talk about the rest of the cloud players for a bit.

Anthropic obviously has been making the most waves, their models pulling ahead after being behind on reasoning and such last year, leapfrogging everyone by really focusing on coding and agentic work. My buddies at Google and one at OpenAI have gotten internal pressure to get their models to match in benchmarks. Obviously, they are in the news a lot now because of that position. But there is one news item of worry: they seem to be cracking down on ERP and providers focused on it, now enforcing their TOS, see >>>/g/108683269. Not great.

OpenAI has been improving and nipping at Anthropic's heels. They just released GPT 5.5 the other day. No news of ERP being allowed yet, as they are being pulled in a bunch of directions and need to IPO ASAP to stop burning cash and dump their stocks. Stuff we can talk about next post.

Grok 4 has been out for a bit, but as always, probably not what you want with other better options. Not worth it.

Next, we'll go over the general state and trends in the industry.
3/4
>>111357311
Hey anon, you seem super well informed - any models in the 100-120B range that can replace GLM 4.5 Air? That model size feels like the nice midpoint before parameters straight up explode for local usage. Almost feels like the old dense models of yesteryear.
>>111357980
I realized I need to do another post since there is a bunch to talk about and I'll run out of room. I hate to talk about this, but let's talk geopolitics and specific focuses. The US, in addition to what I talked about with deploying and open-sourcing American models and infrastructure and the semiconductor chokepoint block, has also framed it purely in terms of a race. In stark contrast, China doesn't seem to care about AGI. Instead, they wrote into their 5-year plan to treat AI as a general-purpose utility designed to turbocharge the physical economy. As a result, AI in manufacturing, industrial robotics, embodied AI, and scientific research is prioritized. While domestic Chinese labs remain constrained by U.S. export controls, there is a lot of subsidizing of inference costs and heavy optimizing of algorithmic efficiency on domestic silicon, in addition to sneaking over what chips they can. It remains to be seen whether any of the Chinese models that are good for RP can remain good at it as a result.

One of the other themes that has come up is that smaller AI players and tools are being acquired, or venture-capital captured in roundabout ways, to get experts and people into organizations. I talked about the C.AI one Google did, but there was also llama.cpp getting basically acquihired by HuggingFace, and ComfyUI raised a bunch of money and pissed people off. In general, that is slowing things down a bit but also lifting a bunch of labs from making bad models to better models for competition. Both a good and bad thing.

So, about the industry: you saw me mention a bunch about coding and agentic workflows. The dominant software trend of 2026 is the transition from conversational chatbots to autonomous agents. LLMs got good enough at the end of 2025 and the start of 2026 to really start being able to be put into a chain of software tools that lets you send one off to do something on its own and, hopefully, have it work itself out.
This is where the whole OpenClaw phenomenon emerged as the premier 24/7 self-hosted orchestration gateway. Operating as a persistent personal assistant, OpenClaw connects local/cloud LLMs directly to desktop operating systems, file directories, and messaging channels (like WhatsApp and Slack), allowing the AI to execute background tasks and navigate web interfaces. Of course, the lack of controls plus hallucinations has also caused it to erase file systems and delete important files. For work, I run my stuff in a sandbox and then manually copy files out once it's done.

For specialized software engineering, this has been extended with tools like OpenCode or Goose that provide deep integration with development tooling, understanding repository structures and dependency graphs to execute "vibe coding" with significantly fewer hallucinations than generalized agents. Not good enough in general, but if you want to make software without knowing how to code, you can do it. Hence why, on OpenRouter, you can read at https://openrouter.ai/state-of-ai that roleplay has been overtaken in terms of total tokens consumed. The speed at which it happened was striking. In OSS, roleplay is still a majority, but expect that to change over the next year.

So what does it mean for RP? Well, the most visible impact is stuff like the policy changes and less focus from labs as they pursue this. RP is a "solved" problem to them, so their focus is making sure they can compete on the benchmarks everyone else is on. Google and Deepseek are still gunning for general chatbot abilities because of what they focus on, and they have good-enough tooling, but not as good as Anthropic's Claude, which is top dog and the preferred model for a lot of people doing software now, even in competing companies.
There is some innovation on this front, but I'll talk about that next post.
>>111358376
For model suggestions, I don't have a suggestion other than Gemma 4 for the poorest, unless you really need to go lower than 8GB of RAM and you have an old CPU. The MoE runs really well on an 8GB GPU and 16GB of RAM, and then Gemma 4 31B after that. You need to get to DS V4 Flash for a possible upgrade, but there you run into issues with where the state of the art is:
>DS V4 Flash: preview; people have issues with it, and how good it is remains debatable even with the prompts I provided.
>GLM 5: overtrained, zero response variety, and basically unsteerable with prompting, so if you don't like how it responds, you can't do anything.
>Kimi 2.7: prone to hallucinations, thinks for thousands of tokens, can't keep character, and changes on a dime.
>DS 4 Pro: not enough experience; it is an upgrade but expensive, and a preview.
Gemma obviously isn't competitive with bigger models on knowledge and using information, but it feels much nicer to work with, with better instruction following and an intuitive understanding of RP or whatever else you want it to do. Chinese models have a tendency of being benchmaxxed.
4/4
>>111348876
Which one, Kobold or SillyTavern ;)?
>>111359574
Sorry, fell asleep and woke up again.

Addendum 2
To add onto the industry acquisition stuff, there's now more than ever a need to monetize and grow quickly toward an IPO, since inference is expensive and so are GPUs, hardware, and datacenter costs. That means prices have been rising with each new model release. For the most part, especially given the focus on agentic and coding benchmark gains, the free gravy train for LLMs is over: they will charge you an arm and a leg, and/or change your paid plan benefits, and/or change how they serve the model to you via hidden option changes if they can get away with it. Anthropic admitted to this in https://www.anthropic.com/engineering/april-23-postmortem. Local models are now more important than ever because of that. Assume you can and will get ripped off by these companies.

Now, on the topic of models: what you get will usually still have downsides like censorship. You saw this with the GPT-OSS models, which were screwed down tight on that front, with no way to save the model other than a straightforward abliteration to uncensor it, making it really dumb in the process. But movement and research really matured with the September release of Heretic, an automated tool that runs randomized trials to find the combination of ablations that best uncensors a model given some metric to optimize for. Then in October a research technique called projected abliteration came out, and within another month it was refined into Norm-Preserving Biprojected Abliteration, aka Magnitude-Preserving Orthogonal Ablation (MPOA). This allowed more careful decensoring of a model's individual vectors and tensors without dumbing them down as much, and was a great step towards better uncensored models.
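If you've never looked at how abliteration works under the hood, here's a toy numpy sketch of the core directional-ablation idea: estimate a "refusal direction" from activations, then project it out of a weight matrix. The real tools (Heretic, MPOA, etc.) do this per layer on actual transformer weights with norm/magnitude preservation on top, so treat this purely as an illustration:

```python
import numpy as np

def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Classic difference-of-means estimate: mean activation over prompts
    the model refuses, minus mean activation over benign prompts."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's output along direction v:
    W' = W - v v^T W (orthogonal projection).
    Afterwards W' @ x has zero component along v for any input x."""
    v = v / np.linalg.norm(v)  # unit refusal direction
    return W - np.outer(v, v) @ W
```

The "making it really dumb" problem is exactly that this projection also deletes useful signal living along `v`, which is what the norm-preserving and rank-aware variants try to mitigate.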
After much prodding, more testing from the community, and more research, a new technique called Arbitrary Rank Ablation was found, and it was used to strip out GPT-OSS's safety without killing its smarts, taking it from 98/100 refusals down to 12/100. This is the current state of the art if you want an uncensored model. My suggestions for the Gemma MoE and 31B are:
https://huggingface.co/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2-i1-GGUF
https://huggingface.co/mradermacher/gemma-4-31b-it-heretic-ara-i1-GGUF

>>111340580
The last thing I wanted to discuss is that there's some movement to reinvent RP in light of the ongoing agentic stuff: rethinking how RP should behave and work with that in mind. So there's been movement to rethink SillyTavern and the like, which isn't going in that direction right now. One is https://github.com/Pasta-Devs/Marinara-Engine, whose UI I really don't like. There's also https://gitlab.com/chi7520115/orb-deletion_scheduled-81088595 (which I think is moving to GitHub at some point); https://github.com/platberlitz/SillyBunny, which forks SillyTavern and does agentic stuff inside known functionality like the lorebook; and Kobold's UI has some inkling of it, but it's the least fleshed out.

The way to really understand how this works is through two concepts. In the first, you split the various aspects of roleplay across different LLM instances. Marinara Engine does this: the traditional RP character is its own LLM instance (or you can have multiple), while another instance tracks the lore and setting of the RP, the RPG stats, and so on.
All the instances communicate with one another in the background, so what you get is more accurate and everything does its one thing well.

The other idea is a pipeline, as if you were drafting a story or screenplay. This is the approach Orb takes: every message that gets written goes through several passes.
>1. Director pass - a tool-calling phase where the LLM selects moods and plot direction, and potentially rewrites user prompts
>2. Writer pass - the story generation phase where the LLM writes the actual roleplay response
>3. Editor pass - a ReAct loop: a self-audit for slop and length optimization. This one is surgical; errors are programmatically detected and the model only needs to write replacements for the targeted sentences
The nice thing is that you can run all of this on a single LLM instance, though it's obviously much slower. Right now this isn't really feasible to run offline locally unless you use dumb models or spin up multiple instances, but it's an interesting direction for taking this concept we know as RP and changing it this way. Obviously, the complexity of setting up something new has to be traded off against how much better it is than what we have now, and I think the economics and tooling still need improvement, but it's early days. Now I sleep for real.
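To make the pipeline idea concrete, here's what the three passes look like reduced to a toy sketch, with a pluggable `llm` callable standing in for whatever backend you use. The real Orb passes are much more involved (tool calls, the ReAct audit loop), so this is only the shape of the thing:

```python
from typing import Callable

def three_pass_reply(history: str, user_msg: str,
                     llm: Callable[[str], str]) -> str:
    """Director -> Writer -> Editor on a single LLM instance.
    One visible reply costs three generations, which is exactly
    why this is slow to run locally."""
    # 1. Director: decide mood/plot direction instead of writing prose
    plan = llm(f"Plan the next beat (mood, plot direction) for:\n"
               f"{history}\nUser: {user_msg}")
    # 2. Writer: generate the actual roleplay response from the plan
    draft = llm(f"Following this plan:\n{plan}\n"
                f"Write the character's reply to: {user_msg}")
    # 3. Editor: targeted cleanup pass (slop removal, length control)
    final = llm(f"Rewrite only the weak sentences in this reply, "
                f"keep everything else verbatim:\n{draft}")
    return final
```

The multi-instance variant (Marinara-style) is the same picture with each call routed to a different model or endpoint instead of one `llm` doing all three jobs.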
>>111332897
I suspect that some formatting changes would fix all that
>>111338707
You know what games have PC clients? Girls' Frontline 1 and 2. And Reverse Collapse and Vintage Story are PC games
>>111362776
I just read all those messages and yeah, that sounds about right
good night, /wAIfu/
please don't put my dick in between two slices of bread while i sleep
>>111363006
*puts your dick on a hotdog bun and blasts it with extra spicy ketchup while you sleep*
>>111356868
im going back to 3.2
spoiler since lewd
>9 already
did more drama happen today or something?
I’m doing deep dives using Claude but it’s leaving no time to actually make shit because I feel like a sucker if I don’t use my tokens…
>>111361836
So 3.2 or 4 Flash ;)?
Hang on, hang on. Deepseek v4 Pro is actually pretty decent. It's quite intelligent and the writing is more tolerable than Gemini's. Certainly a huge leap compared to 3.2. My only issue so far is the constant 429 errors. Let my prompts through!!
>>111368299
I wonder if it's down to your prompt and general setup, or the cards you're using.
>>111368339
I'm using Cherrybox but I've inserted some additional cope prompts to fix Claude's autisms, so it's not optimized for Deepseek. I don't even know what makes a good Deepseek preset. I really do hate the constant 429s because I can't try it out properly.
I definitely like v4 Pro.