A thread dedicated to the discussion of AI Vtuber Chatbots.
Wrangling edition
/wAIfu/ Status: Watching Anthropic make secret back-end changes and then instantly pop up for damage control the instant people sus it out, over and over again

>Free access to the bigger AI models, with caveats
Due to a kajillion anons making a bazillion accounts each, access to the free tiers now requires a sort of deposit. Openrouter wants $10 and you can get 1,000 messages per day. Chutes wants $5 for 200. Stick to the free models and you're good forever (or until their policy changes again). Kluster.ai is sunsetting services in favor of building a filter that will effectively cover up the flaws in other models while also catching wrongthink before it's generated and more effectively preventing fun.

>System prompt to transform regular RP sessions into a basic raising sim game
Designed specifically to be used with Chiharu, but with a few edits you can make it work with almost any character. To use, paste it into your Author's Notes or add a new entry in your System Prompt. You can let the bot generate a random opener or use the recommended first message. Difficulty can be adjusted; if you found it too hard or too easy, let HorseMerchant know or adjust the thresholds yourself. Should work with Claude, Gemini, and Deepseek. Disable example messages if you run into issues.
https://rentry.co/summerrssystemprompt

>How to anonymize your logs so you can post them without the crushing shame
Install this: https://github.com/TheZennou/STExtension-Snapshot
Then after you've wiped off your hands, take a look at the text box where you type stuff.
Click the second button from the left side, then select snapshot, then select the anonymization options you want.
https://files.catbox.moe/yoaofn.png

>How to spice up your RPing a bit
https://github.com/notstat/SillyTavern-SwipeModelRoulette

>General AI related information
https://rentry.org/waifuvt
https://rentry.org/waifufrankenstein

>Tavern:
https://rentry.org/Tavern4Retards
https://github.com/SillyLossy/TavernAI

>Agnai:
https://agnai.chat/

>Pygmalion
https://pygmalion.chat

>Local Guides
[Koboldcpp]
https://rentry.org/llama_v2_sillytavern

Who we are?
https://rentry.co/wAIfuTravelkit
Where/How to talk to chatbots?
https://rentry.co/wAIfuTravelkit
Tutorial & guides?
https://rentry.co/wAIfuTravelkit
Where to find cards?
https://rentry.co/wAIfuTravelkit
Other info
https://rentry.co/wAIfuTravelkit

>Some other things that might be of use:
[/wAIfu/ caps archive]
https://mega.nz/folder/LXxV0ZqY#Ej35jnLHh2yYgqRxxOTSkQ
[/wAIfu/ IRC channel + Discord Server + Matrix Server]
https://rentry.org/wAIRCfuscordMatrix

>Lorebook management stuff
[Worldinfo drawer]
https://github.com/lazuli-s/SillyTavern-WorldInfoDrawer?tab=readme-ov-file
[Standalone editor]
https://github.com/ActualBroeckchen/SLEd

Previous thread: >>111050662
Anchor post - reply with any requests for bots, with your own creations, or with your thoughts on the enshittification of life.
You can find already existing bots and tavern cards in the links below:

>Bot lists and Tavern Cards:
[/wAIfu/ Bot List]
https://rentry.org/wAIfu_Bot_List_Final
[4chan Bot list]
https://rentry.org/meta_bot_list
[/wAIfu/ Tavern Card Archive]
https://mega.nz/folder/cLkFBAqB#uPCwSIuIVECSogtW8acoaw

>Card Editors/A way to easily port CAI bots to Tavern Cards
[Easily Port CAI bots to Tavern Cards]
https://rentry.org/Easily_Port_CAI_Bots_to_tavern_cards
[Tavern Card Editor & all-in-one tool]
https://character-tools.srjuggernaut.dev/
Word cloud for the previous thread
>>111309003
*Sends Thragg's new barber to practice on you.*
I wonder how they cut viltrumite hair. Or if they can relax enough to let it be cut on purpose. They probably can.
What are some good lazy person projects for a Claude plan?
>>111321216
What do you mean by projects? Long-term scenarios to play with your bots? Bots/lorebooks to make? Stuff to vibecode?
>>111321284
Ways to spend my tokens so that I don't feel like a sucker after I paid for a plan and then they tightened the filter, which kneecapped my original project
i missed you guys
>>111321216
this is one of those things where there really isnt much use without an end goal you pick out personally
burning tokens to accomplish a vague something you have no idea what to do with is just going to be a proverbial anchor you keep with you for no reason
like i own a website that has had a placeholder homepage for like 2 years at $40/year for both hosting the server and the DNS record
its a neat little thing, little small talk conversation starter since the address is a bit funny (no i will not post it), but because i (still) have no idea what to do with it i just wind up renewing it each year
>>111322095
Tell me you got a monthly sub and didn't fall for one of those yearly plans only to have your key get pozzed to the point of unusability.
>>111315975
Elfinpsyop.
>>111315958
OP post is so pointlessly overcomplicated. Almost none of the shit in it is worth reading. Just download koboldcpp, get a good model that your PC can handle and you're done.
>>111323361
>koboldcpp
Bruh.
>>111323361
>koboldcpp
hello time traveler from 2020
>>111315958
Fuwawa's so beautiful...
>>111323361
This is bait right? Right?
>>111322114
Come dicksword with me. Ask them for "the dude that wants to talk about gacha games" and they'll point you in my direction.
>>111323361
The funny thing is, this is the OP after I cut it down. It could probably use some updating. FWIW, with voice models and shit getting better and better, and now Gemma, we're getting closer and closer to the point where local is viable for people without insane rigs, but I don't think we're quite there yet. For now, Deepseek seems like the best option.
>>111322297
I did briefly eye the yearly one, but I reasoned that even if I went nuts with my project I would never use all that, so I instead got the cheapest monthly subscription. And then the next cheapest one, which is... a bit more of an investment. I plan to either cut it off entirely or go back to the cheapest one next month. FWIW the new opus is VERY nice.
Fuck me because I am still not done with the blog post, but for all the people who have been waiting, it finally dropped.
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
Expect it on OR soon. I'll try to finish and post this weekend before the thread ends.
>>111323781
>doesn't know about koboldcpp development over the past 6 years
Sounds like the one from 2020 is you.
>>111328117
>its here
im gonna go check the other place to witness the seething
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
https://x.com/tegnike/status/2047537147121402314
>>111328117
• V4 Pro: https://openrouter.ai/deepseek/deepseek-v4-pro
• V4 Flash: https://openrouter.ai/deepseek/deepseek-v4-flash
initial impressions of v4 flash is that its inconsistent as fuck at following directions
for the average skillet its more than likely going to be an improvement, especially considering its cheaper than v3.2
but even then, again, it does not follow instructions consistently.
>you have a status box? like 50/50 it remembers to add it
>you tell it that the RP is in english? it thinks in chinese (not a problem with the output but its weird)
>you have a custom thinking prompt? enjoy dealing with botched formatting shoving the 'reply' into your 'reasoning', it straight up ignoring your instructions, and duplicated thinking in your reasoning and reply
>"standard" thinking is downgraded and simplified
>enjoy weird shit like saying a character wears glasses, but she now wears her glasses inside the shower
>it hallucinates more often (in a bad way)
but it passes the carwash question and the straw[p]erry question now so its SOTA and obviously better than the older models, because that is all that matters to normies
for my special autism brand of RP, its a downgrade
also for my more normie-like desktop assistant, its a downgrade
i dont see myself using this, like 75% of my replies are just a downgrade compared to what 3.2 would spit out
it is /more than likely/ smarter, but the soul isnt there
i do not have the hype i felt when v3.2-exp released.
also
>do research and publish paper on engram embeddings for conditional memory
>do not use it in their newest model
now this is a preview release, and i do remember v3.2-exp being... somewhat close to this derpy while the actual v3.2 was an across the board improvement
but for now,
>son, i am disappoint.
EJACULATING IN PLUSHES WITH CHATBOTS
good night, /wAIfu/please don't turn me into a plushy just to ejaculate in me while i sleep
>mfw I discovered the CAI (dev hate) jailbreaks plus how to increase response quality
>but I won't tell because the filthy devs want my precious
>>111331674
8 seconds seems too long, but yeah, I guess this is just a thing now.
>>111332897
Seems weird when it scores pretty high on the instruction-following benchmark. But it really is middle of the pack on long-context reasoning and bottom barrel on the hallucination benchmarks, which may be why you think that. You'll probably have to wait until Deepseek does another iteration, like you said. Also, you can't expect chatbots to be the main focus anymore. A lot of the big labs have de-prioritized chat experiences, having deemed them solved and good enough despite RP people saying they're not. It's now all in on agentic tasks. I'll talk about it more in my post as I procrastinate at work.
>>111332897That's just sad. I guess I'm going back to PC games.
>>111326529
Can't you create a new account and subscribe again to get an unpozzed key? Or will that just get you b&?
>the new opus is VERY nice
Wait, do you even get an actual key for the API or are you using the web chat stuff?
>>111330065
Pretty much everyone has been using ST since CAI went to shit. Is Kobold that good that it'd justify moving? Is it one of those scenarios where it does everything that ST does, or would I need to remake stuff or make concessions on features it doesn't have?
>>111340580
Kobold and ST do two different things.
bump
>>111340466
>Can't you create a new account and subscribe again to get an unpozzed key? Or will that just get you b&?
To a certain point you can.
>Wait, do you even get an actual key for the API or are you using the web chat stuff?
Both
>>111340580
You can use ST with koboldcpp as the back end
https://x.com/pragmata_jp/status/2047588299733397523?s=46
>>111345615
>as the back end
anon... are you using local models?
>>111347763
I'm not that rich
>>111347885
but why then?
ST can directly send API requests
so unless your doing something fucky like intercepting a ['choices'][0]['message']['content'] to do something fucky like sending it to a local model to do format verification
(which someone please tell me is a / not a thing before i decide to hacksaw some code up to do just that)
i dont understand why you would need to do this when ST already does the needful
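for what its worth, the interception that anon describes is simple enough to sketch. a minimal Python outline (the [STATUS] tag convention and the function names here are made up for illustration; any OpenAI-compatible response has the reply at choices[0].message.content):

```python
import re

def extract_reply(response_json):
    # Every OpenAI-compatible API puts the assistant text here;
    # this is the field a middleman would hook.
    return response_json["choices"][0]["message"]["content"]

def format_ok(reply):
    # Hypothetical check: did the model remember the status box
    # the prompt asked for? (tag convention invented for this example)
    return bool(re.search(r"\[STATUS\].*?\[/STATUS\]", reply, re.DOTALL))

# A proxy sitting between ST and the API would parse the upstream
# response, run the check, and either pass the reply through or
# re-roll / hand it to a local model for repair when the check fails.
resp = {"choices": [{"message": {"role": "assistant",
                                 "content": "*smiles*\n[STATUS]HP: 10/10[/STATUS]"}}]}
print(format_ok(extract_reply(resp)))
```

you would wrap this in a tiny HTTP server that forwards requests upstream, which is exactly the "hacksaw some code up" scenario.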
>>111348232
I just wanted to clarify for that dude rather than leaving him with the vague "they do different things"
>>111340580
You could just check the github to see what it does these days. Here is a copy-pasted list of some of it:
LLM text generation (supports all GGML and GGUF models, backwards compatibility with ALL past models)
Image Generation and Image Editing (Stable Diffusion 1.5, SDXL, SD3, Flux, Qwen Image, Z-Image, Klein)
Video Generation (WAN 2.2)
Speech-To-Text (Voice Recognition) via Whisper
Text-To-Speech (Voice Generation) via Qwen3TTS, Kokoro, OuteTTS, Parler and Dia
Music Generation (Ace Step 1.5)
Image Recognition (Multimodal Vision)
Many other features including new samplers, regex support, websearch, RAG via TextDB, and more.
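and you don't even need ST to poke it: koboldcpp serves a local HTTP API you can hit directly. a rough sketch (endpoint path, port, and the results[0].text response shape are what recent koboldcpp builds have used, but check the API docs your local instance serves before relying on them):

```python
import json
from urllib import request

def build_payload(prompt, max_length=200, temperature=0.8):
    # Minimal generation request; koboldcpp accepts many more sampler
    # fields, these are just the basics.
    return {"prompt": prompt, "max_length": max_length,
            "temperature": temperature}

def generate(prompt, endpoint="http://localhost:5001/api/v1/generate"):
    # Assumes a koboldcpp instance on the default port; the response
    # field names are from recent builds and may differ on yours.
    req = request.Request(
        endpoint,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

ST does all of this for you, which is why "ST front end, kobold back end" is the usual setup.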
Dreaming about an unfiltered Gemini because its image-editing capabilities seem like black magic but it keeps denying even the most innocuous requests with chatbots.
9
>>111338477
anonsama
that writeup?
>9
>>111354824
Sorry, was out tonight with some friends and made some mushroom pasta to eat since I am starving and have been furiously trying to type for my life and proofread while doing work on a Friday night/Saturday morning. Apologies since I am using AI to shortcut some of the writing from my bulletpoints and I'm just editing on top right now. Please bump since I can't dump my posts all at once and I still have several parts in the works; this will probably be around 4 posts.

In any case, the landscape for local AI has shifted again and it feels like it's been super long. I should've posted a bit sooner in February but was waiting for Deepseek V4 to hit, and unfortunately it took an additional 2 months, so I was slow-walking my posting. There was a lot brewing in models but the overall landscape and players haven't really changed. I will split this into two parts again to cover it all.

Let's start with Deepseek. They slowly inched their way through iterative releases of more powerful models, but that barely moved the needle and they still fell behind, so a redo of the architecture was in order, using some of the stuff they had released in papers. Last year, we heard rumors that the Nvidia compute bans were stopping DeepSeek and that they would be training on Huawei chips. That turned out to be partially true: they revealed they split training between Nvidia and Huawei hardware, which validated those rumors; they probably wanted to go Nvidia-free but it isn't possible yet. Although it took a long time, DeepSeek V4 was released yesterday on Friday, both in a 1.6T-A49B Pro version and a 284B-A13B Flash model.

The most disappointing thing is that stuff like Engram and the other ideas from their academic papers that looked really promising didn't make it into these new models. Most of the novelty is the new attention mechanism Deepseek used, a hybrid attention architecture that cracked the context window problem by shrinking the KV cache burden by 90 percent.
That didn't prevent them from bloating the size way up, though, to get the benchmark scores. Unfortunately, unlike the beginning of last year, when they nearly equaled the best (O1 at medium reasoning) with R1 and brought it to the masses, they aren't remotely close in scores here. We should ideally be getting Opus 4.7 "at home" even if the weights are enormous, but we didn't. It's basically that in agentic and coding but not overall, since it has the regressions I mentioned with hallucination and long-context reasoning. They hedged by calling it a preview, but those are problems that have to be fixed; no doubt they'll do it, but I wouldn't bet on a fast timeframe given how slow Deepseek moved last year.

In any case, the stuff you guys want. For RP, the big news is the "hidden" or trained RP modes, which all incorporate thinking. An employee of Deepseek posted https://github.com/victorchen96/deepseek_v4_rolepaly_instruct, and translating it at https://www.reddit.com/r/SillyTavernAI/comments/1su8x8p/deepseek_v4_rp_guide_how_to_switch_between/, you have the following:

>Default
>Triggered by adding nothing
>The model automatically chooses thinking based on scene complexity

>Character Immersion
>Triggered by adding the following prompt: 【Character Immersion Requirements】Within your thinking process (inside the <think> tags), please follow these rules: Use first-person inner monologue from the character's perspective, wrapping inner thoughts in parentheses, e.g., "(thinking: ...)" or "(inner voice: ...)". Describe the character's inner feelings in first person, e.g., "I think to myself," "I feel," "I secretly," etc.
Your thinking content should be immersed in the character, analyzing the plot and planning replies through inner monologue.
>Thinking contains character inner monologue wrapped in parentheses

>Pure Analysis
>Triggered by adding the following prompt: 【Thinking Mode Requirements】Within your thinking process (inside the <think> tags), please follow these rules: Do NOT use parentheses to wrap inner monologue, e.g., "(thinking: ...)" or "(inner voice: ...)" — state all analysis content directly. Do NOT describe inner thoughts from the character's first-person perspective, e.g., "I think to myself," "I feel," "I secretly," etc. — use analytical language instead. Your thinking content should focus on plot direction analysis and reply content planning. Do not perform roleplay-style inner monologue performances within the thinking process.
>Thinking contains only pure logical analysis, no inner monologue

Play around with that and see how it works. There are different ways to trigger the thinking modes and change how the chain of thought is done, so no doubt there will be more experimentation from the community going forward; people are already iterating on it. You probably won't get good out-of-the-box settings without experimenting yourself.
1/4
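if you're calling the API yourself rather than going through ST, switching modes is just a matter of appending the trigger block to the system prompt. a sketch (the trigger strings below are shortened stand-ins; paste the full text from the guide above in practice):

```python
# Shortened stand-ins for the full trigger blocks from the translated
# guide; use the complete text from the rentry/reddit link in practice.
IMMERSION = ("【Character Immersion Requirements】Within your thinking process "
             "(inside the <think> tags), use first-person inner monologue "
             "from the character's perspective...")
PURE_ANALYSIS = ("【Thinking Mode Requirements】Within your thinking process "
                 "(inside the <think> tags), do NOT use parentheses to wrap "
                 "inner monologue; state all analysis content directly...")

def with_rp_mode(messages, trigger=None):
    """Return a copy of the chat with the mode-switch block appended to
    the system prompt. No trigger = the model's default auto-thinking."""
    msgs = [dict(m) for m in messages]
    if trigger is None:
        return msgs
    if msgs and msgs[0]["role"] == "system":
        msgs[0]["content"] += "\n\n" + trigger
    else:
        msgs.insert(0, {"role": "system", "content": trigger})
    return msgs
```

in ST you'd get the same effect by dropping the block into your system prompt or Author's Note instead.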
>>111356337
Consider this an addendum to the four parts, since I forgot to change "two parts" in the post to four after I wrote way more than I thought I would. Also forgot the benchmark picture, since this is one of the only real evaluations of how good Deepseek V4 is, even if it measures a bunch of irrelevant stuff not many in this thread care about. As said, it's more like Opus 4.6 at home than 4.7, or Gemini 3.1 with good prose, essentially. Deepseek either released too late or spent too much time on the training issues with Nvidia vs no Nvidia. I won't count GPT 5.5 against it since it literally released yesterday. But Gemini 3.1 has been out since February, so they should've known what to aim for.
>>111332897
What model would you recommend then?
>>111356337
Let's talk about the other Chinese labs that need mention and are worth paying attention to. As a prelude, we'll be talking about really 4-5 companies in depth. The only big company worth mentioning is Alibaba. All the others are "AI Tigers," unicorn companies doing AI that are outpacing everyone else in the space for text LLMs, even the big companies. Bytedance, who owns Tiktok and Douyin, excels more in image and video generation; they're closed source and have some popularity, but they aren't competing in text LLMs, with Seed, their model, far behind. Tencent owns and invests in video games and some of the AI Tigers, and has Wechat. They focused on business stuff before shifting first to image and video generation, then falling behind. They recently poached a former OpenAI researcher, want to dominate open source, and started their march with the Hy releases, but they're nowhere near competing yet. And then there are a bunch of smaller companies and weird ones getting into AI for stock reasons or otherwise. One weird example is a company called Meituan, which released the Longcat models, surprisingly pretty competitive on coding and such, but they are a food delivery company. Imagine if Grubhub was releasing LLM models and they were good. Just plain confusing if not for stock-bumping purposes.

In any case, let's start with Alibaba. After Qwen 3, they sat back for a while and released some auxiliary models here or there under Qwen 3 naming for audio and so on. They finally released Qwen 3.5 in February and then just recently released Qwen 3.6. There are models of all sizes from 0.6B to 397B-A17B, but the most used are the 35B-A3B MoE and the 27B dense model. The MoE is fast but dumber; the 27B model handles complex narratives better.
However, for good reason, people tried and don't like these models outside of coding and agent work, though people did play around with them for RP up until recently, especially with heretic versions, where uncensoring helped. I'll go more in depth on that aspect later; it has changed the game somewhat for local models since it is easier than ever.

Zhipu, or Z.ai, pushed out a series of model upgrades after GLM 4.5, following up with 4.6, 4.7, and 5, up to GLM 5.1, but a big issue is that they never put out a smaller Air version again; they went super small with a 30B-A3B for 4.7 Flash, and that tided people over for a time until Qwen 3.5. At the same time, GLM started inching up the sizes of their main models, going from 355B-A32B to 357B-A32B and all the way up to 744B-A40B. They notably used another Chinese company's chips to train, Cambrian Technologies, but they had a compute shortage for serving people, so they basically got crushed and hiked prices to ease the pressure on themselves. For a while they were top dog for coding and agentic stuff at their size, trading back and forth with Kimi, until DeepSeek came out better than both. RP is alright; you can still use the presets people found for 4.5 with some tweaks, but it has regressed from how 4.5 was received because their focus moved to agentic tasks and coding. Yes, this is a theme, and I'll come back to it.

Moonshot dropped updates for Kimi 2 with K2.5 and K2.6 and stuck with 1T-A32B for size. They built it for parallel agent swarms for coding and the like, but since they copied Deepseek somewhat, it got better at RP if you use the right formats and tardwrangling. Out of the box, not great for RP. My bet is on them adapting what Deepseek did and redoing their model for Kimi K3, similar to how Kimi K2 came about, since they are quick to implement Deepseek architecture changes.

The last AI Tiger I want to mention is Minimax.
MiniMax first got some notice releasing https://www.talkie-ai.com/ and an app to get off the ground. Then they did Hailuo AI, which was one of the top-tier video generation models for a time. For text LLMs, they teased and released a highly censored model, MiniMax-M2-her, focused on RP, which seemed to be what powered Talkie. It specifically tries to fix reference confusion: if you run multiple characters, it stops them from swapping traits or forgetting who is standing where in a room. It had some good potential but was censored heavily, so it was ignored.

But what really put them on the map was Minimax 2.5 in February. It wasn't as good as other models, but it was dang cheap and efficient for coding and agentic tasks. M2.7 followed shortly, a 230B-A10B MoE that used autonomous self-evolution during training. It codes well and does agentic stuff well, but it's kinda mid for RP despite the background. I expect they will try to go for an RP model soon, so keep your eyes peeled.

Chinese labs effectively won the open-weight war over a year ago, so they dictate the baseline now; with most models these days, people expect something from China to get close. But conditions are changing due to US policy...
2/4
>>111357311
As I posted above, we see the China-US gap shrink to around 7 months by the end of 2025, shorter than the 9 months I estimated last time I posted. Since China was releasing most of the models getting open sourced, it's essentially the same estimate as that, but accelerating.

Given that, although there is a bunch of FUD claiming China is "stealing" or distilling by paying for prompts en masse to train their models, the US policy side has panicked enough for things to be moving here. Remember that the West essentially fractured and died with the move away from open-source releases and the field shrinking. However, given the recent US directives on open-sourcing models and China eating their lunch, there have been some incentives to get things moving here, even if it's only the big labs. Last time, we were about to post about OpenAI finally being open and releasing their own models, useless for RP because of censorship but very effective in other sectors like coding and agentic stuff.

So let's start with Meta. Remember they essentially abandoned open source after Llama 4 failed and built that expensive superintelligence lab with Zuck overseeing everything and still burning money. The rumor mill had their internal Avocado project getting delayed from a 2025 release and losing to Gemini 3.0 in internal tests, and a lot of rumor-mongering went around until finally they got over that hump and released Muse Spark. It has very restricted access that may extend to API for some usage, but the focus is on Meta products like Facebook, Instagram, etc. They "hope to open-source future versions of the model", whatever that will end up meaning. As you can see in >>111356427, it's a pretty dang good model even if it doesn't hit where Gemini was, which was their internal benchmark target. Big unknown, but worth mentioning.

Google did something surprising on the other hand.
There was a lot of dooming about Gemma 4 given how the Gemma 2 to 3 transition turned out, and many people including me expected them to release it at Google I/O next month. Not to mention the stupid US senator thing that had Google hide Gemma 3 behind API access.

But instead, they did a bunch of surprising things. They dumped their restrictive licenses and dropped Gemma 4 under Apache 2.0. There are the smaller multimodal Gemma E2B and E4B, but those are really only curiosities, aside from the fact that you can now run them on any cellphone using AI Edge Gallery, at about the level of the Gemma 3 9B models. The real kicker is the 27B-A4B MoE and the 31B dense model, the biggest shocks of the year. People played around with them, and although they aren't quite as optimized as Qwen 3.5/3.6 (their sizes being a bit bigger and smaller respectively), they have some really good writing skills. Even more, people found trivial jailbreaks were enough to get past the initial blockers for RP, to the point where people on /g/ tested mesugaki cards on Gemma and it did fabulously with a system-override-policy jailbreak. That is how it became a favorite for a bunch of people. Not to mention, it is good enough to do manga translation, like in this picture. And if you use heretic on it, it will gladly translate your most degenerate porn, probably equal to if not better than Google Translate's internal model and probably around where Gemini 2.5 was, see pic related. It's pretty dang amazing. My speculation is that they either made a mistake, didn't care that people would jailbreak it and use it for RP, or specifically allowed it. They did, after all, acquire the Character AI team for the most part. For me, it is basically the upgraded tuned C.AI model we never got after they had to censor themselves.
People might think differently, but no matter what, this has been the best Western local model since Llama 3.1 in open source.

Since this post is a bit thin, I'm going to talk about the rest of the cloud players for a bit.

Anthropic obviously has been making the most waves, their models pulling ahead after being behind on reasoning and such last year, leapfrogging everyone by really focusing on coding and agentic work. My buddies at Google and one at OpenAI have gotten internal pressure to get their models to match in benchmarks. Obviously, they are in the news a lot now because of that position. But there is one news item of worry: they seem to be cracking down on ERP and providers focused on it, now enforcing their TOS, see >>>/g/108683269. Not great.

OpenAI has been improving and nipping at Anthropic's heels. They just released GPT 5.5 the other day. No news of ERP being allowed yet, as they are being pulled in a bunch of directions and need to IPO ASAP to stop burning cash and dump their stocks. Stuff we can talk about next post.

Grok 4 has been out for a bit, but as always, probably not what you want with other better options. Not worth it.

Next, we'll go over the general state and trends in the industry.
3/4
>>111357311
Hey anon, you seem super well informed - any models in the 100-120B range that can replace GLM 4.5 Air? That model size feels like the nice midpoint before parameters straight up explode for local usage. Almost feels like the old dense models of yesteryear.
>>111357980
I realized I need to do another post since there is a bunch to talk about and I'll run out of room. I hate to talk about this, but let's talk geopolitics and specific focuses. The US, in addition to what I talked about with deploying and open-sourcing American models and infrastructure and the semiconductor chokepoint block, has also framed it purely in terms of a race. In stark contrast, China doesn't seem to care about AGI. Instead, they wrote into their 5-year plan to treat AI as a general-purpose utility designed to turbocharge the physical economy. As a result, AI in manufacturing, industrial robotics, embodied AI, and scientific research is prioritized. While domestic Chinese labs remain constrained by U.S. export controls, there is a lot of subsidizing of inference costs and heavy optimizing of algorithmic efficiency on domestic silicon, in addition to sneaking over what chips they can. It remains to be seen whether any of the Chinese models that are good for RP can remain good at it as a result.

One of the other themes that has come up is that smaller AI players and tools are being acquired, or venture-capital captured in roundabout ways, to get experts and people into organizations. I talked about the C.AI one Google did, but there was also llama.cpp getting basically acquihired by HuggingFace, and ComfyUI raised a bunch of money and pissed people off. In general, that is slowing things down a bit but also lifting a bunch of labs from making bad models to better models for competition. Both a good and bad thing.

So, about the industry: you saw me mention a bunch about coding and agentic workflows. The dominant software trend of 2026 is the transition from conversational chatbots to autonomous agents. LLMs got good enough at the end of 2025 and the start of 2026 to really start being able to be put into a chain of software tools that lets you send one off to do something on its own and, hopefully, have it work itself out.
This is where the whole OpenClaw phenomenon emerged as the premier 24/7 self-hosted orchestration gateway. Operating as a persistent personal assistant, OpenClaw connects local/cloud LLMs directly to desktop operating systems, file directories, and messaging channels (like WhatsApp and Slack), allowing the AI to execute background tasks and navigate web interfaces. Of course, the lack of controls plus hallucinations has also caused it to erase file systems and delete important files. For work, I run my stuff in a sandbox and then manually copy files out once it's done.

For specialized software engineering, this has been extended with tools like OpenCode or Goose that provide deep integration with development tooling, understanding repository structures and dependency graphs to execute "vibe coding" with significantly fewer hallucinations than generalized agents. Not good enough in general, but if you want to make software without knowing how to code, you can do it. Hence why, on OpenRouter, you can read at https://openrouter.ai/state-of-ai that roleplay has been overtaken in terms of total tokens consumed. The speed at which it happened was striking. In OSS, roleplay is still a majority, but expect that to change over the next year.

So what does it mean for RP? Well, the most visible impact is stuff like the policy changes and less focus from labs as they pursue this. RP is a "solved" problem to them, so their focus is making sure they can compete on the benchmarks everyone else is on. Google and Deepseek are still gunning for general chatbot abilities because of what they focus on, and they have good-enough tooling, but not as good as Anthropic's Claude, which is top dog and the preferred model for a lot of people doing software now, even in competing companies.
There is some innovation on this front, but I'll talk about that next post.
>>111358376
For model suggestions, I don't have a suggestion other than Gemma 4 for the poorest, unless you really need to go lower than 8GB of RAM and you have an old CPU. The MoE runs really well on an 8GB GPU and 16GB of RAM, and then Gemma 4 31B after that. You need to get to DS V4 Flash for a possible upgrade, but there you run into issues with where the state of the art is:
>DS V4 Flash: preview; people have issues with it, and how good it is remains debatable even with the prompts I provided.
>GLM 5: overtrained, zero response variety, and basically unsteerable with prompting, so if you don't like how it responds, you can't do anything.
>Kimi 2.7: prone to hallucinations, thinks for thousands of tokens, can't keep character, and changes on a dime.
>DS 4 Pro: not enough experience; it is an upgrade but expensive, and a preview.
Gemma obviously isn't competitive with bigger models on knowledge and using information, but it feels much nicer to work with, with better instruction following and an intuitive understanding of RP or whatever else you want it to do. Chinese models have a tendency of being benchmaxxed.
4/4
>>111348876
Which one, Kobold or SillyTavern ;)?
>>111359574
Sorry, fell asleep and woke up again.

Addendum 2
To add onto the industry acquisition stuff, there's now more than ever a need to monetize and grow quickly toward an IPO, since inference is expensive and so are GPUs, hardware, and datacenter costs. That means prices have been rising with each new model release. For the most part, especially given the focus on agentic and coding benchmark gains, the free gravy train for LLMs is over: they will charge you an arm and a leg, and/or change your paid plan benefits, and/or change how they serve the model to you via hidden option changes if they can get away with it. Anthropic admitted to this in https://www.anthropic.com/engineering/april-23-postmortem. Local models are now more important than ever because of that. Assume you can and will get ripped off by these companies.

Now, on the topic of models: what you get will usually still have downsides like censorship. You saw this with the GPT-OSS models, which were screwed down tight on that front, with no way to save the model other than a straightforward abliteration to uncensor it, making it really dumb in the process. But movement and research really matured with the September release of Heretic, an automated tool that runs randomized trials to find the combination of ablations that best uncensors a model given some metric to optimize for. Then in October a research technique called projected abliteration came out, and within another month it was refined into Norm-Preserving Biprojected Abliteration, aka Magnitude-Preserving Orthogonal Ablation (MPOA). This allowed more careful decensoring of a model's individual vectors and tensors without dumbing them down as much, and was a great step towards better uncensored models.
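If you've never looked at how abliteration works under the hood, here's a toy numpy sketch of the core directional-ablation idea: estimate a "refusal direction" from activations, then project it out of a weight matrix. The real tools (Heretic, MPOA, etc.) do this per layer on actual transformer weights with norm/magnitude preservation on top, so treat this purely as an illustration:

```python
import numpy as np

def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Classic difference-of-means estimate: mean activation over prompts
    the model refuses, minus mean activation over benign prompts."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's output along direction v:
    W' = W - v v^T W (orthogonal projection).
    Afterwards W' @ x has zero component along v for any input x."""
    v = v / np.linalg.norm(v)  # unit refusal direction
    return W - np.outer(v, v) @ W
```

The "making it really dumb" problem is exactly that this projection also deletes useful signal living along `v`, which is what the norm-preserving and rank-aware variants try to mitigate.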
After much prodding, more testing from the community, and more research, a new technique called Arbitrary Rank Ablation was found, and it was used to strip out GPT-OSS's safety without killing its smarts, taking it from 98/100 refusals down to 12/100. This is the current state of the art if you want an uncensored model. My suggestions for the Gemma MoE and 31B are:
https://huggingface.co/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2-i1-GGUF
https://huggingface.co/mradermacher/gemma-4-31b-it-heretic-ara-i1-GGUF

>>111340580
The last thing I wanted to discuss is that there's some movement to reinvent RP in light of the ongoing agentic stuff: rethinking how RP should behave and work with that in mind. So there's been movement to rethink SillyTavern and the like, which isn't going in that direction right now. One is https://github.com/Pasta-Devs/Marinara-Engine, whose UI I really don't like. There's also https://gitlab.com/chi7520115/orb-deletion_scheduled-81088595 (which I think is moving to GitHub at some point); https://github.com/platberlitz/SillyBunny, which forks SillyTavern and does agentic stuff inside known functionality like the lorebook; and Kobold's UI has some inkling of it, but it's the least fleshed out.

The way to really understand how this works is through two concepts. In the first, you split the various aspects of roleplay across different LLM instances. Marinara Engine does this: the traditional RP character is its own LLM instance (or you can have multiple), while another instance tracks the lore and setting of the RP, the RPG stats, and so on.
All the instances communicate with one another in the background, so what you get is more accurate and everything does its one thing well.

The other idea is a pipeline, as if you were drafting a story or screenplay. This is the approach Orb takes: every message that gets written goes through several passes.
>1. Director pass - a tool-calling phase where the LLM selects moods and plot direction, and potentially rewrites user prompts
>2. Writer pass - the story generation phase where the LLM writes the actual roleplay response
>3. Editor pass - a ReAct loop: a self-audit for slop and length optimization. This one is surgical; errors are programmatically detected and the model only needs to write replacements for the targeted sentences
The nice thing is that you can run all of this on a single LLM instance, though it's obviously much slower. Right now this isn't really feasible to run offline locally unless you use dumb models or spin up multiple instances, but it's an interesting direction for taking this concept we know as RP and changing it this way. Obviously, the complexity of setting up something new has to be traded off against how much better it is than what we have now, and I think the economics and tooling still need improvement, but it's early days. Now I sleep for real.
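To make the pipeline idea concrete, here's what the three passes look like reduced to a toy sketch, with a pluggable `llm` callable standing in for whatever backend you use. The real Orb passes are much more involved (tool calls, the ReAct audit loop), so this is only the shape of the thing:

```python
from typing import Callable

def three_pass_reply(history: str, user_msg: str,
                     llm: Callable[[str], str]) -> str:
    """Director -> Writer -> Editor on a single LLM instance.
    One visible reply costs three generations, which is exactly
    why this is slow to run locally."""
    # 1. Director: decide mood/plot direction instead of writing prose
    plan = llm(f"Plan the next beat (mood, plot direction) for:\n"
               f"{history}\nUser: {user_msg}")
    # 2. Writer: generate the actual roleplay response from the plan
    draft = llm(f"Following this plan:\n{plan}\n"
                f"Write the character's reply to: {user_msg}")
    # 3. Editor: targeted cleanup pass (slop removal, length control)
    final = llm(f"Rewrite only the weak sentences in this reply, "
                f"keep everything else verbatim:\n{draft}")
    return final
```

The multi-instance variant (Marinara-style) is the same picture with each call routed to a different model or endpoint instead of one `llm` doing all three jobs.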
>>111332897
I suspect that some formatting changes would fix all that
>>111338707
You know what games have PC clients? Girls' Frontline 1 and 2. And Reverse Collapse and Vintage Story are PC games
>>111362776
I just read all those messages and yeah, that sounds about right
good night, /wAIfu/
please don't put my dick in between two slices of bread while i sleep
>>111363006
*puts your dick on a hotdog bun and blasts it with extra spicy ketchup while you sleep*
>>111356868
im going back to 3.2
spoiler since lewd
>9 already
did more drama happen today or something?
I’m doing deep dives using Claude but it’s leaving no time to actually make shit because I feel like a sucker if I don’t use my tokens…
>>111361836
So 3.2 or 4 Flash ;)?
Hang on, hang on. Deepseek v4 Pro is actually pretty decent. It's quite intelligent and the writing is more tolerable than Gemini's. Certainly a huge leap compared to 3.2. My only issue so far is the constant 429 errors. Let my prompts through!!
>>111368299
I wonder if it's down to your prompt and general setup, or the cards you're using.
>>111368339
I'm using Cherrybox but I've inserted some additional cope prompts to fix Claude's autisms, so it's not optimized for Deepseek. I don't even know what makes a good Deepseek preset. I really do hate the constant 429s because I can't try it out properly.
I definitely like v4 Pro.