Has anyone experimented with using LLMs for tabletop RPGs? I've been trying to make them work as a Game Master with various models over the years, but I haven't really been able to "get there". Kimi K2 0905 is very, very close to usable. With the right prompting you can have, say, a large model served by a provider, plus a small local model (I use Qwen3 4B 2507) that reads the rulebooks and validates what's being said. Kimi K2 mechanically "knows" Pathfinder 2E even without the smaller knowledge agent; it just hallucinates some of the details.

My current setup:
>take all the current Pathfinder 2E rulebooks (yes, even legacy content)
>convert PDF to text naively; tried VLMs, but for good OCR the only viable model is MiniCPM, and that would take either $$$ or weeks of processing time
>generate a bunch of QA pairs, ~20,000 or so, from both a scrape of AoN and the converted PDFs
>finetune a sparse embedding model (a variant of SPLADE v2) on the QA pairs - basically an advanced form of keyword search
>at inference, run a combination of dense+sparse embeddings over the QA pairs, the AoN scrape, and the rulebooks; deduplicate; run the results through a reranker
>feed this to Qwen3 4B ('rules guy')
>the GM can prompt 'rules guy', but 'rules guy' also double-checks character sheets etc.

It can make character sheets pretty quickly and it can be pretty consistent with the rules as both a GM and a player. There are some gripes, and it doesn't follow the rules 100% of the time.
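The merge step in the setup above (combine dense and sparse hits, deduplicate, trim before the reranker) can be sketched with reciprocal rank fusion. This is a generic RRF sketch, not OP's actual code; the function and doc IDs are made up for illustration.

```python
# Merge a dense and a sparse result list with reciprocal rank fusion (RRF),
# deduplicate, and keep the top-n candidates for the reranker.

def rrf_merge(dense_hits, sparse_hits, k=60, top_n=5):
    """dense_hits / sparse_hits: lists of doc ids, best first."""
    scores = {}
    for hits in (dense_hits, sparse_hits):
        for rank, doc_id in enumerate(hits):
            # RRF score: 1 / (k + rank). A doc found by both retrievers
            # accumulates both contributions, which also deduplicates it.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

dense = ["aon/flurry-of-blows", "crb/p156", "crb/p446"]   # illustrative ids
sparse = ["crb/p156", "aon/monk", "crb/p446"]
print(rrf_merge(dense, sparse, top_n=3))
```

Documents that both retrievers agree on float to the top, which is usually what you want before spending reranker compute.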
>>720820562
Oh, and there are a bunch of parsing steps in between. I had to take the 'raw' converted PDF text and convert it into markdown. Pretty sure that cost me like $20 on OpenRouter.
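Before you even send raw PDF text to a model for markdown conversion, it usually needs some mechanical cleanup. A minimal sketch of that kind of pre-pass, assuming typical PDF-extraction damage (hyphenated line breaks, hard-wrapped lines); the heuristics here are illustrative, not OP's actual pipeline:

```python
import re

def clean_pdf_text(raw):
    # Re-join words hyphenated across line breaks ("un-\narmed" -> "unarmed").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse single hard line breaks into spaces; keep blank lines
    # as paragraph boundaries.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Squeeze leftover space runs from multi-column layouts.
    return re.sub(r"[ \t]{2,}", " ", text).strip()

raw = "The monk's flurry of\nblows lets you make two un-\narmed strikes.\n\nNew paragraph."
print(clean_pdf_text(raw))
```

Doing this deterministically first means the paid LLM pass only has to handle the genuinely ambiguous layout, which keeps the token bill down.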
I assumed there were already tabletop-trained LLMs; seems like a no-brainer to do, and it's legal to train on modules despite copyright, so you could even sell it.
>>720820562
>Kimi
Use Gemini instead; it keeps track of info better.
>>720821685
>google
>proprietary
>baked-in bias
>>720821602
The big problem is consistency; you can see in the image that it didn't list the implement when generating the character. You can't give it web search either: web search is either expensive, or you look things up on the Web Archive, which can be outdated or slow to load. Even if I did, it would mostly return AoN results, which we already have. I think I just need to chunk correctly - grab the whole page of the top 1 or 2 results rather than just the 512-token chunks.

>>720821685
I use Chutes since it's 2,000 API calls a day for $10 a month. Chutes is crap but it's cheap. In my experience, Kimi produces a lot less slop.
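The "grab the whole page instead of the 512-token chunk" idea is basically parent-document retrieval: index small chunks, but expand the top hits back to their full source page before handing them to the model. A sketch under that assumption; the chunk-to-page mapping and page IDs are made up:

```python
# Expand ranked chunk hits to their parent pages, deduplicating chunks
# that came from the same page, and stop after top_pages pages.

def expand_to_pages(ranked_chunk_ids, chunk_to_page, pages, top_pages=2):
    seen, out = set(), []
    for chunk_id in ranked_chunk_ids:
        page_id = chunk_to_page[chunk_id]
        if page_id not in seen:          # two chunks from one page -> one page
            seen.add(page_id)
            out.append(pages[page_id])
        if len(out) == top_pages:
            break
    return out

pages = {"crb_156": "<full text of Core Rulebook p.156>",
         "crb_157": "<full text of Core Rulebook p.157>"}
chunk_to_page = {"c1": "crb_156", "c2": "crb_156", "c3": "crb_157"}
print(expand_to_pages(["c2", "c1", "c3"], chunk_to_page, pages))
```

You keep the precision of small-chunk retrieval while the model still sees the surrounding context a rulebook page provides.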
>>720821602
I realised I didn't really answer your question, but basically... RAG is generally very hard. It's a bit easier in this case since Pathfinder rulebooks are designed so you can look stuff up in seconds, but for things like character creation there's a lot of data to gather.

Worst case scenario, I end up finetuning a dense embedding model on page-long text contents. But then I'd have to generate synthetic pairs mapping questions to the pages that answer them, PLUS an eval test set, and finetuning even a 0.6B embedding model with QDoRA can take up to a week on my poorfag RTX 4070, with no guaranteed results.