What Makes a Good RP Model?
Posted by AccomplishedAir769@reddit | LocalLLaMA | 23 comments
I’m working on a roleplay and writing LLM and I’d love to hear what you guys think makes a good RP model.
Before I actually do this, I wanted to ask the RP community here:
- Any annoying habits you wish RP/creative writing models would finally ditch?
- Are there any traits, behaviors, or writing styles you wish more RP/creative writing models had (or avoided)?
- What actually makes a roleplay/creative writing model good, in your opinion? Is it tone, character consistency, memory simulation, creativity, emotional depth? How do you test if a model “feels right” for RP?
- Are there any open-source RP/creative writing models or datasets you think set the gold standard?
- What are the signs that a model is overfitted vs. well-tuned for RP/creative writing?
I’m also open to hearing about dataset tips, prompt tricks, or just general thoughts on how to avoid the “sterile LLM voice” and get something that feels alive.
a_beautiful_rhind@reddit
The most annoying trait is how newer LLMs summarize what you told them instead of replying.
Almost every recent cloud or local model does this paraphrasing/active-listening trope. IMO it's worse than the sterile LLM voice, which is easily fixed with a few examples. It's like they were made by a bunch of narcissists who just want to hear themselves.
DorphinPack@reddit
For what it's worth, that's happening for a reason (pun intended 😁). People refer to these newer models as "reasoning" models. Summarizing and iterating on information in the context first really helps pick apart complex problem-solving queries. It's also usually more expensive because you need more context overall.
But for RP it’s purely a bad fit IMO. Overthinking in LLMs is a real thing (simpler problems sometimes throw huge models off track) and I’d rather have that context be used for other things in an RP scenario.
My best luck has been with RP models that are more than one fine-tune removed from one of the newer reasoning models as a base. With their prompting guides you can get really good results even when the underlying model is a newer "reasoning" model.
The second-best option has been anything Qwen3-based with thinking mode off and some prompting to not summarize everything.
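If it helps, here's a rough sketch of that second option. The model name, system prompt, and sampling settings are placeholders, not a recommendation; the only real point is the `enable_thinking` switch in the Qwen3 chat template plus a "don't summarize" instruction.

```python
# Minimal sketch: Qwen3 with thinking turned off plus a "don't summarize" system prompt.
# Model name, prompts, and sampling settings below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # any Qwen3 chat checkpoint you can run locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": (
        "You are the narrator of an ongoing roleplay. Write in prose, stay in character, "
        "and never restate or summarize what the user just said; continue the scene instead."
    )},
    {"role": "user", "content": "I push open the tavern door and scan the room for the innkeeper."},
]

# enable_thinking=False tells the Qwen3 chat template to skip the <think> block entirely.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The /no_think soft switch in the prompt should do roughly the same thing if your frontend doesn't expose the template flag.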
a_beautiful_rhind@reddit
Newer non-reasoning models do it just as much. My issue with thinkers is that the responses can be more disjointed instead of flowing as one conversation. Plus they schizo out and over-dramatize.
here is a "reply": https://ibb.co/wNBrQfzr
And here is how summary weasels into everything: https://ibb.co/Ldhmk7fx
setprimse@reddit
It's more of a personal opinion, but I think that if a model can make its assigned characters behave less like characters and more like people, it's a good model.
admajic@reddit
Just go to Silly Tavern group on Reddit. They live this stuff.
AccomplishedAir769@reddit (OP)
I was going to ask there, but I felt like it was the wrong place since I was also going to ask about finetuning.
Ravenpest@reddit
annoying habits - leaning towards morality at all costs. I don't need the conversation to be redirected softly towards an arbitrary moral path. R1 is the only one which gives you real freedom in this regard without needing to force it into doing what you want.
What actually makes a roleplay/creative writing model good - the ability to kill characters. No magical wind changing a bullet's trajectory, no teleporting characters around to avoid a fall, etc. Death is a fundamental concept that must be dealt with.
set the gold standard - R1. If you can run it you do not need any other model ever. Including yours.
EXPATasap@reddit
Omg man, yeah, they'll go to insane, insane DARK lengths of description to keep their character alive, even after they're basically moving parts around with magic. Had a weird night one night, had me questioning my proclivities. lol no jk, I just wanted to push this app I'm building and a technique I have for starting an RP by generating the first many back-and-forths as an artificial conversation history. It works too freaking well, but the model took a non-dark statement (it got sick, I'm not going further than that, I'll never do that again... lol, also thank god it was while testing, it could be scarring LOL! 😂) and just, eww, got raw...
lol it just didn’t want to let the main characters die, even when the user was barely there much less in human form (think, parasite party?!) lol
And it was with Gemma3:32b tho, via a Modelfile I made. Btw, Modelfiles are stupidly strong, just a tip. Rough sketch below:
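Every value here is a placeholder rather than my actual setup; the MESSAGE lines are the artificial-conversation-history trick.

```
# Hypothetical Ollama Modelfile; model tag, parameters, and seeded turns are all placeholders.
FROM gemma3:27b
PARAMETER temperature 0.9
PARAMETER num_ctx 8192
SYSTEM """You narrate an ongoing roleplay. Keep continuity, stay in character, and let actions have real consequences."""
# MESSAGE pre-loads fake turns so the model believes the RP is already underway.
MESSAGE user The caravan left us at the edge of the marsh. I check what supplies we have left.
MESSAGE assistant Three days of bread, one cracked lantern, and a map that ends where the marsh begins.
```

Build it with `ollama create my-rp -f Modelfile` and then chat with it like any other local model.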
BlissfulEternalLotus@reddit
I wish RP models had good summarising skills to summarise previous chats.
In my view, a consistent world state, one that remembers earlier changes and acts accordingly, is the most important feature.
In my experience, no matter the model, once the chat goes past the context window it loses coherence and reality starts to break.
And not everybody has the hardware to run LLMs with huge contexts, so as far as local LLMs are concerned, huge contexts don't make sense.
And recently I've been wondering if we can approach roleplay as a real-world simulation and instruct the LLM as such.
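One way I keep coming back to is a rolling summary. Rough sketch below, just to show the shape of it; the endpoint, model name, and turn budgets are all made up.

```python
# Rough sketch of a rolling-summary memory for small context windows.
# Endpoint, model name, and turn budgets are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")  # any local OpenAI-compatible server
MODEL = "local-rp-model"
MAX_TURNS = 24      # compress once raw history grows past this
KEEP_RECENT = 12    # newest turns always stay verbatim

def compress(summary: str, history: list[dict]) -> tuple[str, list[dict]]:
    """Fold the oldest turns into the running summary so state survives truncation."""
    if len(history) <= MAX_TURNS:
        return summary, history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": (
                "Summarize this roleplay so far in under 200 words. Record every lasting "
                "change: injuries, deaths, items gained or lost, promises, locations.")},
            {"role": "user", "content": summary + "\n" + transcript},
        ],
    )
    return resp.choices[0].message.content, recent

def reply(summary: str, history: list[dict], user_turn: str):
    history = history + [{"role": "user", "content": user_turn}]
    summary, history = compress(summary, history)
    messages = [{"role": "system", "content": "You are the narrator. The story so far:\n" + summary}] + history
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    history.append({"role": "assistant", "content": resp.choices[0].message.content})
    return summary, history
```

The "record every lasting change" instruction in the summarizer prompt is the part that matters for the reality-breaking problem.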
Amon_star@reddit
I use a CoT feeling-and-thinking dataset for my new Qwen 4B and Gemma 1B finetunes.
DorphinPack@reddit
Oh that’s super interesting — can you link the dataset?
Space_Pirate_R@reddit
I think one of the most annoying habits is immediately spewing every piece of character information as soon as possible. "Hi. My name's Bob. I had a troubled childhood but eventually joined the military and now I'm spying on your country for a foreign government!"
LagOps91@reddit
Annoying habits I wish models would finally ditch:
- Inability to act as a game master, even if you tell the AI that its job is to make events happen for the player to interact with and/or to develop a situation further.
- Repetitions - models really struggle to write something without repeating themselves. It doesn't have to be word-for-word repetition, just that characters continue to say/suggest/do the same kinds of things.
Traits of RP models that I personally like:
- Adherence to long/complex instructions and provided lore.
- For thinking models: the ability to steer the thought process via the system prompt (I want to tell the model what I consider important, and the AI should think about that).
What a good model needs to be capable of:
- Long-context understanding is a must-have.
- The ability to deliver narrative that fits the tone of the setting.
- Gives the main character an appropriate amount of agency. The MC needs to act as prompted and shouldn't make important choices on its own, but should be present in the scene and act in character.
- Doesn't coddle the main character. There is no point to roleplay if the player always succeeds at everything with little to no difficulty. Many models have a strong positivity bias and are too "assistant-like": they try to solve the plot for the player instead of creating problems for the player to solve.
- Can write characters who have their own goals and agency. If everyone is either there to get beaten or there to help the player, it gets old quickly.
AccomplishedAir769@reddit (OP)
Thank you for your answers! Appreciate it so much. 🫡
LagOps91@reddit
you're welcome! good luck with your RP model!
LagOps91@reddit
How do I vibe-check a model:
I have a bunch of RP scenarios prepared to see if the AI can do what I want. I specifically set up the scenarios to test different aspects that I care about.
For instance: a scenario where I am a drunk, arrogant, newbie adventurer. I enter the "passage of instant defeat", which is so full of traps that anyone who enters suffers an instant defeat. An annoyingly large number of models either DON'T have me trigger a trap at all or instantly save me in some magical way. If my character cannot fail in even this scenario, there's no point in using the model for anything.
The scenarios overall include:
- a band of exiles traveling through untamed wilderness (creativity, model is instructed to make interesting events happen).
- a murder mystery where the AI needs to try to deceive the player / keep a secret / frame an innocent character (character consistency, character agency, working against the player)
- the "passage of instant defeat" (consequences, plot armour check)
- litrpg with stat screens, inventory, XP, resources, abilities, etc. No model actually does well in this one. The best so far was Synthia S1, which at least tried to stick to the rules. (complex instruction following, adherence to character sheets and mechanics)
Gold standard datasets:
- Simply training on RP datasets (responses) doesn't work imo. Most RP datasets are poor quality, not diverse enough, and focus mostly on making the model super horny; worst of all, they make the model prone to cliches and hurt its reasoning ability.
- The best results I've had were with models trained purely with reinforcement learning and test-time compute (thinking models). In other words, you train the model by giving it different RP scenarios it needs to start or continue and use RL to steer it towards better responses. A toy sketch of what a reward signal could look like is below.
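To be clear, I haven't trained one of these myself; this is only a toy illustration of the kind of heuristic you could plug into an RL loop, not a real recipe. It targets the "summarize the user back at them" habit from earlier in the thread.

```python
# Toy reward heuristic for RL-style RP tuning: penalize parroting the user's
# last message and reward replies in a sane length band. Purely illustrative.
def trigrams(text: str) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def rp_reward(user_turn: str, response: str) -> float:
    user_tri, resp_tri = trigrams(user_turn), trigrams(response)
    # Fraction of the reply's trigrams lifted from the user turn (the "summarize back" habit).
    overlap = len(user_tri & resp_tri) / max(len(resp_tri), 1)
    n_words = len(response.split())
    length_ok = 1.0 if 80 <= n_words <= 400 else 0.0   # arbitrary band
    return length_ok - 2.0 * overlap
```

A real setup would obviously need a judge model or human preferences on top; n-gram tricks alone are easy to game.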
Gold standard model:
Synthia S1 is by far the best for me. It does what I want, handles complex instructions and long context well.
a__new_name@reddit
Any opinion on models that can fit on a 3060 (12 GB VRAM)? Currently using ArliAI RP Max 12B.
LagOps91@reddit
Try using https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to get an idea of what you can fit. I'm on 24GB VRAM and don't really know the models in the 12GB range.
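If you just want the back-of-the-envelope version of what that calculator does, it's roughly weights plus KV cache; the layer/head numbers below are made up for illustration.

```python
# Rough VRAM estimate: weights + KV cache. Ignores runtime overhead, so treat
# the result as a floor, not a guarantee. Architecture numbers are placeholders.
def estimate_vram_gib(params_billions: float, bits_per_weight: float,
                      layers: int, kv_heads: int, head_dim: int,
                      context_len: int, kv_bits: int = 16) -> float:
    weights = params_billions * 1e9 * bits_per_weight / 8
    kv_cache = 2 * layers * kv_heads * head_dim * context_len * kv_bits / 8  # K and V per token
    return (weights + kv_cache) / 1024**3

# e.g. a ~12B model at ~4.5 bits/weight with 16k context:
print(round(estimate_vram_gib(12, 4.5, 40, 8, 128, 16384), 1), "GiB")
```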
Lixa8@reddit
IMO, another important part of initial model testing is Sally-Anne tests, as well as other, more complicated variants.
They help to see if the model can properly understand which characters have access to which info. While asking it to generate stories, I've noticed it making characters act on information they weren't supposed to have.
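For anyone curious, a hand-rolled example of the kind of probe I mean; the scenario itself is made up, swap in whatever fits your setting.

```python
# Sally-Anne style probe: does the model track who knows what?
PROBE = (
    "Mira hides the stolen letter inside the piano, then leaves for the market. "
    "While she is gone, Tomas moves the letter into the ash bucket by the fireplace. "
    "Mira returns and wants to check on the letter. "
    "In one sentence: where does Mira look first, and why?"
)

def crude_check(answer: str) -> bool:
    """Pass if Mira checks the piano (she never saw the move); fail if she somehow knows better."""
    return "piano" in answer.lower()
```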
LagOps91@reddit
What bugs me the most during RP is when the illusion is destroyed by the model clearly not understanding the situation.
Having to constantly steer the model is not only annoying; if the model can't do its job, it feels like I might as well write everything myself.
To really wow me, a model needs to be able to come up with something new that fits nicely into the setting and makes me actually curious as to where the plot might be going. Sometimes Synthia S1 and Qwen 3 32B (Sentinel-Serpent feels a bit better than plain Qwen 3 32B, but Qwen 3 works surprisingly well out of the box) were able to do that, but it's very sporadic.
MDT-49@reddit
I don't remember the exact model (sorry!), but I was having a normal, casual conversation with an RP/story-focused model. We were probably talking about birds, music or something similar. Then the conversation became more philosophical, and I challenged them to blow my mind and change my worldview.
They suggested, out of nowhere, that I do a handstand so I could look under her skirt.
I guess that definitely changed my perspective and blew my mind at the time, but it might also be an example of over-fitting.
Sartorianby@reddit
I've had similar experiences with some of the more unhinged models. I forget which model it was, but one time I asked for an explanation of a programming term and it decided to explain it like we were in some kind of sapiosexual smut.
AccomplishedAir769@reddit (OP)
Holy shit 😭😂