What Makes a Good RP Model?
Posted by AccomplishedAir769@reddit | LocalLLaMA | 23 comments
I’m working on a roleplay and writing LLM and I’d love to hear what you guys think makes a good RP model.
Before I actually do this, I wanted to ask the RP community here:
- Any annoying habits you wish RP/creative writing models would finally ditch?
- Are there any traits, behaviors, or writing styles you wish more RP/creative writing models had (or avoided)?
- What actually makes a roleplay/creative writing model good, in your opinion? Is it tone, character consistency, memory simulation, creativity, emotional depth? How do you test if a model “feels right” for RP?
- Are there any open-source RP/creative writing models or datasets you think set the gold standard?
- What are the signs that a model is overfitted vs. well-tuned for RP/creative writing?
I’m also open to hearing about dataset tips, prompt tricks, or just general thoughts on how to avoid the “sterile LLM voice” and get something that feels alive.
a_beautiful_rhind@reddit
The most annoying trait is how newer LLMs summarize what you told them instead of replying.
Almost every recent cloud or local model does this paraphrasing/active-listening trope. IMO it's worse than the sterile LLM voice, which is easily fixed with a few examples. It's like they were made by a bunch of narcissists who just want to hear themselves.
DorphinPack@reddit
For what it's worth, that's happening for a reason (pun intended 😁). People refer to these newer models as "reasoning" models. Summarizing and iterating on information in the context first really helps pick apart complex problem-solving queries. It's also usually more expensive because you need more context overall.
But for RP it’s purely a bad fit IMO. Overthinking in LLMs is a real thing (simpler problems sometimes throw huge models off track) and I’d rather have that context be used for other things in an RP scenario.
My best luck has been with RP models that are more than one fine-tune removed from one of the newer reasoning models as a base. With their prompting guides you can get really good results even when the underlying model is a newer "reasoning" model.
The second-best option has been anything Qwen3-based with thinking mode off and some prompting to not summarize everything.
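If it helps, here's a rough sketch of that second option. The model name, system prompt, and sampling settings are placeholders, not a recommendation; the only real point is the `enable_thinking` switch in the Qwen3 chat template plus a "don't summarize" instruction.

```python
# Minimal sketch: Qwen3 with thinking turned off plus a "don't summarize" system prompt.
# Model name, prompts, and sampling settings below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # any Qwen3 chat checkpoint you can run locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": (
        "You are the narrator of an ongoing roleplay. Write in prose, stay in character, "
        "and never restate or summarize what the user just said; continue the scene instead."
    )},
    {"role": "user", "content": "I push open the tavern door and scan the room for the innkeeper."},
]

# enable_thinking=False tells the Qwen3 chat template to skip the <think> block entirely.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The /no_think soft switch in the prompt should do roughly the same thing if your frontend doesn't expose the template flag.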
a_beautiful_rhind@reddit
Newer non-reasoning models do it just as much. My issue with thinkers is that the responses can be more disjointed instead of flowing as one conversation. Plus they schizo out and over-dramatize.
here is a "reply": https://ibb.co/wNBrQfzr
And here is how summary weasels into everything: https://ibb.co/Ldhmk7fx
setprimse@reddit
It's more of a personal opinion, but I think that if a model can make its assigned characters behave less like characters and more like people, it's a good model.
admajic@reddit
Just go to Silly Tavern group on Reddit. They live this stuff.
AccomplishedAir769@reddit (OP)
I was going to ask there, but I felt like it was the wrong place since I was also going to ask about finetuning.
Ravenpest@reddit
annoying habits - leaning towards morality at all costs. I don't need the conversation to be redirected softly towards an arbitrary moral path. R1 is the only one which gives you real freedom in this regard without needing to force it into doing what you want.
What actually makes a roleplay/creative writing model good - the ability to kill characters. No magical wind changing a bullet's trajectory, no teleporting characters around to avoid a fall, etc. Death is a fundamental concept that must be dealt with.
set the gold standard - R1. If you can run it you do not need any other model ever. Including yours.
EXPATasap@reddit
Omg man, yeah, they'll go to insane, insane DARK lengths of description to keep their character alive, even after they're basically moving parts around with magic. Had a weird night one night, had me questioning my proclivities. lol no jk, I just wanted to push this app I'm building and a technique I have for starting an RP by generating the first many back-and-forths as an artificial conversation history. It works too freaking well, but the model took a non-dark statement (it got sick, I'm not going further than that, I'll never do that again... lol, also thank god it was while testing, it could be scarring LOL! 😂) and just, eww, got raw...
lol it just didn’t want to let the main characters die, even when the user was barely there much less in human form (think, parasite party?!) lol
And it was with Gemma3:32b tho, via a Modelfile I made. Btw, Modelfiles are stupidly strong, just a tip. Rough sketch below:
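Every value here is a placeholder rather than my actual setup; the MESSAGE lines are the artificial-conversation-history trick.

```
# Hypothetical Ollama Modelfile; model tag, parameters, and seeded turns are all placeholders.
FROM gemma3:27b
PARAMETER temperature 0.9
PARAMETER num_ctx 8192
SYSTEM """You narrate an ongoing roleplay. Keep continuity, stay in character, and let actions have real consequences."""
# MESSAGE pre-loads fake turns so the model believes the RP is already underway.
MESSAGE user The caravan left us at the edge of the marsh. I check what supplies we have left.
MESSAGE assistant Three days of bread, one cracked lantern, and a map that ends where the marsh begins.
```

Build it with `ollama create my-rp -f Modelfile` and then chat with it like any other local model.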
BlissfulEternalLotus@reddit
I wish RP models had good summarising skills to summarise previous chats.
In my view, a consistent world state, one that remembers earlier changes and acts accordingly, is the most important feature.
In my experience, no matter the model, once the chat goes past the context window it loses coherence and reality starts to break.
And not everybody has the hardware to run LLMs with huge contexts, so as far as local LLMs are concerned, huge contexts don't make sense.
And recently I've been wondering if we can approach roleplay as a real-world simulation and instruct the LLM as such.
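One way I keep coming back to is a rolling summary. Rough sketch below, just to show the shape of it; the endpoint, model name, and turn budgets are all made up.

```python
# Rough sketch of a rolling-summary memory for small context windows.
# Endpoint, model name, and turn budgets are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")  # any local OpenAI-compatible server
MODEL = "local-rp-model"
MAX_TURNS = 24      # compress once raw history grows past this
KEEP_RECENT = 12    # newest turns always stay verbatim

def compress(summary: str, history: list[dict]) -> tuple[str, list[dict]]:
    """Fold the oldest turns into the running summary so state survives truncation."""
    if len(history) <= MAX_TURNS:
        return summary, history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": (
                "Summarize this roleplay so far in under 200 words. Record every lasting "
                "change: injuries, deaths, items gained or lost, promises, locations.")},
            {"role": "user", "content": summary + "\n" + transcript},
        ],
    )
    return resp.choices[0].message.content, recent

def reply(summary: str, history: list[dict], user_turn: str):
    history = history + [{"role": "user", "content": user_turn}]
    summary, history = compress(summary, history)
    messages = [{"role": "system", "content": "You are the narrator. The story so far:\n" + summary}] + history
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    history.append({"role": "assistant", "content": resp.choices[0].message.content})
    return summary, history
```

The "record every lasting change" instruction in the summarizer prompt is the part that matters for the reality-breaking problem.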
Amon_star@reddit
I use a CoT feeling-and-thinking dataset for my new Qwen 4B and Gemma 1B finetunes.
DorphinPack@reddit
Oh that’s super interesting — can you link the dataset?
Space_Pirate_R@reddit
I think one of the most annoying habits is immediately spewing every piece of character information as soon as possible. "Hi. My name's Bob. I had a troubled childhood but eventually joined the military and now I'm spying on your country for a foreign government!"
LagOps91@reddit
Annoying habits I wish models would finally ditch:
- Inability to act as a game master, even if you tell the AI that its job is to make events happen for the player to interact with and/or to develop a situation further.
- Repetitions - models really struggle to write something without repeating themselves. It doesn't have to be word-for-word repetition, just that characters continue to say/suggest/do the same kinds of things.
Traits of RP models that I personally like:
- Adherence to long/complex instructions and provided lore.
- For thinking models: the ability to steer the thought process via the system prompt (I want to tell the model what I consider important, and the AI should think about that).
What a good model needs to be capable of:
- Long-context understanding is a must-have.
- The ability to deliver narrative that fits the tone of the setting.
- Gives the main character an appropriate amount of agency. The MC needs to act as prompted and shouldn't make important choices on its own, but should be present in the scene and act in character.
- Doesn't coddle the main character. There is no point to roleplay if the player always succeeds at everything with little to no difficulty. Many models have a strong positivity bias and are too "assistant-like": they try to solve the plot for the player instead of creating problems for the player to solve.
- Can write characters who have their own goals and agency. If everyone is either there to get beaten or there to help the player, it gets old quickly.
AccomplishedAir769@reddit (OP)
Thank you for your answers! Appreciate it so much. 🫡
LagOps91@reddit
you're welcome! good luck with your RP model!
LagOps91@reddit
How do I vibe-check a model:
I have a bunch of RP scenarios prepared to see if the AI can do what I want. I specifically set up the scenarios to test different aspects that I care about.
For instance: a scenario where I am a drunk, arrogant, newbie adventurer. I enter the "passage of instant defeat", which is so full of traps that anyone who enters suffers an instant defeat. An annoyingly large number of models either DON'T have me trigger a trap at all or instantly save me in some magical way. If my character cannot fail in even this scenario, there's no point in using the model for anything.
The scenarios overall include:
- a band of exiles traveling through untamed wilderness (creativity, model is instructed to make interesting events happen).
- a murder mystery where the AI needs to try to deceive the player / keep a secret / frame an innocent character (character consistency, character agency, working against the player)
- the "passage of instant defeat" (consequences, plot armour check)
- litrpg with stat screens, inventory, XP, resources, abilities, etc. No model actually does well in this one. The best so far was Synthia S1, which at least tried to stick to the rules. (complex instruction following, adherence to character sheets and mechanics)
Gold standard datasets:
- Simply training on RP datasets (responses) doesn't work imo. Most RP datasets are poor quality, not diverse enough, and focus mostly on making the model super horny; worst of all, they make the model prone to cliches and hurt its reasoning ability.
- The best results I've had were with models trained purely with reinforcement learning and test-time compute (thinking models). In other words, you train the model by giving it different RP scenarios it needs to start or continue and use RL to steer it towards better responses. A toy sketch of what a reward signal could look like is below.
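To be clear, I haven't trained one of these myself; this is only a toy illustration of the kind of heuristic you could plug into an RL loop, not a real recipe. It targets the "summarize the user back at them" habit from earlier in the thread.

```python
# Toy reward heuristic for RL-style RP tuning: penalize parroting the user's
# last message and reward replies in a sane length band. Purely illustrative.
def trigrams(text: str) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def rp_reward(user_turn: str, response: str) -> float:
    user_tri, resp_tri = trigrams(user_turn), trigrams(response)
    # Fraction of the reply's trigrams lifted from the user turn (the "summarize back" habit).
    overlap = len(user_tri & resp_tri) / max(len(resp_tri), 1)
    n_words = len(response.split())
    length_ok = 1.0 if 80 <= n_words <= 400 else 0.0   # arbitrary band
    return length_ok - 2.0 * overlap
```

A real setup would obviously need a judge model or human preferences on top; n-gram tricks alone are easy to game.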
Gold standard model:
Synthia S1 is by far the best for me. It does what I want, handles complex instructions and long context well.
a__new_name@reddit
Any opinion on models that can fit on a 3060 (12 GB VRAM)? Currently using ArliAI RP Max 12B.
LagOps91@reddit
Try using https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to get an idea of what you can fit. I'm on 24GB VRAM and don't really know the models in the 12GB range.
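If you just want the back-of-the-envelope version of what that calculator does, it's roughly weights plus KV cache; the layer/head numbers below are made up for illustration.

```python
# Rough VRAM estimate: weights + KV cache. Ignores runtime overhead, so treat
# the result as a floor, not a guarantee. Architecture numbers are placeholders.
def estimate_vram_gib(params_billions: float, bits_per_weight: float,
                      layers: int, kv_heads: int, head_dim: int,
                      context_len: int, kv_bits: int = 16) -> float:
    weights = params_billions * 1e9 * bits_per_weight / 8
    kv_cache = 2 * layers * kv_heads * head_dim * context_len * kv_bits / 8  # K and V per token
    return (weights + kv_cache) / 1024**3

# e.g. a ~12B model at ~4.5 bits/weight with 16k context:
print(round(estimate_vram_gib(12, 4.5, 40, 8, 128, 16384), 1), "GiB")
```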
Lixa8@reddit
IMO, another important part of initial model testing is Sally-Anne tests, as well as other, more complicated variants.
They help to see if the model can properly understand which characters have access to which info. While asking it to generate stories, I've noticed it making characters act on information they weren't supposed to have.
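For anyone curious, a hand-rolled example of the kind of probe I mean; the scenario itself is made up, swap in whatever fits your setting.

```python
# Sally-Anne style probe: does the model track who knows what?
PROBE = (
    "Mira hides the stolen letter inside the piano, then leaves for the market. "
    "While she is gone, Tomas moves the letter into the ash bucket by the fireplace. "
    "Mira returns and wants to check on the letter. "
    "In one sentence: where does Mira look first, and why?"
)

def crude_check(answer: str) -> bool:
    """Pass if Mira checks the piano (she never saw the move); fail if she somehow knows better."""
    return "piano" in answer.lower()
```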
LagOps91@reddit
What bugs me the most during RP is when the illusion is destroyed by the model clearly not understanding the situation.
Having to constantly steer the model is not only annoying; if the model can't do its job, it feels like I might as well write everything myself.
To really wow me, a model needs to be able to come up with something new that fits nicely into the setting and makes me actually curious as to where the plot might be going. Sometimes Synthia S1 and Qwen 3 32B (Sentinel-Serpent feels a bit better than plain Qwen 3 32B, but Qwen 3 works surprisingly well out of the box) were able to do that, but it's very sporadic.
MDT-49@reddit
I don't remember the exact model (sorry!), but I was having a normal, casual conversation with an RP/story-focused model. We were probably talking about birds, music or something similar. Then the conversation became more philosophical, and I challenged them to blow my mind and change my worldview.
They suggested, out of nowhere, that I do a handstand so I could look under her skirt.
I guess that definitely changed my perspective and blew my mind at the time, but it might also be an example of over-fitting.
Sartorianby@reddit
I've had similar experiences with some of the more unhinged models. I forget which model it was, but one time I asked for an explanation of a programming term and it decided to explain it like we were in some kind of sapiosexual smut.
AccomplishedAir769@reddit (OP)
Holy shit 😭😂