Help finetuning my own RP model

Posted by VerdoneMangiasassi@reddit | LocalLLaMA | View on Reddit | 9 comments

Hello, i've been exploring the LLM world in the past weeks and i really want to try and work on my own model for roleplay to meet my standards instead of constantly trying out models built by others

Although, it's my first time at this and i'd really use some help. As of now I'm educating myself on how fine tuning works, and this includes asking you guys here

My priority for the model is coherency, i want to make a model able to make solid logical connections between pieces of data it's given

Any advice is welcome

Thanks in advance

[-]

Wibong@reddit

For RP fine-tuning, check out this dataset I made:
huggingface.co/datasets/beyoru/Aesir-Character-CoT-roleplay
(convos distilled from DeepSeek V4 Pro)

[-]

llama-impersonator@reddit

a warning: once you enter this rabbit hole, you can't leave.

seriously though, step one is collecting data. for RP, that means long sessions with a large model you can tolerate, or long sessions with a smaller model where you manually edit out all of the behaviors you don't like in the responses. once you've compiled a big (and I mean it, you need a lot of tokens) set of sessions, you can start learning all of the technical details.

the reason i say start with data is that data quality is the single most important factor in how well your finetune works, far more important than the other details. you can have middling quality in hyperparameter selection and quality data and the end result will be better than perfect hyperparameters and bad data.

start with qlora, cuz fulltune is far more expensive.

[-]

VerdoneMangiasassi@reddit (OP)

Are we talking hundreds, or thousands of exchanges?

Also does it matter how many characters are in there, or just the style matters?

[-]

llama-impersonator@reddit

magnums were trained with thousands of RP convos mixed with instruct data, but they were from the c2 proxy, so lower quality. cherry picking the best ones probably would have improved the models. you definitely want a couple hundred or you're not going to make much impact on the model.

[-]

UnionVortex@reddit

I recently entered the rabbit hole and I think I can share my findings. Unsloth recommends 100 dataset rows at a minimum. I thought they were being overly conservative and I tried to do it incrementally, first 30, then 50, then 70... Long story short, my model only started to "sound like" the dataset at ~140 rows.

Also: do use a separate validation set. Due to token sampling, even if you overtrain it's unlikely your model will sound the way you want. Having a separate eval dataset gets you a more realistic picture of how your training is going.

To give a real example: I did a training run where I got my training loss down to ~1e-3 with a dataset of 60 rows, and then testing it against the training set still didn't give any meaningful results. Meanwhile, my latest run (160 rows) got to a evaluation loss of ~2.9 and it is already decent even on unseen test cases.

Another thing to consider are the hyperparams, expect to tweak them... A lot...

[-]

HopePupal@reddit

https://huggingface.co/datasets/MiniMaxAI/role-play-bench roleplay datasets look like this. this one's a benchmark, but it's worth reading the article it came from https://www.minimax.io/news/a-deep-dive-into-the-minimax-m2-her-2 to get an idea of how roleplay models are trained/tuned. part of it's marketing fluff but there's still a bunch of useful criteria, considerations, etc. on what makes a good roleplay model.

tl;dr: you're going to need a lot of example RP sessions

[-]

YT_Brian@reddit

Wouldn't using a Lora and maybe rag be easier? My very basic understanding for fine tuning is you use very specific data to train the llm on, it then can pull from said information.

I'd read this first before going further. Tldr, make a correctly made dataset (json based I believe) via maybe using character llm and other such.

You could write out an entire book yourself, every character, history, how they act with multiple examples in detail, the setting/world in depth, maybe multiple full dungeons in written form and use those as a basis for possible random dungeon creation, etc.

[-]

RanklesTheOtter@reddit

Unsloth Studio makes it pretty easy.

[-]

Miriel_z@reddit

One of my next priorities is to learn how to retrain and finetune the model. Any guide is appreciated, I second OP.