Gryphe/Pantheon-Reasoning-27B · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | 18 comments
from Gryphe:
An experiment in bringing reasoning capability to the Pantheon roleplay series in the form of an uncensored dense Qwen 3.6 27B. This specific model can be thought of as a successor to both the Pantheon series and the one-time Codex release since I used such a large variety of data this time around.
Yet another theory being tested this time around: take the data that Pantheon is built on, pair it with full thinking traces, and let the model reason its way through character work — weighing tone, planning narrative beats, considering how a character would actually respond before committing to a line. Whether that meaningfully improves roleplay quality over a non-reasoning model is a question you'll hopefully be able to help me answer.
GGUF quants are available here.
Model details
Base model is llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved, and from what I can tell this worked out very, very nicely with regard to refusal reduction and writing capabilities.
I considered Gemma 4 31B but that model has been an absolute pain to train. Something something special snowflake architectures. (grumble, grumble)
All training sources include full reasoning traces, with thinking active across every assistant turn:
- Pantheon data (~28%) - the core Pantheon roleplay corpus with reasoning traces back-generated using the method described below
- Opus-4.6-Reasoning-24k (~21%) - a cleaned and deduplicated aggregation of Claude Opus 4.6 reasoning traces covering general instruction-following, STEM, and coding; provides the broad reasoning backbone
- WorldSim data (~16%) - long-form Opus 4.6 narrative roleplay with native reasoning traces, focusing on extended storytelling, character immersion, and emergent world logic, cobbled together through various experiments - mainly third person present tense but has a bit of everything + cliché cleaned, of course!
- Text adventure data (~16%) - high stakes interactive fiction and text adventure content with reasoning back-generated, lending the model a more grounded, prose-forward writing style
- General roleplay data (~16%) - a broad collection of highly varied roleplay transcripts with reasoning back-generated, helping the model generalise well to arbitrary character setups
- Tiamat data (~3%) - character and roleplay dataset originally built for Tiamat-24B-Magistral, featuring a multi-step generation/extension/improvement pipeline with critic-improver rewrites to reduce AI clichés, with reasoning back-generated for each exchange
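To make the mixture concrete, here is a minimal Python sketch that draws training sources in proportion to the approximate shares listed above. The source keys and the `sample_sources` helper are hypothetical, and the post does not say whether the percentages count tokens or samples, so treat this as an illustration of weighted mixing rather than the actual training pipeline.

```python
import random

# Approximate shares quoted in the list above (they sum to 100).
# The key names are made up for this sketch.
MIXTURE = {
    "pantheon": 28,
    "opus_reasoning_24k": 21,
    "worldsim": 16,
    "text_adventure": 16,
    "general_roleplay": 16,
    "tiamat": 3,
}


def sample_sources(n, seed=0):
    """Draw n source names with probability proportional to their share."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=n)
```

Calling `sample_sources(10_000)` should return a list in which roughly 28% of the entries are `"pantheon"` and about 3% are `"tiamat"`.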
The model was trained with preserve_thinking: true, so thinking tags remain active across all assistant turns in multi-turn conversations, not just the first.
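As a rough illustration of what that setting changes, the Python sketch below contrasts keeping reasoning in every assistant turn with stripping it from all but the latest one. It assumes reasoning is wrapped in `<think>...</think>` delimiters, and `render_history` is a hypothetical helper written for this example, not the model's actual chat template.

```python
import re

# Assumption: reasoning is delimited by <think>...</think> inside the message text.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def render_history(messages, preserve_thinking=True):
    """Return a copy of the chat history.

    With preserve_thinking=False, reasoning is stripped from every assistant
    turn except the most recent one; with True, all turns keep it.
    """
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=-1,
    )
    rendered = []
    for i, m in enumerate(messages):
        content = m["content"]
        is_old_assistant = m["role"] == "assistant" and i != last_assistant
        if is_old_assistant and not preserve_thinking:
            content = THINK_BLOCK.sub("", content)
        rendered.append({"role": m["role"], "content": content})
    return rendered
```

A model trained on histories where every turn keeps its reasoning would presumably expect the `preserve_thinking=True` shape at inference time, so a frontend that strips earlier reasoning may be feeding it a format it did not see in training.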
SurpriseOk6927@reddit
reasoning traces for roleplay is a smart approach. long sessions break because models don't track character motivation across turns. curious if the thinking overhead adds noticeable latency.
SurpriseOk6927@reddit
damn, that's disappointing. the idea was promising, but if it breaks after 5k tokens it's useless for long rp sessions. appreciate the honest review, saved me the download.
inddiepack@reddit
I use the non-hereticised quant of 27B by unsloth (MTP), and it does track the reasoning traces if you have a system prompt for it.
I have tried the fine-tune the OP posted and, unfortunately, it's very bad. It loses the structure and vision of the system prompt very fast (within 5k tokens), and its ability to track reasoning traces is actually broken compared with the base unsloth model: when I check the thinking tokens, it's not rigorously filtering a character's answer through its own personality matrix and the conversation so far. Once you pass 10-15k tokens, the reasoning gets shorter and worse as well.
Although this one is unusable for me, massive appreciation for Gryphe's work and effort.
SurpriseOk6927@reddit
damn, that's disappointing. was hoping it'd hold context better. have you tried the unsloth base model without the fine-tune? curious if the falloff is inherent or just this version.
korino11@reddit
Why are a lot of people trying to feed in data from Anthropic... Isn't it much better to tune the model by finding opportunities in its architecture?
TheRealMasonMac@reddit
Data quality is more important.
korino11@reddit
I think data comes last. First you need to tweak the architecture and find new solutions. Only after that can we load any data...
ComplexType568@reddit
I really hope those opus "reasoning traces" aren't the fake reasoning traces Anthropic outputs.
pinkyellowneon@reddit
what else would they be
Iwaku_Real@reddit
Exactly
PunnyPandora@reddit
https://i.redd.it/q1pfz0yqbb4h1.gif
LLMFan46@reddit
👍🤌
IrisColt@reddit
Interesting, I'm definitely going to benchmark this. Thanks!!!
Opening-Ad6258@reddit
Is thinking fast?
LLMFan46@reddit
That depends entirely on your hardware.
Gryphe@reddit
I have a lot more planned for this particular roleplay reasoning approach but wanted to get a first version out and about to help me determine whether it's even worth investigating in the first place!
jacek2023@reddit (OP)
I didn’t know you had an account here 😄
Kahvana@reddit
Looks really nice! You might want to post this on r/SillyTavernAI as well.