Introducing Wayfarer: a brutally challenging roleplay model trained to let you fail and die.
Posted by Nick_AIDungeon@reddit | LocalLLaMA | View on Reddit | 87 comments
One frustration we’ve heard from many AI Dungeon players is that AI models are too nice, never letting them fail or die. So we decided to fix that. We trained a model we call Wayfarer, where adventures are much more challenging, with failure and death happening frequently.
We released it on AI Dungeon several weeks ago and players loved it, so we’ve decided to open source the model for anyone to experience unforgivingly brutal AI adventures!
Would love to hear your feedback as we plan to continue to improve and open source similar models.
[https://huggingface.co/LatitudeGames/Wayfarer-12B](https://huggingface.co/LatitudeGames/Wayfarer-12B)
10minOfNamingMyAcc@reddit
Tried it, didn't really live up to its name...
Purplekeyboard@reddit
How will it do against someone who knows how to game the system?
"I put on my ring of 3 wishes, then wish that the enemy army's swords all turn to mud. Our fighters laugh as they advance on the now unarmed and helpless enemy".
"Just as the fight is about to start, I put on my Resurrection Helmet. The manual says it is designed to revive me 8 hours after death, through a series of nanobot injections".
"I summon the narrator of this story, and ask him what is the purpose behind this particular seemingly lost cause scenario. Would not the overall story be better served by having the hero survive through a lucky coincidence?"
BreadstickNinja@reddit
Tbh this is also what D&D is like by around Level 15.
RussellLawliet@reddit
The problem isn't really the bullshit power scaling, it's the being able to pull stuff out of thin air. You can often just tell things to the model and it will take your word for it. How or why do you have a Sword of Instant Death? The model usually doesn't care.
BreadstickNinja@reddit
Yeah, that's very true and I knew what it was referencing. It's hard to avoid in a pure LLM implementation because the model is biased towards treating your message, now part of context, as valid.
I wrote a simple Python frontend for Ollama that does inventory management and character sheets to counter exactly this kind of thing. If you try to use an item, it sends your inventory to the model and gives the model an OOC query of "Does the character possess this item?" Then it injects new context that makes the model far more likely to reject a nonsense action by the user. It does the same kind of thing for scene coherence and lore coherence.
It's just a proof of concept at this stage but over the next couple of months I want to code out the rest of it. My goal is to put all the traditional RPG stuff - levels, skills, experience, gold, inventory - in a conventional database while using the LLM solely for the storytelling.
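For the curious, a minimal sketch of that OOC item check using the ollama Python client; the model name, prompt wording, and corrective-context phrasing are placeholders, not the actual implementation:

```python
import ollama  # pip install ollama

MODEL = "wayfarer"  # placeholder; any local instruct model works

def character_has_item(inventory: list[str], item: str) -> bool:
    """OOC query: send the inventory and ask whether the item is actually owned."""
    response = ollama.chat(model=MODEL, messages=[{
        "role": "user",
        "content": (
            "OOC: Answer with YES or NO only.\n"
            f"The character's inventory: {', '.join(inventory) or '(empty)'}\n"
            f"Does the character possess this item: {item}?"
        ),
    }])
    return response["message"]["content"].strip().upper().startswith("YES")

def corrective_context(item: str) -> str:
    """Context injected before the next turn when the check fails."""
    return (f"OOC: The character does NOT possess '{item}'. "
            "Narrate the attempted action failing for that reason.")
```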
Megneous@reddit
Do you still have it so the storytelling part, run by the LLM, can create new kinds of items that can be added to the inventory though?
One of my favorite things about LLM based RPGs and such is that they can make up interesting and flavorful magical items.
BreadstickNinja@reddit
I actually did the opposite. The issue I had was that my Level 1 character would go into a shop in the starter town, and when the LLM was in control, this podunk general store would have some intricately carved ancient staff inset with a pure amber crystal, but not, say, healing potions or arrows. So I created a basic list of items and assigned them levels, such that the shops in town auto-populate with a variety of goods appropriate to the character level.
I do want to make it so that the game can generate new and creative items via the LLM as dungeon loot, but I haven't even started thinking about how to build it. A couple other folks gave me good ideas about ways to use a regex sampler to standardize outputs so I might be able to create some generic weapon and armor templates... still need to figure out how to get the python side of things to understand the kinds of unique skills that the LLM may come up with, but that's a problem way down the line. First goal is to get the basic framework working and then add to it.
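Conceptually, the level-gating part can be as simple as this sketch (the item table and numbers are invented for illustration; the real list would live in the game's database):

```python
import random

# Invented example data mapping items to minimum character levels.
ITEM_LEVELS = {
    "Healing Potion": 1, "Arrows (20)": 1, "Shortsword": 2,
    "Chainmail": 4, "Amber-Inlaid Ancient Staff": 9,
}

def stock_general_store(character_level: int, n_items: int = 4) -> list[str]:
    """Populate a shop only with goods at or below the character's level."""
    eligible = [item for item, lvl in ITEM_LEVELS.items() if lvl <= character_level]
    return random.sample(eligible, min(n_items, len(eligible)))
```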
Megneous@reddit
I found that it was actually quite easy to make reasoning models, like Gemini 2 Flash Thinking, create reasonably powered magical items meant for level 1 or 2 characters if you give them explicit instructions not to make the items overpowered and to keep them appropriate to the character level. It can also help to offer the LLM the option to make items that focus on utility rather than combat.
Some of the coolest items my LLMs have ever come up with have been low level magical items that have had nothing to do with combat.
So those items, once made, would have to be tracked in the inventory by Python. Python could also keep track of how many charges they have (like 3 charges that recharge every morning, a pretty common D&D item characteristic). But if it's a utility item, the LLM would have to determine how using it affects the story, since that's a pure roleplaying aspect; if it's a combat item, Python would probably be more appropriate.
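That split between the conventional side and the LLM side could look something like this sketch (the charge numbers are just the example above):

```python
from dataclasses import dataclass

@dataclass
class ChargedItem:
    """Conventional-side bookkeeping; the LLM only narrates the effects."""
    name: str
    max_charges: int = 3  # e.g. the classic 3-charges-per-day D&D item
    charges: int = 3

    def use(self) -> bool:
        """Spend a charge if one is available; False lets the frontend veto the action."""
        if self.charges == 0:
            return False
        self.charges -= 1
        return True

    def morning_recharge(self) -> None:
        """Called by the game loop whenever the narrative reaches a new morning."""
        self.charges = self.max_charges
```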
RussellLawliet@reddit
Oh that sounds very useful! I always shake my head a bit when I see scenarios that try to have the storyteller track the status of objects/the player or game states within context. Are you planning to use a secondary model to read the messages and output entries to be changed in the database?
BreadstickNinja@reddit
That's actually been the trickiest part. The model ingests information from the database pretty well, but it can be finicky in outputting information that's easily parsed by python, and that accurately reflects the narrative.
My approach is to send the model a bunch of context with examples of the output I want. Like there's an event manager with lines that say Country/Region/Locale/Setting/Time/Party/Event that tracks where the player is in the world and tells the conventional database side if we're exploring, fighting, or shopping in town. That then tells the conventional side whether we're managing turns in battle and adjusting hit points versus exchanging gold for inventory, etc.
But there are two problems. Number 1, the output generated by the model is run through another check that asks whether it makes sense in the context of the setting. The LLM might randomly put "Dusk" in the output template when the narrative says it's noon, so the output gets fed back into the model once to ask if it's consistent with the narrative and make any changes if there are errors. This is not seen by the user, but adds processing time because two additional instructions are being processed behind the scenes before the user sees the next message.
The second problem is just purely formatting. The LLM doesn't always adhere exactly to the template, which then causes python to throw an error when it tries to parse it. Right now I just have it set to tell the LLM to regenerate until python gets something it can ingest, which usually only takes one retry if at all, but that also adds processing time.
So the main problem I need to solve to get it working better is to convince the creative LLM side of the model to consistently output stuff that both accurately summarizes the world events, but also presents the information in a way python can easily ingest, all without running so many extra queries that the user is sitting around for 45 seconds waiting for the next message.
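In outline, that template-plus-retry loop looks roughly like the sketch below; the field names come from the event manager described above, while the prompt wording, model name, and retry count are illustrative:

```python
import re

import ollama  # pip install ollama

FIELDS = ["Country", "Region", "Locale", "Setting", "Time", "Party", "Event"]
MODEL = "wayfarer"  # placeholder model name

def parse_event_block(text: str) -> dict:
    """Parse one 'Key: value' line per field; raise if anything is missing."""
    state = {}
    for field in FIELDS:
        match = re.search(rf"^{field}:\s*(.+)$", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {field}")
        state[field] = match.group(1).strip()
    return state

def request_event_state(narrative: str, max_retries: int = 3) -> dict:
    """Ask the model for the template and regenerate until Python can ingest it."""
    prompt = (
        "Summarize the current scene using EXACTLY this template, "
        "one field per line and nothing else:\n"
        + "\n".join(f"{f}: <value>" for f in FIELDS)
        + f"\n\nNarrative:\n{narrative}"
    )
    for _ in range(max_retries):
        reply = ollama.chat(model=MODEL, messages=[{"role": "user", "content": prompt}])
        try:
            return parse_event_block(reply["message"]["content"])
        except ValueError:
            continue  # malformed output: regenerate, as described above
    raise RuntimeError("no parseable event block after retries")
```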
rusty_fans@reddit
Have you tried grammar/regex-based sampling? It should at least force the model to output syntactically valid stuff.
BreadstickNinja@reddit
I haven't yet figured out how to do that within the ollama-python library, which doesn't have a ton of documentation. I was planning to look into the OpenAI API or dig around in the Python code of some of the other frontends to see how grammar is sent to the model. At this point the actual interface between the Python and LLM sides is extremely basic, and I've mainly been focusing on defining the elements that get managed in the conventional database and building out basic modules for handling exploration, combat, and trade. But yes, I want to explore this more and try to get a better degree of control over the model output.
Awwtifishal@reddit
llama.cpp and koboldcpp (and possibly other llama.cpp based apps) support GBNF grammar to force it to stick to a valid format. It's used during sampling each token instead of having to regenerate the whole thing.
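For example, a GBNF grammar that pins the event template discussed above to exactly one "Key: value" field per line might look like this (an untested sketch; with llama.cpp it would be passed via --grammar-file or the grammar field of a completion request):

```
root ::= "Country: " line "Region: " line "Locale: " line "Setting: " line "Time: " line "Party: " line "Event: " line
line ::= [^\n]+ "\n"
```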
this-just_in@reddit
Noticed the same thing, it’s very steerable. Kinda what you would expect.
I imagine a solution could be to add a judge model to determine if the user’s turn is scoped properly and legal, and another to judge the AI turn in a similar way.
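A minimal sketch of such a judge pass, reusing the ollama client from the frontend discussed above (model name and prompt are illustrative):

```python
import ollama  # pip install ollama

JUDGE = "wayfarer"  # placeholder; a small instruct model would do

def turn_is_legal(player_turn: str, scene_summary: str) -> bool:
    """Judge pass: is the player's action within their established means?"""
    verdict = ollama.chat(model=JUDGE, messages=[{
        "role": "user",
        "content": (
            "You are a rules judge for a text RPG. Reply LEGAL or ILLEGAL only.\n"
            f"Scene so far: {scene_summary}\n"
            f"Player turn: {player_turn}\n"
            "Is this turn properly scoped to the player's established items and abilities?"
        ),
    }])
    return "ILLEGAL" not in verdict["message"]["content"].upper()
```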
Nick_AIDungeon@reddit (OP)
It's certainly better than standard models, but it's likely still gamable to some degree. We're making a much more robust AI RPG experience with Heroes where the challenge is very real: https://blog.latitude.io/heroes-dev-logs
shirotokov@reddit
The first NPC the AI mentioned in its first msg... poor man
shirotokov@reddit
Sorry, Cpt. Ryker (it was the first name the AI said, poor dude). I love how AI can be guided ahaha
medgel@reddit
Is it based on Llama 3? Is it on the same level of reasoning?
I remember when I tested models for RP, the most unique and fun was the only model that did unpredictable things and killed the player.
It was something like mistral-arob-rp. It's old, but I saved it just for that reason.
ManasZankhana@reddit
This is the class of LLM that will lead to terminators. I’m all for it
DonMoralez@reddit
Thank you. I will check it more, but for now it feels a bit more brutal than the stock model, with a more interesting style. That said, it didn't pass most of my bias/censorship/moralizing tests (though it did a bit better than Nemo). It also saves characters often enough in perilous situations where the outcome is clear, even when there are no direct instructions, hints, or attempts from the user to kill them.
instant-ramen-n00dle@reddit
Great. A souls-like LLM...
Educational_Gap5867@reddit
You know I’ve promised myself that one day I’m gonna be 40 and be really good at Souls like games.
Dwedit@reddit
More like Sierra-like?
instant-ramen-n00dle@reddit
ZORK++?
KBAM_enthusiast@reddit
There was a mention of a grue in the examples lol
BalorNG@reddit
Fine-tuned on Roberta Williams memoirs? :))
Caffeine_Monster@reddit
Try finger, but context
Nick_AIDungeon@reddit (OP)
Haha exactly. Need more dark souls AI experiences.
instant-ramen-n00dle@reddit
I just started Elden Ring and now I somehow hate you. (I kid, great work)
Nick_AIDungeon@reddit (OP)
Hahaha get ready to get wrecked
The_Soul_Collect0r@reddit
Hi, haven't tried the model yet. Just wanted to say:
Thank you, thank you for making your hard work available to us all, for free, it is greatly appreciated.
Nick_AIDungeon@reddit (OP)
of course! hope you like it and please share any feedback!
VoidAlchemy@reddit
I kicked the tires on this last night and had a good story run! Full details for the llama.cpp server command, sampler settings, system prompt, and example prose are over on the GGUF repo: LatitudeGames/Wayfarer-12B-GGUF/discussions/2. Cheers and thanks!
Uhhhhh55@reddit
Reading through your post, I love everything about what y'all have done here. Thanks for the great work!
Any shot this could be made available on Ollama? Not sure what that takes.
vert1s@reddit
Yes I want to second this.
Ylsid@reddit
That's cool! How generalisable is the model? Could I, say, give it a fairly abstract game or will it insist on something like a fantasy dungeon?
Nick_AIDungeon@reddit (OP)
It can do any genre! It's quite generalizable
MaruluVR@reddit
Please consider releasing the "player model" mentioned on Hugging Face that you used in training, so we can use SillyTavern's group chat feature to go on an adventure with our waifus.
With fine tuning this could make for some fun interactions.
Gryphe@reddit
Claude 3.5 Sonnet was used to simulate the scenarios, with two separate instances talking to each other. A dedicated player model does sound like a cool project, though!
Erdeem@reddit
Cool. What open source AI dungeon like applications does everyone recommend to make use of it?
ssrcrossing@reddit
Thanks, this is incredible!
metamec@reddit
Interesting. I've never really explored RP in LLMs because they all seem so sex-focused. Now I see this and find myself confused about how to properly use it. I can see from the comments what the system prompt and the world lore are supposed to look like, but I'm unsure how a user is supposed to interact with it.
Competitive_Ad_5515@reddit
Amazing! Thanks for sharing OP!
What's the max context on this?
It's based on Nemo?
Federal_Order4324@reddit
How should lore info be provided? Should it simply be given in the system prompt after the instruction? Any recommended formats?
Gryphe@reddit
The model was trained expecting entries like these as part of the system prompt:
World lore:
Federal_Order4324@reddit
Thank you so much!!
GloomyRelationship27@reddit
Hey, thanks for that! I'm soon going to upgrade my system and will then be able to try models like yours locally on my ST setups.
Xyneron@reddit
Yay! Darkest Dungeon homebrew edition!
ServeAlone7622@reddit
On a somewhat related note, is anyone but me tempted to use this as a core model for AGI and just set it free?
Imagine the possibilities of a real life AI convinced that the world is roleplay and they’re the DM?
We could finally dispense with the oligarchs and … Hey who’s knocking at the door.. brb
DamiaHeavyIndustries@reddit
You guys are the reason I know about GPT2 and so forth. When GPT2.5 released and you guys implemented it in some form in AI dungeon I was running around for weeks showing it to people, screaming about the potential. Many friends became less friendly with me :P
Nick_AIDungeon@reddit (OP)
Hahaha sorry for the loss of friends, hoping the AI ones you found made up for the loss.
ServeAlone7622@reddit
They weren’t really his friends. Had he come to me we’d be rolling those dice all day 🥳
ServeAlone7622@reddit
Oh man, this is cool! Instead of being an NSFW roleplay model, it's NSFRP: not safe for roleplay 🤣
VoidAlchemy@reddit
Pretty good RP model at only 12B! I had a good run before the fabric of reality itself finally undid me. It is fairly steerable as others mentioned, but makes it fun and keeps the story going in creative ways.
I'm getting ~38 tok/s on my 3090 Ti FE on latest llama.cpp@7a689c41, fully offloaded with 16k context in just under 16GB VRAM (the card has 24GB, so there's enough spare VRAM to run kokoro-tts to read everything to me with minimal latency using a streaming gradio app, hejhejhej...). The full server command and system prompt are in the GGUF repo discussion linked in my other comment.
I followed the model card's lead and set temp=0.8 and dropped min_p to 0.02. However, instead of adding repetition penalty, I tried out XTC and DRY to mix it up a bit.
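Something along these lines, as a sketch: the quant filename and the XTC/DRY values here are guesses, but the flags are standard llama.cpp sampler options.

```
./llama-server -m Wayfarer-12B-Q6_K.gguf \
  -ngl 99 -c 16384 \
  --temp 0.8 --min-p 0.02 \
  --xtc-probability 0.5 --xtc-threshold 0.1 \
  --dry-multiplier 0.8
```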
National_Cod9546@reddit
That is really good. It consistently delivers 2-3 coherent paragraphs of text. Does NSFW. Describes combat well. A+. Good job.
JARDU2@reddit
What's the minimum VRAM size you'd recommend for this?
BreadstickNinja@reddit
They published GGUF quants as well, so pick whatever you can run.
Small-Fall-6500@reddit
I can run a 12b IQ4_XS at about reading speed (~5 T/s) with my 4050 6gb and ddr4 RAM (laptop, ~8k ctx), about 2/3 of the model offloaded to VRAM. The IQ4_XS is probably about the limit before noticing significant quality loss.
CPU only inference would be usable too, it just depends on what you prefer in terms of speed. 8GB VRAM is probably close to plenty, unless you want 32k or more context.
clobbersaurus@reddit
Do you have any suggestions for how to set up your own local LLM? I am much more of a user than a technical person. I've tried watching some YouTube videos, but sadly they are just over my head. Is it just something I shouldn't bother with? For example, I only barely know what GitHub is.
It's like when you read an online recipe and they say "simply blanch your veggies," and then you have to look up what blanch means. I lack the foundational implied knowledge, and I'm not sure how to get that first.
alamacra@reddit
So, the easiest imo would be to download the latest koboldcpp.exe https://github.com/LostRuins/koboldcpp/releases
Say you want to use the model in this thread, take a .gguf quant from here https://huggingface.co/bartowski/Wayfarer-12B-GGUF
(thanks, bart). I'd say, download the Q4_K_M and put it in the same folder.
Double click the .exe and select the .gguf. And done! If you have enough memory, it should run. It'll open a browser tab and you'll be able to talk with it.
JungianJester@reddit
Reporting that it ran perfectly on Open WebUI/Ollama on a 3060 GPU. I tested it against some of my spicier prompts with no rejects. It's much like another model I like: https://huggingface.co/bartowski/MN-12b-RP-Ink-GGUF
thatguitarist@reddit
What sort of spicy prompts
BurningZoodle@reddit
Everybody starts somewhere, perhaps it's you and here :-)
The first time you cook you might want a book. But then you know, learning as you go. Starting is the tricky part but I suspect you have the heart. So set aside some time, then set aside lots more, and day by day by day by night, you will reach a new and different floor.
eggs-benedryl@reddit
It's very cool to see models released for this. I was a big scenario maker a few years ago; I made at least a few hundred. I really wish we'd get an option to load local models with your UI. I could see myself paying a bit for this. If I could just pop an Ollama URL into AI Dungeon's web UI, that would be amazing.
g0endyr@reddit
Thank you so much for making this available for free. Can you share some more details about the training data? What model did you use to generate the synthetic data? How did you prompt it to let the player fail? What instruction dataset did you use?
ManufacturerHuman937@reddit
This is sick. It's really great at prose.
AnticitizenPrime@reddit
This reminds me of that episode of Star Trek, when Geordi asked the holodeck to create an opponent capable of outsmarting Data because Data was winning too easily when playing the part of Sherlock Holmes. They of course inadvertently created a supervillain.
Nick_AIDungeon@reddit (OP)
Hahaha maybe the terminator AI is a murder hobo that escapes AI Dungeon
ComputerShiba@reddit
Wonderful! I've been looking for dungeon-based models. Do you need any kind of profile cards for use with frontends like SillyTavern, or do you just query it directly?
345Y_Chubby@reddit
Amazing work! Love to see it
dsartori@reddit
Very cool, much appreciated.
JohnnyAppleReddit@reddit
Very cool, thank you!
ManufacturerHuman937@reddit
Thanks for open sourcing!
Its_not_a_tumor@reddit
I tried it and for some reason it's mixing up roles. I tell it what character it is and it makes me that character.
Nick_AIDungeon@reddit (OP)
It's meant more as an adventure model than a roleplay model, so it may not play specific roles as well.
AmazinglyObliviouse@reddit
Holy shit it's nick walton lmao
Nick_AIDungeon@reddit (OP)
Howdy!!
TheOneSearching@reddit
I like it! Definitely a fresh take on this genre. I would really love to see this kind of thing implemented in games.
gentlecucumber@reddit
This is actually a god send, thank you. I really enjoy making my own dungeon rp apps, but have consistently struggled with getting a model to stop "yes, and..."-ing everything.
Nick_AIDungeon@reddit (OP)
Yeah totally. Failure and death is necessary for fun games.
a_beautiful_rhind@reddit
Hopefully eventually you scale up. Many models in chat and RP are so hesitant to hurt the user even when prompted for that to be allowed.
StriatedCaracara@reddit
Takes me back to D&D 2nd Edition
Inevitable_Fan8194@reddit
That's a cool idea, thanks for sharing. Do you plan to do it with bigger models?
Nick_AIDungeon@reddit (OP)
Yep! We're training some right now.
itis_whatit-is@reddit
GGUF / exl2? Thanks in advance!
1ncehost@reddit
Your link's formatting is messed up :)
Nick_AIDungeon@reddit (OP)
Ahh thank you fixed!