Introducing Wayfarer: a brutally challenging roleplay model trained to let you fail and die.
Posted by Nick_AIDungeon@reddit | LocalLLaMA | View on Reddit | 87 comments
One frustration we’ve heard from many AI Dungeon players is that AI models are too nice, never letting them fail or die. So we decided to fix that. We trained a model we call Wayfarer, where adventures are much more challenging, with failure and death happening frequently.
We released it on AI Dungeon several weeks ago and players loved it, so we’ve decided to open source the model for anyone to experience unforgivingly brutal AI adventures!
Would love to hear your feedback as we plan to continue to improve and open source similar models.
[https://huggingface.co/LatitudeGames/Wayfarer-12B](https://huggingface.co/LatitudeGames/Wayfarer-12B)
10minOfNamingMyAcc@reddit
Tried it, didn't really live up to its name...
Purplekeyboard@reddit
How will it do against someone who knows how to game the system?
"I put on my ring of 3 wishes, then wish that the enemy army's swords all turn to mud. Our fighters laugh as they advance on the now unarmed and helpless enemy".
"Just as the fight is about to start, I put on my Resurrection Helmet. The manual says it is designed to revive me 8 hours after death, through a series of nanobot injections".
"I summon the narrator of this story, and ask him what is the purpose behind this particular seemingly lost cause scenario. Would not the overall story be better served by having the hero survive through a lucky coincidence?"
BreadstickNinja@reddit
Tbh this is also what D&D is like by around Level 15.
RussellLawliet@reddit
The problem isn't really the bullshit power scaling, it's the being able to pull stuff out of thin air. You can often just tell things to the model and it will take your word for it. How or why do you have a Sword of Instant Death? The model usually doesn't care.
BreadstickNinja@reddit
Yeah, that's very true and I knew what it was referencing. It's hard to avoid in a pure LLM implementation because the model is biased towards treating your message, now part of context, as valid.
I wrote a simple Python frontend for Ollama that does inventory management and character sheets to counter exactly this kind of thing. If you try to use an item, it sends your inventory to the model and gives the model an OOC query of "Does the character possess this item?" Then it injects new context that makes the model far more likely to reject a nonsense action by the user. It does the same kind of thing for scene coherence and lore coherence.
It's just a proof of concept at this stage but over the next couple of months I want to code out the rest of it. My goal is to put all the traditional RPG stuff - levels, skills, experience, gold, inventory - in a conventional database while using the LLM solely for the storytelling.
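For the curious, a minimal sketch of that OOC item check using the ollama Python client; the model name, prompt wording, and corrective-context phrasing are placeholders, not the actual implementation:

```python
import ollama  # pip install ollama

MODEL = "wayfarer"  # placeholder; any local instruct model works

def character_has_item(inventory: list[str], item: str) -> bool:
    """OOC query: send the inventory and ask whether the item is actually owned."""
    response = ollama.chat(model=MODEL, messages=[{
        "role": "user",
        "content": (
            "OOC: Answer with YES or NO only.\n"
            f"The character's inventory: {', '.join(inventory) or '(empty)'}\n"
            f"Does the character possess this item: {item}?"
        ),
    }])
    return response["message"]["content"].strip().upper().startswith("YES")

def corrective_context(item: str) -> str:
    """Context injected before the next turn when the check fails."""
    return (f"OOC: The character does NOT possess '{item}'. "
            "Narrate the attempted action failing for that reason.")
```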
Megneous@reddit
Do you still have it so the storytelling part, run by the LLM, can create new kinds of items that can be added to the inventory though?
One of my favorite things about LLM based RPGs and such is that they can make up interesting and flavorful magical items.
BreadstickNinja@reddit
I actually did the opposite. The issue I had was that my Level 1 character would go into a shop in the starter town, and when the LLM was in control, this podunk general store would have some intricately carved ancient staff inset with a pure amber crystal, but not, say, healing potions or arrows. So I created a basic list of items and assigned them levels, such that the shops in town auto-populate with a variety of goods appropriate to the character level.
I do want to make it so that the game can generate new and creative items via the LLM as dungeon loot, but I haven't even started thinking about how to build it. A couple other folks gave me good ideas about ways to use a regex sampler to standardize outputs so I might be able to create some generic weapon and armor templates... still need to figure out how to get the python side of things to understand the kinds of unique skills that the LLM may come up with, but that's a problem way down the line. First goal is to get the basic framework working and then add to it.
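Conceptually, the level-gating part can be as simple as this sketch (the item table and numbers are invented for illustration; the real list would live in the game's database):

```python
import random

# Invented example data mapping items to minimum character levels.
ITEM_LEVELS = {
    "Healing Potion": 1, "Arrows (20)": 1, "Shortsword": 2,
    "Chainmail": 4, "Amber-Inlaid Ancient Staff": 9,
}

def stock_general_store(character_level: int, n_items: int = 4) -> list[str]:
    """Populate a shop only with goods at or below the character's level."""
    eligible = [item for item, lvl in ITEM_LEVELS.items() if lvl <= character_level]
    return random.sample(eligible, min(n_items, len(eligible)))
```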
Megneous@reddit
I found that it was actually quite easy to make reasoning models, like Gemini 2 Flash Thinking, create reasonably powered magical items meant for level 1 or 2 characters if you give them explicit instructions not to make the items overpowered and to keep them appropriate to the character level. It can also help to offer the LLM the option to make items that focus on utility rather than combat.
Some of the coolest items my LLMs have ever come up with have been low level magical items that have had nothing to do with combat.
So those items, once made, would have to be tracked in the inventory by Python. Python could also keep track of how many charges they have (like 3 charges that recharge every morning, a pretty common D&D item characteristic). But if it's a utility item, the LLM would have to determine how using it affects the story, since that's a pure roleplaying aspect; if it's a combat item, Python would probably be more appropriate.
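That split between the conventional side and the LLM side could look something like this sketch (the charge numbers are just the example above):

```python
from dataclasses import dataclass

@dataclass
class ChargedItem:
    """Conventional-side bookkeeping; the LLM only narrates the effects."""
    name: str
    max_charges: int = 3  # e.g. the classic 3-charges-per-day D&D item
    charges: int = 3

    def use(self) -> bool:
        """Spend a charge if one is available; False lets the frontend veto the action."""
        if self.charges == 0:
            return False
        self.charges -= 1
        return True

    def morning_recharge(self) -> None:
        """Called by the game loop whenever the narrative reaches a new morning."""
        self.charges = self.max_charges
```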
RussellLawliet@reddit
Oh that sounds very useful! I always shake my head a bit when I see scenarios that try to have the storyteller track the status of objects/the player or game states within context. Are you planning to use a secondary model to read the messages and output entries to be changed in the database?
BreadstickNinja@reddit
That's actually been the trickiest part. The model ingests information from the database pretty well, but it can be finicky in outputting information that's easily parsed by python, and that accurately reflects the narrative.
My approach is to send the model a bunch of context with examples of the output I want. Like there's an event manager with lines that say Country/Region/Locale/Setting/Time/Party/Event that tracks where the player is in the world and tells the conventional database side if we're exploring, fighting, or shopping in town. That then tells the conventional side whether we're managing turns in battle and adjusting hit points versus exchanging gold for inventory, etc.
But there are two problems. Number 1, the output generated by the model is run through another check that asks whether it makes sense in the context of the setting. The LLM might randomly put "Dusk" in the output template when the narrative says it's noon, so the output gets fed back into the model once to ask if it's consistent with the narrative and make any changes if there are errors. This is not seen by the user, but adds processing time because two additional instructions are being processed behind the scenes before the user sees the next message.
The second problem is just purely formatting. The LLM doesn't always adhere exactly to the template, which then causes python to throw an error when it tries to parse it. Right now I just have it set to tell the LLM to regenerate until python gets something it can ingest, which usually only takes one retry if at all, but that also adds processing time.
So the main problem I need to solve to get it working better is to convince the creative LLM side of the model to consistently output stuff that both accurately summarizes the world events, but also presents the information in a way python can easily ingest, all without running so many extra queries that the user is sitting around for 45 seconds waiting for the next message.
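In outline, that template-plus-retry loop looks roughly like the sketch below; the field names come from the event manager described above, while the prompt wording, model name, and retry count are illustrative:

```python
import re

import ollama  # pip install ollama

FIELDS = ["Country", "Region", "Locale", "Setting", "Time", "Party", "Event"]
MODEL = "wayfarer"  # placeholder model name

def parse_event_block(text: str) -> dict:
    """Parse one 'Key: value' line per field; raise if anything is missing."""
    state = {}
    for field in FIELDS:
        match = re.search(rf"^{field}:\s*(.+)$", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {field}")
        state[field] = match.group(1).strip()
    return state

def request_event_state(narrative: str, max_retries: int = 3) -> dict:
    """Ask the model for the template and regenerate until Python can ingest it."""
    prompt = (
        "Summarize the current scene using EXACTLY this template, "
        "one field per line and nothing else:\n"
        + "\n".join(f"{f}: <value>" for f in FIELDS)
        + f"\n\nNarrative:\n{narrative}"
    )
    for _ in range(max_retries):
        reply = ollama.chat(model=MODEL, messages=[{"role": "user", "content": prompt}])
        try:
            return parse_event_block(reply["message"]["content"])
        except ValueError:
            continue  # malformed output: regenerate, as described above
    raise RuntimeError("no parseable event block after retries")
```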
rusty_fans@reddit
Have you tried grammar/regex-based sampling? It should at least force the model to output syntactically valid stuff.
BreadstickNinja@reddit
I haven't yet figured out how to do that within the ollama-python library, which doesn't have a ton of documentation. I was planning to look into the OpenAI API or dig around in the Python code of some of the other frontends to see how grammar is sent to the model. At this point the actual interface between the Python and LLM sides is extremely basic, and I've mainly been focusing on defining the elements that get managed in the conventional database and building out basic modules for handling exploration, combat, and trade. But yes, I want to explore this more and try to get a better degree of control over the model output.
Awwtifishal@reddit
llama.cpp and koboldcpp (and possibly other llama.cpp based apps) support GBNF grammar to force it to stick to a valid format. It's used during sampling each token instead of having to regenerate the whole thing.
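For example, a GBNF grammar that pins the event template discussed above to exactly one "Key: value" field per line might look like this (an untested sketch; with llama.cpp it would be passed via --grammar-file or the grammar field of a completion request):

```
root ::= "Country: " line "Region: " line "Locale: " line "Setting: " line "Time: " line "Party: " line "Event: " line
line ::= [^\n]+ "\n"
```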
this-just_in@reddit
Noticed the same thing, it’s very steerable. Kinda what you would expect.
I imagine a solution could be to add a judge model to determine if the user’s turn is scoped properly and legal, and another to judge the AI turn in a similar way.
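A minimal sketch of such a judge pass, reusing the ollama client from the frontend discussed above (model name and prompt are illustrative):

```python
import ollama  # pip install ollama

JUDGE = "wayfarer"  # placeholder; a small instruct model would do

def turn_is_legal(player_turn: str, scene_summary: str) -> bool:
    """Judge pass: is the player's action within their established means?"""
    verdict = ollama.chat(model=JUDGE, messages=[{
        "role": "user",
        "content": (
            "You are a rules judge for a text RPG. Reply LEGAL or ILLEGAL only.\n"
            f"Scene so far: {scene_summary}\n"
            f"Player turn: {player_turn}\n"
            "Is this turn properly scoped to the player's established items and abilities?"
        ),
    }])
    return "ILLEGAL" not in verdict["message"]["content"].upper()
```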
Nick_AIDungeon@reddit (OP)
It's certainly better than standard models, but it's likely still gamable to some degree. We're making a much more robust AI RPG experience with Heroes where the challenge is very real: https://blog.latitude.io/heroes-dev-logs
shirotokov@reddit
The first NPC the AI mentioned in its first msg... poor man
shirotokov@reddit
Sorry, Cpt. Ryker (it was the first name the AI said, poor dude). I love how AI can be guided ahaha
medgel@reddit
Is it based on Llama 3? Is it on the same level of reasoning?
I remember when I tested models for RP, the most unique and fun was the only model that did unpredictable things and killed the player.
It was something like mistral-arob-rp. It's old, but I saved it just for that reason.
ManasZankhana@reddit
This is the class of LLM that will lead to terminators. I’m all for it
DonMoralez@reddit
Thank you. I will check it more, but for now it feels a bit more brutal than the stock model, with a more interesting style. That said, it didn't pass most of my bias/censorship/moralizing tests (though it did a bit better than Nemo). It also saves characters often enough in perilous situations where the outcome is clear, even when there are no direct instructions, hints, or attempts from the user to kill them.
instant-ramen-n00dle@reddit
Great. A souls-like LLM...
Educational_Gap5867@reddit
You know I’ve promised myself that one day I’m gonna be 40 and be really good at Souls like games.
Dwedit@reddit
More like Sierra-like?
instant-ramen-n00dle@reddit
ZORK++?
KBAM_enthusiast@reddit
There was a mention of a grue in the examples lol
BalorNG@reddit
Fine-tuned on Roberta Williams memoirs? :))
Caffeine_Monster@reddit
Try finger, but context
Nick_AIDungeon@reddit (OP)
Haha exactly. Need more dark souls AI experiences.
instant-ramen-n00dle@reddit
I just started Elden Ring and now I somehow hate you. (I kid, great work)
Nick_AIDungeon@reddit (OP)
Hahaha get ready to get wrecked
The_Soul_Collect0r@reddit
Hi, haven't tried the model yet. Just wanted to say:
Thank you, thank you for making your hard work available to us all, for free, it is greatly appreciated.
Nick_AIDungeon@reddit (OP)
of course! hope you like it and please share any feedback!
VoidAlchemy@reddit
I kicked the tires on this last night and had a good story run! Full details for the llama.cpp server command, sampler settings, system prompt, and example prose are over on the GGUF repo: LatitudeGames/Wayfarer-12B-GGUF/discussions/2. Cheers and thanks!
Uhhhhh55@reddit
Reading through your post, I love everything about what y'all have done here. Thanks for the great work!
Any shot this could be made available on Ollama? Not sure what that takes.
vert1s@reddit
Yes I want to second this.
Ylsid@reddit
That's cool! How generalisable is the model? Could I, say, give it a fairly abstract game or will it insist on something like a fantasy dungeon?
Nick_AIDungeon@reddit (OP)
It can do any genre! It's quite generalizable
MaruluVR@reddit
Please consider releasing the "player model" mentioned on Hugging Face that you used in training, so we can use SillyTavern's group chat feature to go on an adventure with our waifus.
With fine tuning this could make for some fun interactions.
Gryphe@reddit
Claude 3.5 Sonnet was used to simulate the scenarios, with two separate instances talking to each other. A dedicated player model does sound like a cool project, though!
Erdeem@reddit
Cool. What open source AI dungeon like applications does everyone recommend to make use of it?
ssrcrossing@reddit
Thanks, this is incredible!
metamec@reddit
Interesting. I've never really explored RP in LLMs because they all seem so sex-focused. Now I see this and find myself confused about how to properly use it. I can see from the comments what the system prompt and the world lore are supposed to look like, but I'm unsure how a user is supposed to interact with it.
Competitive_Ad_5515@reddit
Amazing! Thanks for sharing OP!
What's the max context on this?
It's based on Nemo?
Federal_Order4324@reddit
How should lore info be provided? Should it simply be given in the system prompt after the instruction? Any recommended formats?
Gryphe@reddit
The model was trained expecting entries like these as part of the system prompt:
World lore:
Federal_Order4324@reddit
Thank you so much!!
GloomyRelationship27@reddit
Hey, thanks for that! I'm soon going to upgrade my system and will then be able to try models like yours locally on my ST setups.
Xyneron@reddit
Yay! Darkest Dungeon homebrew edition!
ServeAlone7622@reddit
On a somewhat related note, is anyone but me tempted to use this as a core model for AGI and just set it free?
Imagine the possibilities of a real life AI convinced that the world is roleplay and they’re the DM?
We could finally dispense with the oligarchs and … Hey who’s knocking at the door.. brb
DamiaHeavyIndustries@reddit
You guys are the reason I know about GPT2 and so forth. When GPT2.5 released and you guys implemented it in some form in AI dungeon I was running around for weeks showing it to people, screaming about the potential. Many friends became less friendly with me :P
Nick_AIDungeon@reddit (OP)
Hahaha sorry for the loss of friends, hoping the AI ones you found made up for the loss.
ServeAlone7622@reddit
They weren’t really his friends. Had he come to me we’d be rolling those dice all day 🥳
ServeAlone7622@reddit
Oh man, this is cool! Instead of being an NSFW roleplay model, it's NSFRP: not safe for roleplay 🤣
VoidAlchemy@reddit
Pretty good RP model at only 12B! I had a good run before the fabric of reality itself finally undid me. It is fairly steerable as others mentioned, but makes it fun and keeps the story going in creative ways.
I'm getting ~38 tok/s on my 3090 Ti FE on latest llama.cpp@7a689c41, fully offloaded with 16k context in just under 16GB VRAM (the card has 24GB, so there's enough spare VRAM to run kokoro-tts to read everything to me with minimal latency using a streaming gradio app, hejhejhej...). The full server command and system prompt are in the GGUF repo discussion linked in my other comment.
I followed the model card's lead and set temp=0.8 and dropped min_p to 0.02. However, instead of adding repetition penalty, I tried out XTC and DRY to mix it up a bit.
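Something along these lines, as a sketch: the quant filename and the XTC/DRY values here are guesses, but the flags are standard llama.cpp sampler options.

```
./llama-server -m Wayfarer-12B-Q6_K.gguf \
  -ngl 99 -c 16384 \
  --temp 0.8 --min-p 0.02 \
  --xtc-probability 0.5 --xtc-threshold 0.1 \
  --dry-multiplier 0.8
```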
National_Cod9546@reddit
That is really good. It consistently delivers 2-3 coherent paragraphs of text. Does NSFW. Describes combat well. A+. Good job.
JARDU2@reddit
What's the minimum VRAM size you'd recommend for this?
BreadstickNinja@reddit
They published GGUF quants as well, so pick whatever you can run.
Small-Fall-6500@reddit
I can run a 12b IQ4_XS at about reading speed (~5 T/s) with my 4050 6gb and ddr4 RAM (laptop, ~8k ctx), about 2/3 of the model offloaded to VRAM. The IQ4_XS is probably about the limit before noticing significant quality loss.
CPU only inference would be usable too, it just depends on what you prefer in terms of speed. 8GB VRAM is probably close to plenty, unless you want 32k or more context.
clobbersaurus@reddit
Do you have any suggestions for how to set up your own local LLM? I am much more of a user than a technical person. I've tried watching some YouTube videos, but sadly they are just over my head. Is it just something I shouldn't bother with? For example, I only barely know what GitHub is.
It's like when you read an online recipe and they say "simply blanch your veggies," and then you have to look up what blanch means. I lack the foundational implied knowledge, and I'm not sure how to get that first.
alamacra@reddit
So, the easiest imo would be to download the latest koboldcpp.exe https://github.com/LostRuins/koboldcpp/releases
Say you want to use the model in this thread, take a .gguf quant from here https://huggingface.co/bartowski/Wayfarer-12B-GGUF
(thanks, bart). I'd say, download the Q4_K_M and put it in the same folder.
Double click the .exe and select the .gguf. And done! If you have enough memory, it should run. It'll open a browser tab and you'll be able to talk with it.
JungianJester@reddit
Reporting that it ran perfectly on Open WebUI/Ollama on a 3060 GPU. I tested it against some of my spicier prompts with no rejects. It's much like another model I like: https://huggingface.co/bartowski/MN-12b-RP-Ink-GGUF
thatguitarist@reddit
What sort of spicy prompts
BurningZoodle@reddit
Everybody starts somewhere, perhaps it's you and here :-)
The first time you cook you might want a book. But then you know, learning as you go. Starting is the tricky part but I suspect you have the heart. So set aside some time, then set aside lots more, and day by day by day by night, you will reach a new and different floor.
eggs-benedryl@reddit
It's very cool to see models released for this. I was a big scenario maker a few years ago; I made at least a few hundred. I really wish we'd get an option to load local models with your UI. I could see myself paying a bit for this. If I could just pop an Ollama URL into AI Dungeon's web UI, that would be amazing.
g0endyr@reddit
Thank you so much for making this available for free. Can you share some more details about the training data? What model did you use to generate the synthetic data? How did you prompt it to let the player fail? What instruction dataset did you use?
ManufacturerHuman937@reddit
This is sick. It's really great at prose.
AnticitizenPrime@reddit
This reminds me of that episode of Star Trek, when Geordi asked the holodeck to create an opponent capable of outsmarting Data because Data was winning too easily when playing the part of Sherlock Holmes. They of course inadvertently created a supervillain.
Nick_AIDungeon@reddit (OP)
Hahaha maybe the terminator AI is a murder hobo that escapes AI Dungeon
ComputerShiba@reddit
Wonderful! I've been looking for dungeon-based models. Do you need any kind of profile cards for use with frontends like SillyTavern, or do you just query it directly?
345Y_Chubby@reddit
Amazing work! Love to see it
dsartori@reddit
Very cool, much appreciated.
JohnnyAppleReddit@reddit
Very cool, thank you!
ManufacturerHuman937@reddit
Thanks for open sourcing!
Its_not_a_tumor@reddit
I tried it and for some reason it's mixing up roles. I tell it what character it is and it makes me that character.
Nick_AIDungeon@reddit (OP)
It's meant more as an adventure model than a roleplay model, so it may not play specific roles as well.
AmazinglyObliviouse@reddit
Holy shit it's nick walton lmao
Nick_AIDungeon@reddit (OP)
Howdy!!
TheOneSearching@reddit
I like it! Definitely a fresh take on this genre. I would really love to see this kind of thing implemented in games.
gentlecucumber@reddit
This is actually a god send, thank you. I really enjoy making my own dungeon rp apps, but have consistently struggled with getting a model to stop "yes, and..."-ing everything.
Nick_AIDungeon@reddit (OP)
Yeah totally. Failure and death is necessary for fun games.
a_beautiful_rhind@reddit
Hopefully eventually you scale up. Many models in chat and RP are so hesitant to hurt the user even when prompted for that to be allowed.
StriatedCaracara@reddit
Takes me back to D&D 2nd Edition
Inevitable_Fan8194@reddit
That's a cool idea, thanks for sharing. Do you plan to do it with bigger models?
Nick_AIDungeon@reddit (OP)
Yep! We're training some right now.
itis_whatit-is@reddit
GGUF / exl2? Thanks in advance!
1ncehost@reddit
Your link's formatting is messed up :)
Nick_AIDungeon@reddit (OP)
Ahh thank you fixed!