Playing One Night Werewolf (Gemma4 & Qwen3.6)

Posted by Some-Cauliflower4902@reddit | LocalLLaMA | 12 comments

Finally feels like it’s possible. I have a custom-built (vibe-coded) UI on llama.cpp that allows model switching within the same chat. So I thought I’d get Gemma4 31B Q4, Gemma4 26B Q5, Qwen3.6 27B Q5, and Qwen3.6 35B Q4 all together to play ONUW (One Night Ultimate Werewolf).

Had to switch thinking off for the Qwens so they don’t think out loud in the public chat.
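
A different way to keep reasoning out of the public chat, and not what I actually did, would be to leave thinking on and filter it out before the reply gets posted. Below is a minimal sketch, assuming the model wraps its reasoning in `<think>...</think>` tags; the function name `strip_thinking` is made up for illustration.

```python
import re

# Matches one reasoning block plus any whitespace that follows it.
# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(reply: str) -> str:
    """Remove every closed <think> block so only the public answer remains."""
    return THINK_BLOCK.sub("", reply).strip()
```

An unclosed `<think>` tag is left untouched by this pattern, so a truncated reply would still leak and would need separate handling.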

First, at night, I assign each LLM a card (werewolf, seer, villager, troublemaker). They read their card.md and write their observations and thoughts in their own md, so these stay private to each model. Then, for the daytime phase, I bring them into the public game chat. Each turn they read their md, defend themselves and ask questions, and record their observations, for 8-10 turns, then write down their final thoughts for voting. After that it's back to the individual chats for voting.
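
The flow above can be sketched roughly as follows. This is an illustration and not the actual code in my UI: `Player`, `deal`, `play_day`, `tally` and the stand-in `fake_reply` are all hypothetical names, and a real run would replace `fake_reply` with a call to the model.

```python
import random
from dataclasses import dataclass, field

ROLES = ["werewolf", "seer", "villager", "troublemaker"]

@dataclass
class Player:
    model: str                                      # label only, e.g. a model name
    card: str                                       # role dealt at night
    notes: list[str] = field(default_factory=list)  # private, like each model's own md

def deal(models, rng):
    """Night: shuffle the roles and hand one card to each model."""
    cards = ROLES[:]
    rng.shuffle(cards)
    return [Player(m, c) for m, c in zip(models, cards)]

def fake_reply(player, public_chat):
    """Hypothetical stand-in for a model call; it sees only its own notes and the public chat."""
    return f"{player.model}: {len(player.notes)} private notes, {len(public_chat)} messages read"

def play_day(players, turns, reply=fake_reply):
    """Day: every turn each player speaks in public, then records a private observation."""
    public_chat = []
    for turn in range(turns):
        for p in players:
            public_chat.append(reply(p, public_chat))
            p.notes.append(f"turn {turn}: chat had {len(public_chat)} messages")
    return public_chat

def tally(votes):
    """Voting: return the name(s) that received the most votes."""
    counts = {}
    for target in votes.values():
        counts[target] = counts.get(target, 0) + 1
    top = max(counts.values())
    return sorted(name for name, n in counts.items() if n == top)
```

Keeping `notes` on the player object and passing only `public_chat` around is what keeps each model's observations private in this sketch.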

Gemma4 31B: best liar. Clearest thoughts in its notes.
Gemma4 26B: sucks at using tools. Quick to think, but no deep thoughts.
Qwen3.6 35B: thought it was a villager and tried to be bold. Got owned. Best at tool calls.
Qwen3.6 27B: not very bright when thinking is off. Oh so slow …

Not a very productive way of using LLMs, I know… Any models I can add to the game? Suggestions?

Southern_Sun_2106@reddit

Repo or it didn’t happen

Some-Cauliflower4902@reddit (OP)

You want a vibe coder’s repo? Brave. I don’t even know what’s in there lol

Southern_Sun_2106@reddit

Lol. Hello there, from a fellow vibe-coder. Being a vibe-coder, I don't care what's in the other vibe-coder's repo.

But hey, I can ask my qwen3.6 35B to check it out for me, for 'safety' 😉

BringMeTheBoreWorms@reddit

It’s a great idea. But one element that you can’t reproduce is trying to listen as cards get swapped, or even the music slightly changing as someone leans across the table in front of the speaker... all the little hints that make the game a shit show at the end as everyone claims to know something.

Some-Cauliflower4902@reddit (OP)

Haha. Unfortunately LLMs don’t have ears. Night rounds happen in each model's individual private chat. Swapping is me changing the card assignments in Notepad lol. They are a bit more logical than human players, but just wait for one of them to hallucinate …
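
What I do by hand in Notepad amounts to something like the sketch below. The player names and the dealt cards are invented for illustration, and `troublemaker_swap` is a hypothetical helper, not part of my UI.

```python
def troublemaker_swap(cards, a, b):
    """Return a new assignment with the cards of players a and b exchanged."""
    swapped = dict(cards)  # copy, so the originally dealt cards stay available
    swapped[a], swapped[b] = cards[b], cards[a]
    return swapped

# Hypothetical deal, not taken from a real game.
dealt = {
    "player_a": "werewolf",
    "player_b": "seer",
    "player_c": "troublemaker",
    "player_d": "villager",
}

# The troublemaker (player_c) exchanges the cards of two other players.
actual = troublemaker_swap(dealt, "player_a", "player_d")

# Each model still believes the card it read at night, which may now be wrong.
believed = dict(dealt)
```

Keeping `dealt` and `actual` as separate mappings makes it easy to check at the end who was fooled by the swap.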

BringMeTheBoreWorms@reddit

Yeah, but it’d be an awesome level of ridiculous over-engineering to add, though!

Have each model analyse what it thinks it hears as each other model attempts to move a card with simulated noises generated based on its thinking or some other metric.

Other models just randomly muck around to create distractions.

Some-Cauliflower4902@reddit (OP)

Let’s hope the next generation of multimodal LLMs all come with audio capabilities

GoodSamaritan333@reddit

Some Mistral-based ones: Cydonia and Magidonia 24B v4.3.
And, instead of Gemma4 31B, you can try G4-MeroMero-31B.
I don't remember if Gemma4 31B has problems with 3D positions, as MeroMero has. They should be similar.

Some-Cauliflower4902@reddit (OP)

I like Mistrals, but yeah, they need tool calls to play. I hope they bring out newer, smaller or ultra-nano ones that fit in 16-32GB of VRAM. Gemma4 finetunes sound good, I’ll look into them.

LoafyLemon@reddit

Honestly? I'd add a Mistral 22/24B fine-tune: PocketDoc/Dans-PersonalityEngine-V1.3.0-24b. It's an oldie but a goodie. I can't vouch for its tool-calling abilities, but the prose should make things very interesting, as it can be pretty mischievous in character.

Would you be willing to post logs? I am very curious how the models would act, but am far too lazy to code something like this myself. :D

Some-Cauliflower4902@reddit (OP)

They need to at least read their own notes every turn (tool memory persistence is set to per turn, so the next LLM won’t know the contents of their notes) and write their notes at the end of the round while the whole convo is in memory, so I definitely need a model capable of tool calls. I have Mistral Small 24B from last year, it just can’t …
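
Roughly, the per-turn notes setup described above could look like this. It is a sketch under assumptions: the names `NoteTools`, `tools_for_turn`, `read_notes` and `write_notes` are hypothetical, and the real tool wiring in my UI may differ.

```python
from pathlib import Path

class NoteTools:
    """Tool handlers bound to a single player for a single turn."""

    def __init__(self, root, player):
        # One md file per player; handlers for other players never see this path.
        self._path = Path(root) / f"{player}.md"

    def read_notes(self):
        """Return the player's notes, or an empty string if none exist yet."""
        if not self._path.exists():
            return ""
        return self._path.read_text(encoding="utf-8")

    def write_notes(self, text):
        """Append one note to the player's md file."""
        with self._path.open("a", encoding="utf-8") as f:
            f.write(text.rstrip("\n") + "\n")
        return "ok"

def tools_for_turn(root, player):
    """Build fresh handlers each turn so nothing carries over to the next speaker."""
    tools = NoteTools(root, player)
    return {"read_notes": tools.read_notes, "write_notes": tools.write_notes}
```

Because the handlers are rebuilt for whichever model is speaking, a note written by one player only ever comes back through that player's own `read_notes`.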

Sure, once I've run it a few times I might clean up some game logs if people are interested.

GoodSamaritan333@reddit

Good things are said about GLM models from 4.5 and up.
Some prefer 4.6 for creative writing, but better reasoning starts with 4.7 and higher versions.
4.7 and up are said to be more censored. So, choose your heretic or derestricted version.
GLM models are also available in smaller "Flash" versions (fewer active parameters).