Just curious what those of you with 3090s are running for roleplay right now?
Posted by delicatemicdrop@reddit | LocalLLaMA | 21 comments
I've used Midnight Miqu GGUF for quite a while, but a few repetitive descriptions keep popping up in my roleplays. I tried Dusk Miqu and it worked really well the first night I used it, but now it suddenly crashes every time I start up KoboldCPP (not sure if that's what many use here; I've personally used it for a while). It did really well last night, so I'm sad about that. Not sure if maybe a file got corrupted, so I'm trying a fresh install here in a few. I also haven't updated my Kobold or Silly in a long time, so I'm going to contemplate doing that as well.
But I was wondering if any of you have models you're particularly enjoying. I don't really do group chats and just tend to do 1-on-1 roleplay in SillyTavern. Miqu-based stuff has performed the best for me so far, but I'd love to know what I may be missing out on.
Haunting-Ad-571@reddit
I hope to buy a 3090 in the future
Natkituwu@reddit
I have a 4090, but this should still perform just as well on a 3090.
I'm currently using Cydonia 22B v2m at Q6_K. I've tried models on this card from 7B all the way to 72B, and this is the best middle ground so far when it comes to both quant size and model size.
1.1-1.25 temp, 0.1 top A, and with XTC / DRY sampling it's amazing! Definitely better than Miqu at Q3_S/M, since Q3 quants are really degraded compared to Q5 or Q6.
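For anyone wiring these samplers up outside SillyTavern, here's a minimal sketch of how the same settings might be sent straight to a local KoboldCpp instance through its KoboldAI-compatible generate endpoint. The XTC/DRY field names and every value other than temp and top A are assumptions based on recent KoboldCpp builds, not something from this post, so double-check them against your own version.

```python
# Minimal sketch: posting the sampler settings above to a local KoboldCpp
# instance via its KoboldAI-compatible /api/v1/generate endpoint.
# The xtc_* and dry_* field names follow recent KoboldCpp builds and may
# differ in older versions; values marked "example" are not from the post.
import requests

payload = {
    "prompt": "### Instruction:\nContinue the roleplay.\n\n### Response:\n",
    "max_length": 300,
    "temperature": 1.15,       # anywhere in the 1.1-1.25 range mentioned above
    "top_a": 0.1,
    "xtc_threshold": 0.1,      # XTC sampling (example values)
    "xtc_probability": 0.5,
    "dry_multiplier": 0.8,     # DRY anti-repetition sampling (example values)
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
print(resp.json()["results"][0]["text"])
```

In practice most people just set these in SillyTavern's sampler panel; the API route is only useful if you're scripting generations yourself.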
Ravenpest@reddit
I'm afraid that below 120B, Midnight Miqu is still the best for most RP. Otherwise, Luminum is excellent as long as you don't do ERP, because that's a never-ending slopfest with the same issue as Goliath: the fucking shivers are everywhere.
a-creation@reddit
Have you tried something like TheDrummer's Donnager? Just wondering if anyone has experience with whether it's better than Midnight Miqu.
https://huggingface.co/TheDrummer/Donnager-70B-v1
Ravenpest@reddit
I avoid this uploader's models so no.
Respeseis@reddit
I primarily used Midnight-Miqu-70B-v1.5.Q4_K_M, but I've recently switched to NemoMix-Unleashed-12B-Q8_0 or Rocinante-12B-v1.1-Q8_0. I have a 4090 and would love to find something that feels more responsive and precise. All of these models tend to get confused about positions, details, and environment at some point, and manual corrections don't always fix it. The only way to get them back on track is to rewrite the input.
I’ve also tried gemma-2-Ifable-9B.Q8_0, L3-DARKEST-PLANET-16.5B-D_AU-Q8_0, MN-GRAND-Gutenburg-Lyra4-Lyra-23.5B-D_AU-Q6_k, daybreak-kunoichi-2dpo-7b-q8_0, and llama-3.1-70b-instruct-lorablated.Q4_K_M, but none of them feel quite right to me.
Environmental-Metal9@reddit
Try mradermacher/New-Dawn-Llama-3.1-70B
Respeseis@reddit
After trying it for a while, it seems a little better than Midnight-Miqu. Thanks.
delicatemicdrop@reddit (OP)
Dusk Miqu is working again for me after a fresh install of both the model and KoboldCPP, and I actually really enjoy it so far. I run smaller quants than you, so it may be even better with your GPU. I really love Midnight Miqu, but certain phrases were getting overused even if I switched up the character some and the system prompt. I'm sure Dusk Miqu may have that problem too eventually, but I think once I've used a model for a couple of months is when it really starts to annoy me. So far it's my favorite other than Midnight Miqu.
Environmental-Metal9@reddit
Have you noticed if those quirks start before or after you hit your context window? Just curious, really. I was having a lot of that with context sizes of 4k, but it became a new world when I jumped to 16k and higher
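If it helps anyone reproduce that jump, here's a rough sketch of launching KoboldCpp with a 16k context window. The flag names are from memory of recent builds (check `python koboldcpp.py --help`), and the model filename and layer count are placeholders, not from this thread.

```python
# Rough sketch: starting KoboldCpp with a larger context window (16k instead
# of a 4k default). Flag names are from memory of recent builds; the model
# path and GPU layer count are placeholders, not recommendations.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "Dusk-Miqu-70B.Q4_K_M.gguf",  # placeholder filename
    "--contextsize", "16384",                # 16k context window
    "--gpulayers", "45",                     # offload what fits on a 24 GB card
])
```

You'd also want to raise the context size in SillyTavern's settings to match, or the frontend will keep truncating at the old limit.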
delicatemicdrop@reddit (OP)
More often when it hits the context window, but sometimes randomly even within it, shortly after starting a new chat. For instance, I kept getting "voice like a caress" over and over in multiple different chats. I got tired of seeing it so much.
Environmental-Metal9@reddit
Oof, yeah. That’s annoying. I haven’t been able to escape "shivers down their spines" and "mischievous glint in their eyes". The way LLMs make people shiver always makes me worry maybe they have a fever or something.
Respeseis@reddit
Yes, they all do that. Barely whispering in my ear, how maybe, just maybe, we feel the same. Nemo and Roci seem just as smart as Midnight, but they're so much faster, so I fell in love with them.
I’ll give Dusk a try and see what it’s made of.
No-Idea-6596@reddit
If you have two or more 3090s, Luminum 123B and Twilight Large 123B should give a better RP experience. Can't wait to try a Nemotron 70B fine-tune for RP, though.
Environmental-Metal9@reddit
Try right now! It’s quite good as is! I can only imagine when a fine tune appears!
delicatemicdrop@reddit (OP)
Right now, only one 3090. Not sure I could fit another in my case without a headache; actually, I'm pretty sure I couldn't. I'm not an expert, so I'd have to look at how I could finagle one outside of the case if I wanted to. I have contemplated getting a second, but for right now one is doing okay for me. A little jelly of those of you who get to play with two, though :)
No-Idea-6596@reddit
To me, getting a second 3090 into my system and making it work was kind of like a drug. Once you've tried 70B models running at a good speed, you want to try bigger models to see if you're missing out on something. And once I'd tried fine-tuned Mistral Large 123B models, even with slow responses below 0.6-0.7 tokens/sec, it was kind of hard to go back to playing around with those 70B models. Now that I've heard about the 195B models, I'm constantly thinking about putting quad 3090s in my system, which doesn't really support it. Getting a new motherboard that supports four huge GPUs, a new 1500+ watt power supply, two more 3090s, and maybe water-cooling parts just for a good RP experience... hmm, it wouldn't be worth the time and money, would it?
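As a sanity check on that daydream, here's a rough back-of-envelope estimate of how many 24 GB cards different model sizes might need. The bits-per-weight figures and the overhead factor are assumptions for illustration, not measurements.

```python
# Back-of-envelope VRAM math (assumptions, not measurements): weights take
# roughly params * bits-per-weight / 8 GB, plus some overhead for KV cache
# and context. Real usage depends on quant format, context size, and backend.
def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.15) -> float:
    weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bpw ~ 1 GB
    return weights_gb * (1 + overhead)

for params, bits in [(70, 4.5), (123, 4.5), (123, 3.0), (195, 3.0)]:
    need = vram_gb(params, bits)
    cards = -(-need // 24)  # ceiling division over 24 GB 3090s
    print(f"{params}B at ~{bits} bpw: ~{need:.0f} GB, roughly {int(cards)}x 3090")
```

Under these rough assumptions, a 195B model at around 3 bpw really does land near four 3090s, which is about where the quad-card idea comes from.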
CountPacula@reddit
Mistral Small (Q6 and Q8) and Mistral Large (IQ2_M and IQ3_XXS). I've been playing around with Qwen 32B and 72B as well.
delicatemicdrop@reddit (OP)
I haven't messed with any Qwen, how is it compared to Mistral/Miqu?
CountPacula@reddit
They do compare pretty favourably, and they definitely are worth trying, but I don't think they are quite as good.
reiyume0@reddit
I’m using Mistral Small Q8 GGUF on my 4090 with KoboldCpp, and it seems to handle repetition better than the other options.