MiS-Firefly-v0.1-22B (new roleplay finetune)

Posted by realechelon@reddit | LocalLLaMA | View on Reddit | 21 comments

Firefly is a Mistral Small 22B finetune designed for creative writing and roleplay. The model is largely uncensored and should support context up to 32,768 tokens.

The model has been tested in various roleplay scenarios up to 16k context, as well as in a role as an assistant. It shows a broad competency & coherence across various scenarios.

This model is extensively uncensored. It can generate explicit, disturbing or offensive responses. Use responsibly. I am not responsible for your use of this model.

This model is a finetune of Mistral Small 22B (2409) and usage must follow the terms of Mistral's license. By downloading this model, you agree not to use it for commercial purposes unless you have a valid Mistral commercial license. See the base model card for more details.

https://huggingface.co/invisietch/MiS-Firefly-v0.1-22B

[-]

Gensh@reddit

Unlike everyone else, I was using the EXL2 and had no issues with spelling not caused by XTC.

The prose is a lot better than vanilla in general and also avoids the way vanilla loves saying "indeed". There's a big leap in scenario speed - vanilla tends to force a back-and-forth, while Firefly is more willing to take your input and move forward with it. I'm going to keep using it for a while and see what shakes out.

A couple of hiccups:

I forget which, but it requires a fairly current version of a dependency to be loaded at all. Ooba's ExLlamaV2 install/update is broken currently, so I finally had to get my TabbyAPI install working.
Firefly was unusuably stupid with the sampler settings I was using for vanilla. It was fine once I turned the temp way down, but going back to a more neutral starting point might need to be emphasized in the doc.

[-]

realechelon@reddit (OP)

Did you try v0.2?

That should fix the spelling issues with names/spacing, I just don't have EXL2 for it yet.

[-]

Gensh@reddit

I misread the issue and thought the EXL2 had escaped the bug.

I've strictly used the full VRAM formats -- the only time I tried loading a GGML back in the day, I completely goobered the settings. I might try getting the GGUF working if nobody does an EXL2 for the update. The GPTisms in vanilla Small are killing too much time with rerolls.

The current issue with Ooba is that they aren't fully reinstalling for whatever tests they're doing. ExLlamaV2 has to update in lockstep with PyTorch and friends because the api is apparently a disaster. Ooba bumped the ExLlamaV2 version but not the Torch/etc versions, so ExLlamaV2 tries calling functions that don't exist yet. I tried manually installing the correct versions, but I ran into some order of operations issue and gave up.

Either way, when it works, it's great! I had to make more manual amendments for basic errors, but I wasn't having to generate ten responses and splice them together anymore.

[-]

realechelon@reddit (OP)

EXL2s are up for v0.2, linked on the model card.

[-]

4as@reddit

I've tried to the updated version and after some testing I realized it's not a model for me.
It has its moments but generally I don't like how it seems to have a short attention span. Very often rather than expanding on situations and scenes it will start a new paragraph with "suddenly" or "meanwhile" and write about something completely new. At first I saw this as creativity, but after a while it started to get annoying. Might be good for RPing with a single character, but doesn't work with storytelling.
Actually, it generally doesn't do well with stories, beyond what I would generally expect from 22B. For example it prefers to summarize interactions between characters, rather then acting them out in detail. Ie. we get "He went to talk to X who told him that the treasure was buried at Y" rather than actual dialogue.
Prompt adherence also leaves some things to desire. I have character that "likes to talk to herself out loud," and of course the AI doesn't include any dialogue of the character talking to herself.
On top of it all it also shows glimpses of that usual AI slop with stuff like "shivers" or "heaving," or "leaves little to the imagination."

Generally, nothing really noteworthy.

[-]

realechelon@reddit (OP)

Mind sharing settings? I had it bring up things from 40-50 messages earlier on multiple occasions which is wild attention span for 22B. It runs hot so prefers lower temps around 0.7-0.9 with some min-p.

For creative writing I found that prompting beats+length avoids summarization i.e. 'Write a 2000 word scene where X and Y discuss their findings' along with context of the previous scenes (similar to how Novelcrafter works).

Prompt adherence also leaves some things to desire. I have character that "likes to talk to herself out loud," and of course the AI doesn't include any dialogue of the character talking to herself.

That's definitely something I haven't tried. Can you share the character card or is it private?

[-]

4as@reddit

I used the recommended settings from the model card, and then just messed with Temp to see how it behaves. Also, I didn't go far enough to test the model's memory (plus recalling things from the past doesn't really interest me as much).
Anyway, back to the attention span. At low Temp (\~0.3) it does indeed stay in the moments, elaborating and slowly building upon the current story, but it's creativity is gone and it repeats situations.
On higher Temp (0.7+) it gets more creative and moves in unexpected directions, but it also has shorter attention span and often chooses to pivot to new topics. For example, in a middle of a dungeon, out of nowhere, it occasionally tried inserting other characters mention in the character card that really shouldn't be there.

I don't think my character cards are worth sharing. They all have storytelling presets with generic "You are proactive. You have characters use dialogue. You do not summarize." And so on. Then I include a setting and some existing characters from anime/video games to test the models knowledge.

I don't have a system prompt (beside the character card contents).

[-]

realechelon@reddit (OP)

I haven't tried temps as low as 0.3, I'm usually running with dynamic temp 0.7-1 and haven't come across these issues yet (on Q6_K, Q8_0 and F16 GGUFs). I don't think I have cards that mention other characters though (other than in opening dialog), so it could be specific to that, maybe?

Just to confirm, you're using the Mistral v2/v3 instruct format? I have seen some absolutely wild behavior when I forgot to change from L3 format.

realechelon@reddit (OP)

Can you share your sampler settings & instruct settings?

[-]

realechelon@reddit (OP)

Thanks for the feedback.

Did some digging, this is an issue in quantization (doesn't happen on fp16/Q8_0), I'm pushing v0.2 which fixes these quant issues (tested at q8 and q6).

Should be up in 6 hours or so, then I'll need to get it quanted again.

[-]

LoafyLemon@reddit

Unfortunately, it seems to make a lot of spelling mistakes. I assume the dataset is contaminated?

[-]

realechelon@reddit (OP)

What quant are you running? I've seen maybe 1-2 spelling errors at Q8_0 in \~500 messages.