I know it's not for everyone, but I think this model needs more spotlight.
Posted by cgs019283@reddit | LocalLLaMA
writing-roleplay-20k-context-nemo-12b-v1.0 is the most interesting Nemo finetune I've tried so far. You know, and I know, what we're using these 'roleplay' models for ;)
I've tried at least 20 Nemo models looking for the best one at reasoning, instruction following, and multilingual use, and this is the most 'stable' one I've found so far.
By 'stable' I mean it doesn't get wacky with complicated instructions and reasoning, it doesn't need to be bonked into line like the horny models, and it doesn't spit out lines of alien tokens whenever you talk to it in a language other than English.
I'm not saying it's the best model for RP or other general tasks, but I think it's worth trying and recommending to others.
Gloomy-Hedgehog-8772@reddit
Any quantizations? Or should I be able to make them myself?
Mythril_Zombie@reddit
What does this mean?
Gloomy-Hedgehog-8772@reddit
People often make quantized versions that use fewer bits; they're slightly lower quality but need less RAM (which is particularly useful when it means the model fits on your GPU).
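Very roughly, the idea is this (a toy numpy sketch, not how real formats like GGUF or GPTQ actually pack bits):

```python
import numpy as np

# Toy illustration only: real quant formats are much smarter, but the trade-off
# is the same -- fewer bits per weight means less RAM and a small rounding error.
weights = np.random.randn(4096).astype(np.float32)  # pretend these are fp32 model weights

scale = np.abs(weights).max() / 127.0                # map the float range onto signed 8-bit ints
q = np.round(weights / scale).astype(np.int8)        # 1 byte per weight instead of 4
restored = q.astype(np.float32) * scale              # what the model actually computes with

print("memory:", weights.nbytes, "->", q.nbytes, "bytes")
print("max rounding error:", np.abs(weights - restored).max())
```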
Mythril_Zombie@reddit
So these are essentially compressed by different ratios?
cgs019283@reddit (OP)
There's only a Q5_K_M GGUF on their HF right now.
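If you just want to run that quant, something like this should work with llama-cpp-python (the repo ID and filename below are placeholders; check the actual HF page for the real ones):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python huggingface_hub

# Placeholder repo/filename -- substitute the real ones from the model's HF page.
path = hf_hub_download(
    repo_id="someone/writing-roleplay-20k-context-nemo-12b-v1.0-GGUF",
    filename="writing-roleplay-20k-context-nemo-12b-v1.0-Q5_K_M.gguf",
)

llm = Llama(model_path=path, n_ctx=20480, n_gpu_layers=-1)  # long context is the point of this model
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in character."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```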
Scam_Altman@reddit
I've got Q8_0 quants up too, and I'm still uploading the fp16. I'm trying to figure out what the heck happened with llama.cpp, because the new quant script only seems to support Q8_0. I had to use the Hugging Face space to make the Q4_K_M.
Scam_Altman@reddit
Yeah, I have one exllama2 quant finished (4.5 bpw) and I'm working on one or two more (6 and 8 bpw). I quantize at a very long context length with way more rows than the default, so it takes forever. I'm also uploading some GGUF quants in a separate repo. I can do any quant (GPTQ, AWQ, etc.) if someone requests it.
Scam_Altman@reddit
I wasn't expecting anyone to notice this model, I was going to make a post about it this weekend.
I created several synthetic datasets from scratch, including a lot of long-context multi-turn chat built from chub.ai characters (and other places I pulled from). I generated as many turns as would fit in the 4k output window, prompting the model (command-r-plus) to create both sides of the conversation in one shot. My rationale was that by doing many turns in one generation, the model would create a more coherent, naturally flowing conversation. It's also many more tokens per second in my experiments, and I only have 6 GPUs to play with, so that was a factor. Once I had the first ~8k tokens of chat history finished, I let abliterated mistral-small-instruct continue batching the chats. Once the model had 8k of example context for the conversation, I didn't see any quality decrease from switching from command-r-plus to the faster model.
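A minimal sketch of that "both sides in one shot" idea (not the actual pipeline; the endpoint, model name, prompt wording, and turn format here are placeholder guesses):

```python
from openai import OpenAI

# Sketch only: assumes an OpenAI-compatible local inference endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def generate_chat(character_card: str, scenario: str) -> list[dict]:
    prompt = (
        "Write a roleplay chat between {{user}} and the character below.\n"
        f"Character: {character_card}\nScenario: {scenario}\n"
        "Alternate turns, prefix each line with 'USER:' or 'CHAR:', "
        "and write as many turns as will fit."
    )
    resp = client.chat.completions.create(
        model="command-r-plus",   # later swapped for a faster model once ~8k of history exists
        messages=[{"role": "user", "content": prompt}],
        max_tokens=4096,          # fill the whole output window with turns
        temperature=0.9,
    )
    # Parse the single completion back into alternating user/assistant turns.
    turns = []
    for line in resp.choices[0].message.content.splitlines():
        if line.startswith("USER:"):
            turns.append({"role": "user", "content": line[5:].strip()})
        elif line.startswith("CHAR:"):
            turns.append({"role": "assistant", "content": line[5:].strip()})
    return turns
```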
I also used some existing datasets for story-writing instruct, and a few categories from airoboros 3.2, mostly for non-erotica "de-alignment" (re-alignment?). I ended up with something like 1.3 GB of data total. I ran all of that through abliterated Mistral-small-instruct, scoring it for quality, with repetition penalized severely and underage sexual content given an automatic 0. That took me down to about 900 MB of data.
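And a sketch of what that scoring/filtering pass could look like (rubric wording, threshold, and model name are guesses, not the actual setup):

```python
import json
from openai import OpenAI

# Sketch only: judge model scores each sample, low scores get dropped.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

RUBRIC = (
    "Score this chat 0-10 for writing quality. Heavily penalize repetition and loops. "
    "Any underage sexual content scores 0 automatically. Reply with only the number."
)

def keep(sample: dict, threshold: int = 6) -> bool:
    resp = client.chat.completions.create(
        model="mistral-small-instruct-abliterated",  # placeholder name
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": json.dumps(sample["conversations"])},
        ],
        max_tokens=4,
        temperature=0.0,
    )
    try:
        score = int(resp.choices[0].message.content.strip())
    except ValueError:
        return False  # unparseable score -> drop the sample
    return score >= threshold

# filtered = [s for s in samples if keep(s)]
```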
I ran that through axolotl with a lowish learning rate, only one epoch, and some parameters I pulled out of my ass. My rationale was that I'm not really trying to teach the model any new knowledge. My main goal was to get long-context multi-turn working as reliably as possible, without going off the rails or resorting to loops or repetition, and with as few GPT-isms and as little slop as possible. I remember reading that the more you try to "teach" a model through fine-tuning, the more the "intelligence" of the base model is damaged, especially on tasks not related to the training data (citation needed). So I thought it made sense to teach long context and multi-turn over a big range of samples with a light touch, basically trying to convert the base model into a long-context multi-turn instruct model while damaging the original model as little as possible.
I'm launching a service that provides managed solutions for NSFW generative AI software. You pay a monthly fee and get access to a pre-configured web instance hooked up to my backend. Some examples: SillyTavern, Agnaistic, OpenWebUI, LibreChat, with Stable Diffusion and voice services plugged in. So I wanted to create a model that would excel in this specific environment. The most urgent feature was stable multi-turn, because most models seem to fail at this badly. But I think there could also be some big improvements from things like training data for common RAG solutions (e.g., SillyTavern lorebooks) and inline/native Stable Diffusion prompt generation for each piece of software, among other things. The models and datasets will be open source, because open source makes my dick hard. Plus, I feel like my target audience are the kind of people who just want to try/use this kind of software for fun without spending a bunch of money on hardware and learning about LLMs. They just want to sex the computer. I just need a few more patents to expire and then we can even have AI-controlled sex toys. SillyTavern Orange Pi robo dildo/fleshlite plugin TBA.
I will make my own post trying to expand on some of this and get community feedback this weekend.
Infinite-Potato-9605@reddit
Your project sounds like a cool venture! It’s impressive how you’ve leveraged datasets for long context multiturn conversations. I’ve been dabbling in similar setups, and balancing learning rate and epochs was key for me to avoid messing up the base model’s inherent qualities. For promoting your service and connecting with specific audiences, consider platforms like Replicate for model deployment, along with UsePulse for engaging effectively on Reddit. These have helped me in reaching niche user segments without needing extensive resources or infrastructure. Look forward to seeing how your service shapes up.
cgs019283@reddit (OP)
Very well-made model! I was impressed by what you mentioned and how it drives the story forward so naturally in a coherent way.
I'd like to see what your next model can do and how it improves on this one.
comunication@reddit
For reasoning you can write a script. I use one I made myself and the results are much better. For example, if before it was like a 5-year-old, now it's like a 14-year-old. I hope to get it to 30.
Nice work, and I will test it today.
iliasreddit@reddit
What do you mean, make a script? With what?
comunication@reddit
With Python. You can make a reasoning script for a specific task or a general one, import it in your main Python file, and run it. That's if you use Python for your task.
iliasreddit@reddit
Still lost, what is a "reasoning script"? You mean to add logic around the output of the LLM?
TheDreamWoken@reddit
Probably Open WebUI
DirectAd1674@reddit
Submit it to the UGI leaderboard for evaluation.