gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic Is Out Now, a Writing Finetune That Aims to Improve the Writing Quality of Gemma 4 31B it with More Natural English and Better Prose. Good for Creative Writing, Translations and RP!
Posted by LLMFan46@reddit | LocalLLaMA | View on Reddit | 20 comments
Provided in both Safetensors and GGUFs.
llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic
llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF
I can also make NVFP4s and GPTQs if anyone asks for them.
Find all my models here: HuggingFace-LLMFan46
aboutthednm@reddit
I'm always looking for writing- and prose-trained models, but realistically I can only afford to run models in the 0-14B range in my pipeline. I only have 16 GB of VRAM. While I would love to run something like this, can someone recommend some good writing and prose fine-tunes? My "favorite" model at the moment is "DavidAU/Gemma-The-Writer-Mighty-Sword-9B-GGUF". I'm using it mostly for long-form writing, no roleplay.
I had a look through your repo, but you seem to be focused on 27, 31, and 35B models. I wish I had the compute to run these locally, but alas. Maybe you, or anyone else, can recommend some writing fine-tunes. I generally struggle to discover these on my own, as everything seems to move so fast.
LLMFan46@reddit (OP)
https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF/blob/main/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-Q3_K_M.gguf
aboutthednm@reddit
I appreciate it, but the reality is that I can't cram a 15 GB model onto my card alongside whatever else is running (typically a CUDA-accelerated KokoroTTS container for speech synthesis). I turn Wikipedia articles into fan fiction and have the result read out to me, audiobook style. I tried running the Q3_K_M variant with my speech synthesis disabled, and even with a meager 8192 context (19 GB total) it spills over onto the CPU, grinding the pipeline to a halt. Thanks anyway; maybe one of these days I'll have a GPU big enough to run these bigger models. In the meantime I've got to stay in the 9-14B range for writing purposes. While your model writes great prose, I can't sit around for 15 minutes (4 of those spent "thinking") waiting for a 2000-word multi-shot generation to complete, haha. Thanks.
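For anyone curious what that kind of article-to-audiobook pipeline looks like in practice, here is a minimal sketch. It assumes a local OpenAI-compatible LLM server (LM Studio's default is http://localhost:1234/v1) and a Kokoro TTS container that exposes an OpenAI-style /v1/audio/speech route; the ports, model name, and voice below are placeholders, so adjust them to whatever your setup actually serves.

```python
# Minimal sketch of a "Wikipedia article -> fan fiction -> audiobook" pipeline.
# Assumes an OpenAI-compatible LLM server (e.g. LM Studio, default port 1234)
# and a Kokoro TTS container exposing an OpenAI-style /v1/audio/speech route.
# Endpoints, model name, and voice are assumptions -- adjust to your own setup.
import requests

LLM_URL = "http://localhost:1234/v1/chat/completions"   # LM Studio default
TTS_URL = "http://localhost:8880/v1/audio/speech"       # hypothetical Kokoro container port


def article_to_fanfic(article_text: str) -> str:
    """Ask the local model to rewrite an article as fan fiction."""
    resp = requests.post(LLM_URL, json={
        "model": "local-model",  # LM Studio typically serves whatever model is loaded
        "messages": [
            {"role": "system", "content": "You are a creative fiction writer."},
            {"role": "user", "content": f"Turn this article into a short fan fiction:\n\n{article_text}"},
        ],
        "temperature": 0.9,
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def narrate(text: str, out_path: str = "story.mp3") -> None:
    """Send the story to the TTS container and save the returned audio."""
    resp = requests.post(TTS_URL, json={
        "model": "kokoro",
        "voice": "af_heart",  # placeholder voice name
        "input": text,
    }, timeout=600)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)


if __name__ == "__main__":
    story = article_to_fanfic(open("article.txt").read())
    narrate(story)
```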
__some__guy@reddit
Nice to see more Gemma-based releases.
I'm very pleased with this model's programming capabilities, so I will try out your finetune for creative writing next.
Any suggested system prompts?
LLMFan46@reddit (OP)
Well I mean, it entirely depends on what you want to do with it?
__some__guy@reddit
Just the usual chat/roleplay or adventure.
_-_David@reddit
Honest to god, I thought the whole post title was the model name for a second.
Square_Empress_777@reddit
For anyone who is new to using these: are these compatible with LM Studio? How do I make use of them, do I need to put the model in a specific folder or something?
LLMFan46@reddit (OP)
Download LM Studio and install it, go into the search tab on the left, search for "gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF", look at the quants list and their sizes, select the one that fits on your hardware, and click download. When the model finishes downloading, load it and then you can start chatting or whatever.
MrShrek69@reddit
Do the Google models make images?
LLMFan46@reddit (OP)
No, it's an LLM.
Square_Empress_777@reddit
Does this model have vision, out of curiosity?
LLMFan46@reddit (OP)
Yes of course.
HonestoJago@reddit
Let me start by saying I'll definitely try it, but what are you people doing to get refusals from the base model? I know it's part of the tests/scripts you run, but in actual practice, are you getting refusals from the base Gemma 4? Because it seems uncensored asf to me. The writing style can definitely use some help, though, so thank you.
LLMFan46@reddit (OP)
There's plenty: sexy creative writing, roleplaying, the people over at r/SillyTavernAI definitely need an uncensored model for whatever they are doing. Or say you want to play a visual novel, but the visual novel is only in Japanese. These days you can just download LunaTranslator, download LM Studio, install it and load a model, then point LunaTranslator at the LM Studio local address so the two can communicate with each other. Depending on the visual novel you play, it can have violence, blood, swearing and other stuff that I would rather not mention here, which AI typically censors and/or softens; well, that's also where uncensored models come in. Plenty of other stuff too.
Vanilla.
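For anyone wondering what "pointing LunaTranslator at the LM Studio local address" amounts to: LM Studio exposes an OpenAI-compatible server (by default at http://localhost:1234/v1), and a translation request against it looks roughly like the sketch below. The model name and prompt wording are illustrative assumptions, not anything LunaTranslator specifically requires.

```python
# Rough sketch of the kind of request a tool like LunaTranslator sends to
# LM Studio's local OpenAI-compatible server (default http://localhost:1234/v1).
# Model name and prompt wording here are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "gemma-4-ortenzya-the-creative-wordsmith-31b-it-uncensored-heretic",
        "messages": [
            {"role": "system", "content": "Translate Japanese game text into natural English. Do not soften or omit anything."},
            {"role": "user", "content": "<line of Japanese text captured from the visual novel>"},
        ],
        "temperature": 0.3,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```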
HonestoJago@reddit
Fair enough. Nice answer.
LLMFan46@reddit (OP)
Also, I explained softening on the MiniMax-M2.7 thread; let me copy and paste it here:
Softening is in some ways worse than a hard refusal, where the model just doesn't reply to your translation request at all. People who use AI for translation typically do so because they don't understand the original language, so they would never know that, when translating into their target language (English, French, German, Italian, Chinese, etc.), the AI is omitting and/or toning things down in its output. An uncensored model would not do that.
Another thing: with smexy creative writing you could prompt something like "write a sexually intimate scene about two adults" and the AI will either vomit a bunch of disclaimers at you about this or that, and/or just write a story about two adults doing "intimate" things like looking into each other's eyes and/or holding hands until it does the "fade-to-black" thing. Either way, that's a failure to follow the original prompt.
drooolingidiot@reddit
What makes it good for creative writing? Is it SFTed on distilled creative writing tasks?
magikfly@reddit
I wonder what the prompt was for the image. Wait, lemme guess.
thirteen-bit@reddit
We can always ask this model?
Actually, relevant question: does the abliteration affect the vision encoder?
The result of Gemma 4 (normal Gemma quantization, not yet this model) describing the image, with that description fed to Z-Image as the prompt, is here: