MN-GRAND-Gutenburg-Lyra4-Lyra-23.5B - Long Form Output / NON "AI" prose.
Posted by Dangerous_Fix_5526@reddit | LocalLLaMA | View on Reddit | 28 comments
This is a Mistral Nemo model with a max context of 128k+ (131,072) that excels at long-form output at high detail levels - including dialogue, narration, and prose that doesn't read like typical "AI" writing.
This model can output SFW and NSFW prose.
It is suited to any writing, fiction, or role-play activity.
This model has outstanding storytelling abilities, prose quality, and long-form coherence (one test blew past 8k), and it is composed of THREE "Gutenburg" models that score highly on multiple leaderboards, including EQ-Bench and the UGI Leaderboard.
The model loves to go on and on; 2k, 3k, 5k and higher outputs from a single prompt are not uncommon. It will likely "overwrite" rather than underwrite - meaning far more detail, narration, dialogue, and "meat" in the output, so to speak.
The repo includes detailed and varied examples (different prompts/temps) at 1k, 2k, 3k, and 5k, which show why this "raw" model deserves the light of day.
https://huggingface.co/DavidAU/MN-GRAND-Gutenburg-Lyra4-Lyra-23.5B-GGUF
Right-Law1817@reddit
Hey OP, for some reason the templates provided in the repo don't work at all. I am using Ollama:
But it always returns output like this:
Can you tell me how you made it work?
Dangerous_Fix_5526@reddit (OP)
Suggest googling: "ollama alpaca template"
The template from the repo page is in JSON format, and may require adjustments to work with Ollama's systems.
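For reference, an Alpaca-style template can be carried in an Ollama Modelfile using Ollama's Go-template syntax. The sketch below is illustrative, not from the repo - the GGUF filename is a placeholder for whichever quant you downloaded, and the parameter values are just examples:

```
FROM ./MN-GRAND-Gutenburg-Lyra4-Lyra-23.5B-Q4_K_M.gguf

# Alpaca-style prompt format
TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}### Instruction:
{{ .Prompt }}

### Response:
"""

PARAMETER num_ctx 8192
PARAMETER temperature 0.8
```

Then build and run it with `ollama create mn-grand -f Modelfile` followed by `ollama run mn-grand`.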
NobleWhale@reddit
Hi. I thought I'd give this model a try given glowing opinions but when I try to load it into Oobabooga, I'm getting "AttributeError: 'LlamaCppModel' object has no attribute 'model'". Any idea why? Here's the complete output:
https://pastebin.com/KREkiUey
Dangerous_Fix_5526@reddit (OP)
Hmm, tried it here with no issues - exact quant, via llama.cpp and llamacpp_HF.
Maybe a corrupt download?
Update Oobabooga?
Watch for any settings on the loader screen - all should be off; likewise, context size will likely require manual adjustment, otherwise it defaults to 1,000,000.
4as@reddit
Is it just me or does this model have some weird repetition problems? It really likes to repeat some words twice in a row, like "He stood in a dark, dark cavern." Or "On her cold, cold feet." Etc.
It also really likes adding adjectives as the stories go on, overdoing them to a point where it says a lot without actually saying anything.
Maybe I've missed something in SillyTavern, but I've been using the recommended settings from the model's page. I even tried switching between Alpaca and Mistral templates but it didn't have any effect. Hopefully it's not because I limited the context to 8k...
On the upside, the model has fantastic role-playing capabilities. I do my testing by having the AI impersonate some existing characters from anime, tv, and games, and this model portrayed them in very creative and interesting ways. The responses were both fun and faithful, at least from the initial impressions.
MyPervyAccount28@reddit
The model notes mention increasing repetition penalty to help with this. I had to increase it to 1.1 to get it to not lock up on "tent" and even at that level the model is straight up obsessed with the concept of tenting. NSFW >!anyone getting hard will have lines like "Jake's erect cock obscenely tenting his boxers tent tented the thin cloth." With the recommended 1.05, Jake's cock would tent tent tent x500 tokens.!< These happened multiple times in multiple stories, specifically with the word tent.
4as@reddit
Yeah, I can confirm. I've also noticed the problem of getting stuck on the word "tent," along with verbalized screams, i.e. "AAaarrrrGGGGghhhhhuuuuuhhhhHHHHHHaa..." It will get stuck extending that scream forever.
Repetition penalty might help, but I don't like using it since it tends to make the AI devolve into nonsense sooner or later.
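For context on why repetition penalty can degrade output: the common CTRL-style implementation (used in llama.cpp and HF transformers) penalizes the logit of *every* token already in the context, so at higher values it punishes ordinary function words just as hard as the looping token. A minimal sketch of that logic:

```python
def apply_repetition_penalty(logits, prev_tokens, penalty):
    """CTRL-style repetition penalty: push down the score of every
    token id that has already appeared in the context."""
    out = list(logits)
    for t in set(prev_tokens):
        if out[t] > 0:
            out[t] /= penalty   # shrink positive logits
        else:
            out[t] *= penalty   # push negative logits further down
    return out
```

Since repeated tokens are all penalized equally, cranking the value up eventually suppresses common words too - which is the "devolves into nonsense" failure mode.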
At this point I think this model shows great promise, but seems to be broken.
MyPervyAccount28@reddit
It does seem to be specifically the sexual idea of tenting that triggers it; I had it write some stories about camping and had zero problems with men pitching tents in a non-sexual context.
Dangerous_Fix_5526@reddit (OP)
The source of the doubled adjectives is unclear; it could be the order of the Gutenburgs used, sampling within the merge, or the datasets (and books) used to create the Gutenburgs in the merge.
And yes... it does go on and on; in fact, of all the models constructed so far, this one takes the cake.
Mistral Nemo / Llama 3.1 seems to go a lot longer in terms of "default" output than older models.
DerfK@reddit
I might need a small upgrade before I can use the full 128k context :P
Dangerous_Fix_5526@reddit (OP)
64k? Note that the model can actually go over 128k; however, online benchmarks show Mistral Nemo models drop in "needle in a haystack" quality past 128k.
Downtown-Case-1755@reddit
This is not really true, Mistral Nemo is pretty awful past like 24K.
Dangerous_Fix_5526@reddit (OP)
Do you refer to quality or coherence?
Downtown-Case-1755@reddit
Yes. It doesn't remember anything from earlier in the context, and quality degrades significantly. Pretty much all "128K" mistral models are like this, and TBH most 128K models.
Models that are "better" at long context, from my testing, are the new Command-R (which peters out around 64K-80K), InternLM 2.5 (20B, pretty good at 64K-128K, including the base model), and Qwen 2.5 (working OK at 50K-64K for me, but very bad at 100K with YaRN). Llama 3.1 70B is said to be good too, but it's too rich for my blood, lol.
DerfK@reddit
Ah, that's the reason. Without a specific context size on the command line, llama-cpp uses the context size pulled from the model, which makes it try to allocate about 340GB of RAM. I originally assumed the model had 128k coded into it, but it is much higher.
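The blow-up is mostly KV cache, which grows linearly with context length: K and V tensors for every layer, one (kv_heads x head_dim) vector per token. A rough calculator - the layer/head numbers in the example are typical Nemo-style values for illustration, not read from this model's metadata:

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size: 2 tensors (K and V) per layer,
    one n_kv_heads * head_dim vector per token, fp16 by default."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# e.g. 40 layers, 8 KV heads, head dim 128, at 128k context:
print(kv_cache_bytes(40, 131072, 8, 128) / 2**30)  # -> 20.0 (GiB)
```

At a million-token default context the same math lands in the hundreds of GiB, which is why passing `-c`/`--ctx-size` explicitly to llama.cpp matters.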
IrisColt@reddit
Thanks for the insight, this will likely solve my problem!
vsoutx@reddit
Is it on EQ-Bench?
Dangerous_Fix_5526@reddit (OP)
The source has not been released yet; that will happen later this week. EQ-Bench requires the source version.
cyan2k@reddit
Any comparison you can make to gemma-2-9b-ifable?
I just spent a week reworking and optimizing all my creative-writing agents for that model xD
lothariusdark@reddit
Where/how are you using "creative writing agents"? I'm not using LLMs that much, and this is the first time I've read of agents being used for something other than coding.
cyan2k@reddit
Nothing magical.
It's basically how you'd build research agents, but instead of writing a paper, they write a story.
I have a "StoryAgent" that comes up with a rough story based on your prompt, an "ActAgent" that structures the story into three acts, a "ChapterAgent" that creates chapters for each act, and so on. You get the idea: it's just good ol' divide and conquer. Depending on the length, there are more division agents, until no agent has to generate more than 500 words on its own.
There's also an "EditorAgent" and a "ReviewerAgent" that communicate with the other agents, providing feedback for them to act on.
And last but not least, there's the "MasterAgent" which orchestrates everything. It's also the interface for a knowledge tree-based RAG, meaning all other agents can ask it for information about the story, characters, and so on, and they can feed it information to store in the knowledge tree.
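Stripped of any framework, the divide-and-conquer pipeline described above looks roughly like this. All names are made up for illustration, and `llm` stands in for whatever completion callable you use (an Autogen agent, an API client, etc.):

```python
def write_story(prompt, llm):
    """Divide-and-conquer story pipeline: outline -> acts -> chapters,
    with an editor/reviser pass on each chapter.
    `llm(instruction)` is any text-completion callable."""
    outline = llm(f"Write a rough story outline for: {prompt}")        # "StoryAgent"
    acts = [llm(f"Expand act {i} of 3 from this outline:\n{outline}")  # "ActAgent"
            for i in (1, 2, 3)]
    chapters = []
    for act in acts:
        draft = llm(f"Write a chapter (under 500 words) for:\n{act}")  # "ChapterAgent"
        notes = llm(f"Give editorial feedback on:\n{draft}")           # "EditorAgent"
        chapters.append(llm(f"Revise using this feedback:\n{notes}\n\n{draft}"))
    return "\n\n".join(chapters)
```

A real version would add the reviewer loop and the knowledge-tree RAG lookups; the point is just that each agent is a prompt over a bounded chunk of the story.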
Glittering_Manner_58@reddit
Sounds neat, are you using an agent framework?
cyan2k@reddit
Autogen
https://github.com/microsoft/autogen
Dangerous_Fix_5526@reddit (OP)
Hey:
I have a Gemma model which also contains Ifable (and 3 other top Gemma models as listed at EQBench) here:
https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF
There is prose output too at the repo.
RE: Compare - roughly, Gemma will be on par with this model in terms of prose, with this model having more detail and narration than the Gemmas (based on testing a number of Gemmas).
Where it will differ: gore, horror, swearing, and NSFW content, and the vivid details related to these.
Gemma won't do these - not to this level - and the "uncensored" versions of Gemma I tried won't either.
Hope this helps.
robertotomas@reddit
Are there multilingual prose datasets/models available?
Admirable-Star7088@reddit
Looks like I've to cancel the planned sexual intercourse with my wife tonight. My apology will be: "I need to do important AI stuff, you know, the technology that will revolutionize our lives".
Nrgte@reddit
Always a fan of Gutenberg finetunes. But where is the base model, and are there any exl2 quants yet?
Dangerous_Fix_5526@reddit (OP)
Full source will drop this week; hopefully EXL2 too. I can't do EXL2 quants on my machine.