Yu2sama

Why are there so few small local creative writing models from the Chinese?

Posted by kabachuha@reddit | LocalLLaMA | View on Reddit | 64 comments

Yu2sama@reddit

I think the issue with fine-tunes is that there are so many, and a lot of them don't change the base model enough to warrant space on my drive. Despite that, good fine-tunes do give the model a fresh cut and change the texture enough that they feel right to use over the base one. But there aren't that many good fine-tunes of new models tbf.

The 4B class of 2026 (benchmark)

Posted by FederalAnalysis420@reddit | LocalLLaMA | View on Reddit | 59 comments

Yu2sama@reddit

No, the model card explicitly says that E4B is 8B with the embeddings. I don't understand this need to be right when you clearly hadn't even read the model card before making the first comment.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license

Posted by nathandreamfast@reddit | LocalLLaMA | View on Reddit | 235 comments

Yu2sama@reddit

You did for the abliterlix? Where could I see that?

Forgive my ignorance but how is a 27B model better than 397B?

Posted by No_Conversation9561@reddit | LocalLLaMA | View on Reddit | 286 comments

Yu2sama@reddit

There are probably a myriad of reasons. What comes to mind is that bigger models require more time cooking to be good, while smaller ones are easier to cook and iterate on, so they can be improved faster. It may also be that some techniques don't translate as well to bigger sizes or, the other way around, that some techniques are extremely good at smaller sizes.

Kimi K2.6 Released (huggingface)

Posted by BiggestBau5@reddit | LocalLLaMA | View on Reddit | 277 comments

Yu2sama@reddit

What's the point of these bots I wonder?

Ternary Bonsai: Top intelligence at 1.58 bits

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 89 comments

Yu2sama@reddit

Because that's the only Qwen in the 8B range? Pretty evident they aren't competing against bigger ones (Qwen 3.5 9B in this case).

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade

Posted by BuffMcBigHuge@reddit | LocalLLaMA | View on Reddit | 128 comments

Yu2sama@reddit

Gemma 3 was so bad... no fine-tune could help it. Never understood the people that liked it haha. I don't expect such a perfect model to exist, but if one appears I would be pleasantly surprised!

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade

Posted by BuffMcBigHuge@reddit | LocalLLaMA | View on Reddit | 128 comments

Yu2sama@reddit

It's funny you say that, I have been slowly working on my own RP benchmark with A/B tests and a few shot tests for different fields (spatial understanding, intelligence, character adherence, etc.). Nothing crazy, mostly for personal use, to test my models and see which ones to delete and which ones to keep lol.

On the topic of Qwen vs Gemma... I only use the 9B, E4B and a bit of the 26B MoE (the prompt processing kills my soul on the last one, hence why I don't use it as much atm). From my tests, the Gemma family has better prose and better understanding, and they use the character card very well. They will often surprise you with certain details of the character. Though, from my experience, they SUCK at style adherence, and a bit at character adherence; they tend to be more realistic and homogeneous, which can be a good or a bad thing depending on who you ask.

Meanwhile Qwen has amazing instruction following. Its prose is not as good as Gemma's, but its style adherence is superior by a lot. Character adherence tends to be very good though, maybe a bit better than Gemma.
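For anyone curious how the A/B side of a personal benchmark like this could be tallied, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `tally` and `win_rate` helpers, the tuple layout, the category labels and the placeholder model names are hypothetical and not taken from the actual benchmark.

```python
from collections import Counter


def tally(judgments):
    """Count A/B wins per category.

    Each judgment is a hypothetical tuple: (category, model_a, model_b, winner).
    """
    wins = {}
    for category, model_a, model_b, winner in judgments:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} was not part of the comparison")
        wins.setdefault(category, Counter())[winner] += 1
    return wins


def win_rate(wins, category, model):
    """Share of comparisons in a category that a model won (0.0 if none recorded)."""
    counts = wins.get(category, Counter())
    total = sum(counts.values())
    return counts[model] / total if total else 0.0


# Placeholder data, not real results.
judgments = [
    ("style adherence", "model-a", "model-b", "model-a"),
    ("style adherence", "model-a", "model-b", "model-a"),
    ("style adherence", "model-a", "model-b", "model-b"),
    ("prose", "model-a", "model-b", "model-b"),
]
wins = tally(judgments)
print(win_rate(wins, "style adherence", "model-a"))
```

Keeping the wins split by category is what lets one model come out ahead on prose while another comes out ahead on style adherence, instead of collapsing everything into a single score.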

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade

Posted by BuffMcBigHuge@reddit | LocalLLaMA | View on Reddit | 128 comments

Yu2sama@reddit

I don't think we should expect every fine-tune to be good. The issue with fine-tunes is that, until you try them, you can't know if they are broken af, which is quite different from a LoRA on an image generation model. There you can directly see the results easily, but with text it requires some discernment and extra work to test.

Two extra points relevant to this subject:

- Qwen 3.5 seems to be very sensitive, so I expect more fine-tunes to suck than to work unless the fine-tuner works, tests and tries to fix until they find the correct version/sauce.
- That's a DavidAU fine-tune, bro has cooked some interesting stuff but his models are, most of the time, broken af.

Qwen seems to be very sensitive anyway; the only fine-tune I have tried that works on par with the base Qwen is [Qwen3.5-9B-Aggressive](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive), at least from my own tests.

FernflowerAI-35B-A3B-KL-ReLU-GGUF + Apple MLX

Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 18 comments

Yu2sama@reddit

Was this issue only present in the 35b?

It looks like there are no plans for smaller GLM models

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 128 comments

Yu2sama@reddit

I really liked GLM 9B at the time, and I hope we get to see something like that again eventually. A 9B that writes better/differently than Qwen would be very appreciated by me.

It looks like there are no plans for smaller GLM models

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 128 comments

Yu2sama@reddit

I don't think you should expect an all-rounder win from LLMs. They do so many different things in different ways that, even if Qwen 3.5 doesn't do what you like perfectly, maybe it does some tasks better than Gemma for other people. They are trained differently and have different biases, which makes those differences not flaws but strengths and variety. The good thing is that they are open and free to use; if you had to marry a single model, that would suck.

Final voting results for Qwen 3.6

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 285 comments

Yu2sama@reddit

9B is capable enough for basic stuff and RAG. A 30B is without a doubt smarter and has more knowledge, but with RAG some stuff evens out. Not everyone is using these models for agentic tasks or coding tbf.

Is inverse LoRA distillation between Qwen 2.5 1.5B and 7B a viable idea, or just an interesting dead end?

Posted by Plus_Original_3154@reddit | LocalLLaMA | View on Reddit | 9 comments

Yu2sama@reddit

I would recommend you avoid format maxxing. Try to tweak your system prompt for that, and avoid concluding sections like "Why this matters" like the plague (you aren't speaking to a kid who can't figure out the "why" from your explanation). LLMs love to beautify text, but there is a point where this is done in an exaggerated way, and most people do notice that. It makes the text less engaging and slower to read, and it will make people ignore you most of the time. I am not telling you to stop using AI to help you with writing (English is not my native tongue either), but don't just let it do whatever it wants or you end up with responses like the other commenter's.

Hermes agent might be the best open source agent for local models right now

Posted by virtualunc@reddit | LocalLLaMA | View on Reddit | 31 comments

Yu2sama@reddit

This is very true, the more you consume it the more you are influenced by it.

Hermes agent might be the best open source agent for local models right now

Posted by virtualunc@reddit | LocalLLaMA | View on Reddit | 31 comments

Yu2sama@reddit

I mean right after that there is a grammar error so... 😭

p-e-w/gemma-4-E2B-it-heretic-ara: Gemma 4's defenses shredded by Heretic's new ARA method 90 minutes after the official release

Posted by -p-e-w-@reddit | LocalLLaMA | View on Reddit | 82 comments

Yu2sama@reddit

Kinda both. A model needs to be good enough and receptive to fine-tunes; nobody wants to struggle with a model that already does things badly unless the model is very easy to fine-tune (the case of Llama models, if I am not mistaken). Writing is relevant: the model should be good enough at it that fine-tuning is not about fixing something that is broken (a difficult endeavor tbh). And the license helps a lot. There are a couple of Gemma fine-tunes, even Drummer has done some. Issue? You have to walk on eggshells; he couldn't even be explicit about what he did to the model for fear of Google. At the time, Mistral models were also good at writing, so fine-tuners had a solid option that was safer.

Can we block fresh accounts from posting?

Posted by king_of_jupyter@reddit | LocalLLaMA | View on Reddit | 121 comments

Yu2sama@reddit

Now that you mention it, I can see what you mean... 💀

Can we block fresh accounts from posting?

Posted by king_of_jupyter@reddit | LocalLLaMA | View on Reddit | 121 comments

Yu2sama@reddit

How so? 😭 I just like to write

Can we block fresh accounts from posting?

Posted by king_of_jupyter@reddit | LocalLLaMA | View on Reddit | 121 comments

Yu2sama@reddit

It's not even the use of AI at all, because everyone here uses AI. It's the lack of integrity that people dislike, but they reduce it to just "LLM writing". Most of these posts have a few things in common:

- Overly flowery language.
- Redundant information through the whole text.
- Text maxxing (more is better, when that depends on what you are trying to explain).
- Sometimes (as a bad behavior of LLMs) they mention things that are not in the whole explanation/post. This is normal for a chat LLM: they are terrible at bringing in the context of a whole conversation and expect the reader to understand as if they had been in the chat where those things were discussed. (Mind you this doesn't happen all the time but it does happ….

Solution? Just use the LLM with more intent and actually correct the mistakes instead of looking at a whole block of text and saying "Oh yeah this looks good." Correcting a text with an LLM is not that hard.

Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.

Posted by Turbulent-Sky5396@reddit | LocalLLaMA | View on Reddit | 117 comments

Yu2sama@reddit

...Tensa Zangetsu

Gemma 4 released

Posted by garg-aayush@reddit | LocalLLaMA | View on Reddit | 80 comments

Yu2sama@reddit

Yes... Apache, then my wish came true lol

Gemma time! What are your wishes ?

Posted by Specter_Origin@reddit | LocalLLaMA | View on Reddit | 145 comments

Yu2sama@reddit

Better license for finetuners (though I doubt it's gonna happen). I would be happy if it just gets better at creative writing.

LocalLLaMA 2026

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 133 comments

Yu2sama@reddit

Llama was a big thing at the time, and today the llama is more of a symbol of open-source LLMs than a Meta product. It's a good mascot, similar to the Linux penguin.

Friendly reminder inference is WAY faster on Linux vs windows

Posted by triynizzles1@reddit | LocalLLaMA | View on Reddit | 111 comments

Yu2sama@reddit

Not a big fan of how it handles its files. I prefer a setup more akin to Comfy + A1111/Forge Neo, where all my models live in the same directory. Ollama wants its own scheme that breaks my flow with KoboldCPP, so yeah, if I am going to use a llama.cpp wrapper, Kobold does the job just fine (with its own issues of course, but those I don't mind).

Do LLMs get "lazy" outside of normal 9-to-5 hours?

Posted by DerBasti85@reddit | LocalLLaMA | View on Reddit | 18 comments

Yu2sama@reddit

There are probably things in your conversation that are causing a downgrade. Models are more prone to stupid tokens than one may expect, and some mistakes made at the start of the conversation can come back and bite your ass later. Models don't perform badly just because; there is always a cause behind an effect, even if we don't notice it.

The AI releases hype cycle in a nutshell

Posted by GreenBird-ee@reddit | LocalLLaMA | View on Reddit | 42 comments

Yu2sama@reddit

The easiest way to nerf a model is just serving lower quants. There is a monetary incentive to reduce compute while keeping the model close to optimal performance. For certain areas you wouldn't notice this as much due to how big these models are, but there will always be people that claim the model got dumber. That happens all the time with Gemini for example, and while I don't think it is necessarily because they are serving lower quants, that could also explain the downgrade.

Anyway to get close to GPT4o on a local model (I know it’s a dumb question)

Posted by octopi917@reddit | LocalLLaMA | View on Reddit | 82 comments

Yu2sama@reddit

I would recommend you have an LLM critique the prompt using deep research (Kimi or GLM are great at this). Sometimes LLMs get too lost in the sauce of the context, and they ignore things that others would totally notice from the outside. Similar to strategy games, outsiders normally have a better view of the whole picture.

Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler-GGUF

Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 77 comments

Yu2sama@reddit

Can you tell me how you do that without the main style of WAI ANI sanitizing your results? Quite interested in the process.

Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler-GGUF

Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 77 comments

Yu2sama@reddit

Is this with a LoRA or an artist name? I haven't gotten good results on more artistic gens in WAI, so I usually just use NTRMix for that.

Assistant_Pepe_70B, beats Claude on silly questions, on occasion

Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 77 comments

Yu2sama@reddit

Still waiting for an Impish quality llama 3.x 8B 🙏

prompting help

Posted by ProfessionalDraw2315@reddit | LocalLLaMA | View on Reddit | 3 comments

Yu2sama@reddit

Use a good LLM to critique and optimize your prompt. You would still need to write a good prompt for this critic though. It's relevant to know that this is more tedious if you are just throwing stuff at the LLM without understanding how to prompt in an efficient and simple manner. I would recommend reading papers on prompting; there are really good resources out there to help you get the best results and change the way you see prompting.

Local replacement GGUF for Claude Sonnet 4.5

Posted by SmithDoesGaming@reddit | LocalLLaMA | View on Reddit | 13 comments

Yu2sama@reddit

It depends on the fine-tune. I would recommend taking a look at the [SillyTavern MegaThread](https://www.reddit.com/r/SillyTavernAI/comments/1s10uk6/megathread_best_modelsapi_discussion_week_of/) to see a few models and ask around about what could help you with that. There are really good options, and most Mistral models are quite good at roleplay. To be upfront, I haven't tested Claude Sonnet, but don't get your hopes up for something of similar quality or intelligence.

Are we currently in a "Golden Time" for low VRAM/1 GPU users with Qwen 27b?

Posted by inthesearchof@reddit | LocalLLaMA | View on Reddit | 117 comments

Yu2sama@reddit

I'm in this situation atm: I run SDXL and Z-turbo pretty smoothly, and on the LLM side Qwen 3.5 9B and most MN 12B models.

I feel like if they made a local model focused specifically on RP it would be god tier even if tiny

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 27 comments

Yu2sama@reddit

I think people haven't reached the ceiling of fine-tuning yet. With so many techniques emerging, there is a lot that can be done to improve base models. The issue is that, well, it is costly. The best solutions are the ones that need more spending. It's also difficult because not many can distinguish a good fine-tune from a bad one. Meanwhile, in image generation, you can distinguish a stellar fine-tune pretty easily, and the bad ones too. We need our own IllustriousXL for LLMs; the best fine-tunes of today are closer to PonyXL in quality.

I feel like if they made a local model focused specifically on RP it would be god tier even if tiny

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 27 comments

Yu2sama@reddit

Finetuning is a tricky subject. Not every finetune is good; I would argue that most are mid at best.

Alibaba confirms they are committed to continuously open-sourcing new Qwen and Wan models

Posted by TKGaming_11@reddit | LocalLLaMA | View on Reddit | 79 comments

Yu2sama@reddit

Small models need more hand-holding. A big model is smart enough (at times) to discern your intentions and do the work. A small model will struggle with that, but with a good prompt it can be very competitive. There are some papers about that: the Phi 2 one, the Orca one and "Does model size matter?". If you are interested you can take a look at them.

I've seen a lot of Opus 4.6 distills, why not 5.4 pro?

Posted by FusionCow@reddit | LocalLLaMA | View on Reddit | 21 comments

Yu2sama@reddit

I think it's mostly due to the dataset for the thinking side being available. Yeah, you could do synthetic data, but why would they? There is data already available for use without the need for those complications. Cutting corners is totally valid in FOSS. Even if GPT is smarter, from what I have seen they are not trying to make the distilled models smarter but to make them think better.

Why 90% of AI chatbots feel like they’re stuck in 2024.

Posted by Legendary_Outrage@reddit | LocalLLaMA | View on Reddit | 21 comments

Yu2sama@reddit

Didn't you just mention the implementation everyone uses already? Like name me 5 services that don't do this lol

Every LLM has a default voice and it's making us all sound the same

Posted by prokajevo@reddit | LocalLLaMA | View on Reddit | 42 comments

Yu2sama@reddit

You really can't expect to change how a model talks with just a few sentences. The way I do it is to build a prompt with the style I desire (syntax, sentence structure, rhythm, language/wording, and what to avoid) plus a small example snippet. As the AI writes with this style in the context, it continues to drink from it. Though some models (like ChatGPT) suck more at this.
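As a rough illustration of that kind of style prompt, here is a minimal Python sketch. The `StyleSpec` fields simply mirror the list above, and both `StyleSpec` and `build_style_prompt` are hypothetical names made up for this example, not part of any library or of my actual prompts.

```python
from dataclasses import dataclass, field


@dataclass
class StyleSpec:
    """Hypothetical container for the style traits listed above."""
    syntax: str
    sentence_structure: str
    rhythm: str
    wording: str
    avoid: list[str] = field(default_factory=list)
    example: str = ""


def build_style_prompt(spec: StyleSpec) -> str:
    """Render the spec as a plain-text system prompt."""
    lines = [
        "Write every reply in the following style.",
        f"Syntax: {spec.syntax}",
        f"Sentence structure: {spec.sentence_structure}",
        f"Rhythm: {spec.rhythm}",
        f"Wording: {spec.wording}",
    ]
    if spec.avoid:
        lines.append("Avoid: " + "; ".join(spec.avoid))
    if spec.example:
        lines.append("Example of the target style:")
        lines.append(spec.example)
    return "\n".join(lines)


# Placeholder values, only to show the shape of the prompt.
spec = StyleSpec(
    syntax="plain, few subordinate clauses",
    sentence_structure="short declarative sentences",
    rhythm="staccato, with an occasional long sentence",
    wording="casual, concrete nouns",
    avoid=["flowery adjectives", "closing summaries"],
    example="Rain. Then nothing. She waited by the door.",
)
prompt = build_style_prompt(spec)
print(prompt)
```

The example snippet goes last on purpose, so the model's own output starts right after a sample of the target style.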

Qwen3.5-9B-Claude-4.6-Opus-Uncensored-Distilled-GGUF

Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 213 comments

Yu2sama@reddit

Would really like to see this one on the UGI leaderboard

What is after Qwen ?

Posted by j_lyf@reddit | LocalLLaMA | View on Reddit | 17 comments

Yu2sama@reddit

Ah sorry, totally missed that

What is after Qwen ?

Posted by j_lyf@reddit | LocalLLaMA | View on Reddit | 17 comments

Yu2sama@reddit

Mistral

What do you end up doing with personal projects that were heavily assisted by an LLM?

Posted by derekp7@reddit | LocalLLaMA | View on Reddit | 11 comments

Yu2sama@reddit

I am 100% sure the other two comments are bots lol

Fine-tuned Qwen3 SLMs (0.6-8B) beat frontier LLMs on narrow tasks

Posted by Jolly-Gazelle-6060@reddit | LocalLLaMA | View on Reddit | 82 comments

Yu2sama@reddit

Roleplay finetunes are the only place where I don't think this is true. (Also, some Heretic finetunes showcase more intelligence, but that is not the norm.)

Qwen dev on Twitter!!

Posted by Difficult-Cap-7527@reddit | LocalLLaMA | View on Reddit | 61 comments

Yu2sama@reddit

I hoped for a small creative writing model, haven't gotten one of those in a while

The walled garden gets higher walls: Anthropic is adding weekly rate limits for paid Claude subscribers

Posted by Resident_Egg5765@reddit | LocalLLaMA | View on Reddit | 48 comments

Yu2sama@reddit

They would have noticed it without people talking; their job is to maintain their model, and I am pretty sure they monitor those spikes in usage.

Any Rpers test the new qwen 2507 yet?

Posted by Antique_Bit_1049@reddit | LocalLLaMA | View on Reddit | 3 comments

Yu2sama@reddit

From my testing, it performs really well. It follows stylistic instructions and matches the style of the character. I was quite surprised by how good it is. For my roleplay session, it was on a par with Deepseek and, at times, even better than it (non-thinking mostly). Still, you will get very mixed results, as everyone looks for different things in their roleplays. You should try it yourself: test it on things that you know a given model excels at and see if it matches them or not.

Qwen’s TRIPLE release this week + Vid Gen model coming

Posted by koc_Z3@reddit | LocalLLaMA | View on Reddit | 35 comments

Yu2sama@reddit

Try reading this line aloud bro: "I once called Alibaba “the first Chinese LLM team to evolve from engineering to product.” This week, I need to upgrade that take: it’s now setting the release tempo and product standards for open-source AI." Just try it.