Drummer's Skyfall 31B v4.2 aka SKYFALL-31B-V4.2-UNCENSORED-OPUS-4.6-ROLEPLAYING-100000X-XTREME-VALUE
Posted by TheLocalDrummer@reddit | LocalLLaMA | 28 comments
Yes, Google stole my proprietary model size (31B). Yes, I plan to tune all the Gemma 4 models. Join us, and support the mission! Thank you all for the love <3
Specter_Origin@reddit
can someone make a finetune of 26b-b4a which is better at function calling for opencode and cline? it seems to fall flat over time on complex write calls xD
Specter_Origin@reddit
Btw, it was a parser issue in llama.cpp which has been fixed in a new release, so if you're experiencing it, please update your llama.cpp.
LoveMind_AI@reddit
Joining the choir of people who very much want a Drummer Gemma 4!
Nrgte@reddit
Okay what's the difference compared to 4.1?
Internet-Buddha@reddit
How does this compare to Magidonia, which is one of my favorite models?
rc_ym@reddit
It's worth a try. I like it. It seems to have richer language, but with a very, very slight increase in non-sequiturs and impossiblisms. Very similar performance even with the larger size.
Sirosky@reddit
It's an upscale of Mistral Small, so it'll be better just by virtue of being larger. But in general, this model is exceptional, even by upscale standards.
AnonLlamaThrowaway@reddit
Can you explain what an "upscale" is vs. a regular finetune?
ttkciar@reddit
It works by using Goddard's mergekit (or equivalent tooling) to make something called a "passthrough self-merge": a new model is assembled from the first two-thirds of the original's layers and the last two-thirds of its layers, appended to each other (or thereabouts; it usually takes some trial and error to find the right cut-offs).
This results in a model about 30% larger, because the middle third of its layers appears twice. That has two effects (see the config sketch below):
- Heuristics (generalized knowledge) encoded in those middle layers get applied twice, so they show up more strongly in the inference result.
- The duplication adds redundancy to the model's parameters, so that further training is less likely to obliterate something important (what the field calls "catastrophic forgetting"). The optimizer (AdamW or whatever) can repurpose some of the duplicated parameters to encode new heuristics without losing the old ones.
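For the curious, here's roughly what a passthrough self-merge config looks like in mergekit. The base model and layer cut-offs below are illustrative (a hypothetical 40-layer base), not Drummer's actual recipe:

```yaml
# Sketch of a passthrough self-merge (upscale) config for mergekit.
# Base model and cut-offs are illustrative; finding good cut-offs
# usually takes trial and error.
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: mistralai/Mistral-Small-24B-Instruct-2501
        layer_range: [0, 27]    # first ~two-thirds of the 40-layer stack
  - sources:
      - model: mistralai/Mistral-Small-24B-Instruct-2501
        layer_range: [13, 40]   # last ~two-thirds; layers 13-26 get duplicated
```

Run it with `mergekit-yaml config.yaml ./upscaled-model` and you get a 54-layer model out of a 40-layer base, about a third more transformer layers, which is where the "about 30% larger" figure comes from.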
The theory of why this works is still very much under development, but David Ng has been developing what he calls RYS theory, which describes part of it. Look him up if you want to learn more about it.
Chief_Broseph@reddit
Something similar to the RYS method? Been waiting for a good RP finetune of that one.
ttkciar@reddit
Yes, you will notice I mention RYS theory in the last paragraph.
Sirosky@reddit
My layman's understanding is that additional layers were added on top of the model before tuning, resulting in a fatter but (hopefully) superior finetune. All the Skyfall models back to v1 are upscales of Mistral Small and its derivatives.
Folks on the Discord server did a blind test of Skyfall v2 vs. the same-generation Cydonia, and the preference was overwhelmingly for Skyfall, so it seems like upscaling does work, even if it comes at a cost in VRAM requirements and speed.
freia_pr_fr@reddit
Should we start an r/locallamacirclejerk?
TheLocalDrummer@reddit (OP)
It exists. You just need to add one more L
r/localllamacirclejerk
evenyourcopdad@reddit
be the change you want to see in the world
unless you're not going to do it right. then just leave it for someone that will.
DragonfruitIll660@reddit
Crazy name lmao, ty for all the finetunes.
Altruistic_Heat_9531@reddit
not even a week https://www.reddit.com/r/LocalLLaMA/comments/1salgre/comment/odwsxjg/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
mycall@reddit
Please fix your link
-dysangel-@reddit
I'm waiting for v2
9r4n4y@reddit
yeah really crazy name
fractalcrust@reddit
damn i'm still on SKYFALL-31B-V4.2-UNCENSORED-OPUS-4.6-ROLEPLAYING-10000 i really need to upgrade
Hoppss@reddit
It's a free model for ya Jim!
seamonn@reddit
Time for Big Tiger Gemma 4 :D
MSXzigerzh0@reddit
You should sue
Sirosky@reddit
As the name suggests, this model is peak (I'd been testing it for about a month before the official release).
ttkciar@reddit
That's great to hear! :-) Will there be a Big Tiger anti-sycophancy finetune? Big-Tiger-Gemma-27B-v3 has been a serious workhorse!
Thanks for all you do! Waiting on the edge of my seat
jacek2023@reddit
Waiting for Drumminggemmas
LegacyRemaster@reddit
The name I needed