Drummer's Cydonia 24B v3 - A Mistral 24B 2503 finetune!


gcavalcante8808@reddit

In my experience, 22B/24B models are the ones that have run well on my 7900 XTX card.


RedditSucksMintyBall@reddit

Do you overclock your card for LLM stuff? I recently got the same one.


RottenPingu1@reddit

Curious for any pointers on using this card, as mine shows up this week...


NimbzxAkali@reddit

27B with Q5\_K\_L bartowski quants is the sweet spot for me at \~16k context, with some headroom for more context if needed. 31B should fill that headroom, but might still be reasonable. I just don't like to let too many layers or too much context bleed into my slow DDR4 RAM, I guess. System: 24 GB VRAM + 64 GB DDR4 RAM


Mr_Moonsilver@reddit

For the uninitiated, what is this?


logseventyseven@reddit

their previous models are very popular for RP and writing


Glittering-Bag-4662@reddit

31B is fine for me


whiskers_z@reddit

Any notes on how this differs from v2.1? Granted I'm all the way down at Q2, but while this was still impressive on my initial test, v2.1 was a freaking magic trick.


paranoidray@reddit

I love 24b models, 22b would be even better I think for some room to spare.


Iory1998@reddit

I have an RTX 3090, and in my opinion, I'd rather have a model at Q6 with a large context size than a Q4 with a limited context. Also, I am not sure if upscaling a 24B model would do it any good. If it did, don't you think the labs that created those models would already be doing that?


Phocks7@reddit

In my experience, lower quants of higher-parameter models perform better than higher quants of lower-parameter models, e.g. Q4 123B > Q6 70B.


blahblahsnahdah@reddit

Agreed. It's not a small difference either: even a Q3 of a huge model will blow away a Q8 of a smaller model with an equivalent weights file size when it comes to commonsense reasoning (I make no claims about benchmark scores).
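To make the "equivalent file size" comparison concrete, here is a rough back-of-the-envelope sketch. It assumes weight file size is roughly parameters × bits-per-weight / 8, and the bits-per-weight figures in `APPROX_BPW` are approximate, commonly quoted values used here as assumptions; real GGUF files vary with architecture and tensor mix. The function names are made up for illustration.

```python
# Approximate bits-per-weight for a few GGUF quant types.
# Assumption: rough, commonly quoted figures, not exact values for any given file.
APPROX_BPW = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}


def approx_size_gb(params_billion: float, quant: str) -> float:
    """Estimate weight file size in GB (10^9 bytes) as params * bpw / 8."""
    return params_billion * APPROX_BPW[quant] / 8


def equal_size_params(params_billion: float, quant_from: str, quant_to: str) -> float:
    """Parameter count (in billions) that quant_to fits into the same file size."""
    return params_billion * APPROX_BPW[quant_from] / APPROX_BPW[quant_to]


if __name__ == "__main__":
    # The two configurations compared earlier in the thread.
    for params, quant in [(123, "Q4_K_M"), (70, "Q6_K")]:
        print(f"{params}B @ {quant}: ~{approx_size_gb(params, quant):.1f} GB")

    # How large a Q3 model matches the file size of a 24B model at Q8?
    bigger = equal_size_params(24, "Q8_0", "Q3_K_M")
    print(f"24B @ Q8_0 is about the same size as {bigger:.0f}B @ Q3_K_M")
```

Under these assumed figures, a Q8 file buys you roughly half the parameters of a Q3 file of the same size, which is the trade-off being argued about here. It says nothing about output quality, only about what fits.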


AppearanceHeavy6724@reddit

Not sure about that. Qwen 2.5 instruct 32b iq3xs completely fell apart in fiction compared to 14b q4km. The latter sucked too, as Qwen 2.5 is unusable for creative writing anyway.


blahblahsnahdah@reddit

32B isn't huge! We're talking about 100B plus. Yeah, small models have unusable brain damage at low quants.


TheRealMasonMac@reddit

[https://github.com/QwenLM/ParScale](https://github.com/QwenLM/ParScale) is probably more interesting


SomeoneSimple@reddit

>Also, I am not sure if upscaling a 24B model would do it any good. If it did, don't you think the labs that created those models would already be doing that?

My thoughts as well. I mean, the only guys that are making bank off LLMs are doing [the exact opposite](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1).


_Cromwell_@reddit

In GGUFs, what are the ones that are _NL for? Or what do they do differently than the normal imatrix ones?


toomuchtatose@reddit

For ARM devices, the inference speeds could be 1.5x to 8x faster.


SkyFeistyLlama8@reddit

Use the IQ4_NL or Q4_0 GGUF files if you're running on ARM CPUs like Snapdragon X or Ampere. I prefer Q4_0 for Snapdragon X because the Adreno OpenCL backend also supports this format, so you can get fast inference on both CPU and GPU backends.


_Cromwell_@reddit

Ahhh... okay. So it's for ARM. thanks


Quazar386@reddit

The main thing about the IQ4\_NL quant, from what I understand, is that it uses a non-linear quantization technique with a non-uniform codebook designed to better match LLM weight distributions. In practice, though, most people use IQ4\_XS, as it has very similar (within margin of error) KL divergence to IQ4\_NL with better space savings, or Q4\_K\_S for overall faster speeds. So IQ4\_NL does not really have much of a place in practical use, as other quants either have better space savings or faster speeds with similar KL divergence.
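For anyone wondering what "non-uniform codebook" means in practice, here is a minimal sketch of the general idea, not llama.cpp's actual implementation. Each weight in a block is scaled and snapped to the nearest of 16 levels that are packed more densely near zero, where most weights sit. The `CODEBOOK` values are illustrative and not guaranteed to match the table llama.cpp uses; the single absmax scale per block and the function names are also simplifying assumptions.

```python
import random

# Illustrative 16-entry non-uniform codebook: levels are denser near zero.
# Assumption: for demonstration only, not guaranteed to match llama.cpp's table.
CODEBOOK = [-127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113]

# A uniform 16-level grid over the same range, for comparison.
UNIFORM = [-127 + i * 16 for i in range(16)]


def quantize_block(weights, levels):
    """Return (4-bit indices, scale) using one absmax scale for the block."""
    amax = max(abs(w) for w in weights)
    if amax == 0:
        # All-zero block: a zero scale reconstructs every weight as zero.
        return [0] * len(weights), 0.0
    scale = amax / max(abs(l) for l in levels)
    idx = [
        min(range(len(levels)), key=lambda i: abs(w / scale - levels[i]))
        for w in weights
    ]
    return idx, scale


def dequantize_block(idx, scale, levels):
    """Look each index up in the codebook and undo the scaling."""
    return [levels[i] * scale for i in idx]


def rms_error(weights, levels):
    """Root-mean-square reconstruction error for one block."""
    idx, scale = quantize_block(weights, levels)
    recon = dequantize_block(idx, scale, levels)
    return (sum((w - r) ** 2 for w, r in zip(weights, recon)) / len(weights)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    block = [rng.gauss(0.0, 1.0) for _ in range(32)]  # synthetic bell-shaped weights
    print("non-uniform RMS error:", rms_error(block, CODEBOOK))
    print("uniform RMS error:    ", rms_error(block, UNIFORM))
```

Both grids cost the same 4 bits per weight for the indices; the only difference is where the 16 levels are placed. Running it on your own weight samples is a cheap way to see whether the non-uniform placement helps.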


_Cromwell_@reddit

Thanks. Almost seems like there's too many options because people can't decide what's best. :) Or there's still debate on what's best. So people who prep these things just prep everything for everybody I guess, to avoid complaints they left something out.


SkyFeistyLlama8@reddit

I just wanna know how this would compare to Valkyrie Nemotron 49B. That's a sweet model but it's huge.


-Ellary-@reddit

Well, just download it, run it, test it, sniff it, rub it. What's the point of listening to random people? What if I say that it is better than Valkyrie? On my own specific nya cat girl test?


Abandoned_Brain@reddit

The problem some people have is that their ISP (at least, in the US) will have bandwidth caps of some type in place. Grabbing an 18GB model sight unseen (and that's a problem with Huggingface: fewer than about 1/4 of the models have cards which detail what the models are actually recommended for) can kill most hotspots' bandwidth for the month. I agree somewhat with you. It's a great time to be an AI hobbyist because you can download a different AI "brain" full of knowledge and personality every 5 minutes if you want to, but doing that causes other issues downstream for people. I had to block my model folder in my backup apps because they were constantly copying these new models to the cloud. My storage started costing me a lot more than in previous months, which took a bit for me to figure out. :) BTW, where's your nya cat girl test? I'd be interested in testing it myself... :D


IrisColt@reddit

Heh!


MidAirRunner@reddit

Have you used it? How good is it?


RickyRickC137@reddit

What are the recommended temperature and other parameters?


Echo9Zulu-@reddit

Thanks for your work! So if we throw away questions about inference capability and just look at what benefits higher parameter counts provide, what do you think we gain from having more in this case?


LagOps91@reddit

31b sounds good for 24gb assuming context isn't too heavy. I would want to run either 16k or preferably 32k context without quanting context (for some reason quanting context is really slow for me).
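A quick way to sanity-check how "heavy" the context is: the sketch below estimates KV cache size as 2 (K and V) × layers × KV heads × head dimension × context length × bytes per element. The architecture numbers in `ASSUMED_24B` are assumptions for a Mistral-Small-24B-style model, not values confirmed in this thread, and a 31B upscale would have a different layer count, so treat them as parameters to swap out.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """KV cache size: K and V, per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def to_gib(n_bytes):
    """Convert bytes to GiB (2^30 bytes)."""
    return n_bytes / 2**30


# Assumption: architecture values for a Mistral-Small-24B-style model.
# Check the model's own config before relying on these.
ASSUMED_24B = dict(n_layers=40, n_kv_heads=8, head_dim=128)

# Bytes per cached element: f16 is 2 bytes; a q8_0 block stores 32 int8
# values plus one 2-byte scale, so 34 / 32 bytes per element.
F16 = 2.0
Q8_0 = 34 / 32

if __name__ == "__main__":
    for n_ctx in (16384, 32768):
        f16 = to_gib(kv_cache_bytes(n_ctx=n_ctx, bytes_per_elem=F16, **ASSUMED_24B))
        q8 = to_gib(kv_cache_bytes(n_ctx=n_ctx, bytes_per_elem=Q8_0, **ASSUMED_24B))
        print(f"ctx {n_ctx}: f16 ~{f16:.2f} GiB, q8_0 ~{q8:.2f} GiB")
```

Add the result to the weight file size plus some overhead for compute buffers to see whether a given quant and context length plausibly fit in 24 GB without quantizing the cache.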

