Why no one here talking about zamba2-7b?
Posted by TMTornado@reddit | LocalLLaMA | View on Reddit | 42 comments
Apparently it beats Mistral, llama 8b and Gemma.
Maykey@reddit
Their demo space with the instruction model was not impressive. My first go-to prompt, about Cirno starting to work as an overconfident maid with no self-awareness, was broken: in the generated fic, Yuuka lived in the SDM, which she doesn't, and Cirno wasn't overconfident as the prompt told her to be.
My second go-to prompt was a ZORK-like text adventure game. The model chose a CYOA style and listed possible options.
Maybe once long context kicks in, it will be good. But first impressions, based on their demo space, are not good.
But the even bigger hurdle is the setup. I'm definitely not going, for some time yet, to make a new venv with yet another copy of torch and spend an hour compiling another copy of mamba2, only to clone their fork of transformers because they can't be bothered to ship modelling_zamba2.py alongside the model and use trust_remote_code. Maybe later. But it requires cherry-picking code from their fork until the model works.
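For reference, a minimal sketch of the convention the complaint is about: shipping the modelling file with the checkpoint so a plain trust_remote_code load works. The Zyphra/Zamba2-7B repo ID is real; the one-liner working this way is the assumption, not something the repo supported at the time of the thread.

```python
# Hypothetical: how loading would look if modelling_zamba2.py shipped with
# the checkpoint, instead of requiring Zyphra's transformers fork.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B"  # real repo; remote-code loading is the assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```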
MerePotato@reddit
Pop culture knowledge ain't a good test case for a 7B model.
Maykey@reddit
It's the perfect test case, as it's exactly the kind of content I use other 7B models for.
GirthusThiccus@reddit
The tradeoff smaller models make for their performance and coherency is that they can't store and represent as much world knowledge as larger models can; those 7-11B parameters are needed just to make them clever.
If you really need the model to know random trivia, and both it and your PC support sufficient context, building good world info is the way to go.
Putting a small high-performance model down for dedicating its mind space to genuine abilities rather than random knowledge is silly, SillyTavern world info aside.
Maykey@reddit
I don't compare it to larger models. I compare it to models of about the same size, quantized on top of that.
And no, I haven't needed world info for more than a year now. There's a reason I used such a short prompt: I expect it to work like it does with other models. Other 7-11B models.
MerePotato@reddit
Don't you encounter a tonne of hallucination using a model in that size class for pop culture knowledge?
Maykey@reddit
No, or I wouldn't use them for that
Specialist-Scene9391@reddit
Because it's really not that good! You need to beat Claude 3.5 and GPT-4o to be recognized!
Independent_Try_6891@reddit
No, it's because: wen GGUF?
Difficult_Face5166@reddit
Did not have time to check fully yet
Saym@reddit
Why is no one talking about mini lasagnas in a loaf pan?
Life-Baker7318@reddit
I'm dropping a video later, be sure to check out my course.
some1else42@reddit
And do you have a newsletter? I would like to subscribe.
whomthefuckisthat@reddit
I’m listening
Disastrous-Peak7040@reddit
C'mon, I'm getting tired of this LLM stuff, so don't tease us. Do they taste nice, and will they cook on top of 3000 watts of GPU?
Decaf_GT@reddit
"apparently it beats x, y, and z" is a thing that every single new model claims, or a thing that people claim about a new model.
On top of that, a lack of GGUF support means it's out of reach for a lot of us.
The last time I looked for an easy way to use safetensors files as the backend for Open WebUI, I found it too difficult and gave up.
Future_Might_8194@reddit
Bc it blows balloons. I won't reiterate everyone else's reasons, but the comments are spot on with it.
ArsNeph@reddit
Intriguing. One of the main issues with Mamba is that it didn't quantize well. Is this Zamba model quantizable? If it cannot be brought down to at least 8 bit, that may greatly hinder adoption.
Maykey@reddit
Some quantization should be possible, but it'll require tweaking: disable the memory-efficient path, then quantize (HQQ) mamba's in_proj and out_proj, which look to hold the majority of the weights. llama.cpp doesn't support mamba2 yet.
Also, there are a lot of mamba layers (68 of them), so better hope the non-efficient path isn't that much worse.
As is, the memory footprint hurts: it's barely usable even on 16GB.
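A minimal sketch of the tweak described above, assuming the `hqq` package's BaseQuantizeConfig/HQQLinear API. The in_proj/out_proj module names come from this comment and may not match Zamba2's actual code.

```python
# Hedged sketch: HQQ-quantize only the mamba in_proj/out_proj linears,
# which reportedly hold most of the weights. Not Zyphra's code.
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

def quantize_mamba_projections(model: nn.Module, nbits: int = 8) -> nn.Module:
    cfg = BaseQuantizeConfig(nbits=nbits, group_size=64)
    # Collect targets first so we don't mutate the tree while iterating it.
    targets = [
        (parent, name)
        for parent in model.modules()
        for name in ("in_proj", "out_proj")
        if isinstance(getattr(parent, name, None), nn.Linear)
    ]
    for parent, name in targets:
        # Swap the large projection for an HQQ-quantized equivalent.
        setattr(parent, name,
                HQQLinear(getattr(parent, name), cfg,
                          compute_dtype=torch.float16))
    return model
```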
ArsNeph@reddit
Huh, interesting. Unfortunately, if it can't fit in 8GB post-quantization, most people will just ignore this model, since anyone with 24GB is running Mistral Small 22B or Qwen 2.5 32B. I wonder if it has any other unique advantages, like better scaling at long context.
robberviet@reddit
GGUF yet? If not, then what is there to discuss?
teachersecret@reddit
In reality, it doesn't write well. I tested it out, and I wouldn't use it for my own writing-related work.
Not sure what it is good at, but I didn't like the "feel" of this model when I played with it. Maybe with a good tune?
XMasterDE@reddit
The model was only pre-trained on 2T tokens. I'm not saying it's a bad model, but I really don't think that, in reality, it's on par with Llama 3 8B, Gemma 9B, or the latest Mistral 7B.
Because of that, I don't think the benchmarks they've published accurately describe the model's real-world performance.
Co0lboii@reddit
Ah my friend shared this too. Was waiting for the discussion on it to appear
kryptkpr@reddit
It's a base model, not an instruction tune, and it's a mamba2-inspired architecture, so there are no quants and no support in any of the major inference backends.
Academic stuff.
ninjasaid13@reddit
so does https://hychiang.info/projects/quamba/ work on mamba2?
nabokovian@reddit
What are some examples of inference backends? (Google not super helpful)
ekaj@reddit
llama.cpp, vLLM, ExLlama
jd_3d@reddit
Instruct model: https://huggingface.co/Zyphra/Zamba2-7B-Instruct
wahnsinnwanscene@reddit
Other than the mamba, what additional tweaks were there?
Maykey@reddit
A shared MLP, with an additional LoRA for each layer.
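Roughly, the idea looks like this: a toy sketch, not Zyphra's code, with illustrative dimensions and names.

```python
# Toy sketch of a shared MLP with a per-layer LoRA: every layer reuses the
# same big projection, while a small low-rank delta gives each layer its
# own cheap specialization. Illustrative only; not Zyphra's implementation.
import torch.nn as nn

class SharedMLPWithLoRA(nn.Module):
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # the same nn.Linear object in every layer
        self.lora_a = nn.Linear(shared.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, shared.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # LoRA starts as a no-op

    def forward(self, x):
        return self.shared(x) + self.lora_b(self.lora_a(x))

# One set of shared weights, four layers with independent LoRA deltas.
shared = nn.Linear(1024, 1024)
layers = nn.ModuleList(SharedMLPWithLoRA(shared) for _ in range(4))
```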
AlgorithmicKing@reddit
Because it doesn't have vision? And, more probably, because it was released today and no one knows about it yet.
Scott_Tx@reddit
got a link for the ggufs? :P
bearbarebere@reddit
A simple Hugging Face search for "zamba gguf" turns up nothing as of now.
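For anyone who wants to re-check, the same search can be run with the huggingface_hub client; the empty result reflects the state at the time of the comment.

```python
# Programmatic version of the Hub search described above.
from huggingface_hub import HfApi

hits = HfApi().list_models(search="zamba gguf")
print([m.id for m in hits])  # [] as of this comment
```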
Scott_Tx@reddit
Thus answering the question of why no one is talking about it.
sunshinecheung@reddit
Can it beat Qwen 2.5 7B?
Ylsid@reddit
Are you operating on dog time? Does it feel like a whole week of zero discussion has happened for you?
xXWarMachineRoXx@reddit
Lol
Thistleknot@reddit
I tried to set up Zamba2 a while ago and found out rather rudely that it doesn't support compute capability 6.0.
MMAgeezer@reddit
Both Zyphra and this model look rather great. I look forward to trying this out!
ThinkExtension2328@reddit
Can it be run on Ollama? If not, there is your answer.
Feztopia@reddit
Probably because we are busy reading about it