Why no one here talking about zamba2-7b?
Posted by TMTornado@reddit | LocalLLaMA | View on Reddit | 42 comments
Apparently it beats Mistral, llama 8b and Gemma.
Maykey@reddit
Their demo space with the instruction model was not impressive. My first go-to prompt, about Cirno starting to work as an overconfident maid with no self-awareness, was broken: in the generated fic, Yuuka lived in the SDM, which she doesn't, and Cirno wasn't overconfident as the prompt told her to be.
My second go-to prompt was a ZORK-like text adventure game. The model chose a CYOA style and listed possible options.
Maybe once long context kicks in, it will be good. But first impressions, based on their demo space, are not good.
But the even bigger hurdle is the setup. I'm definitely not going, for some time yet, to make a new venv with yet another copy of torch and spend an hour compiling another copy of mamba2, only to clone their fork of transformers because they can't be bothered to ship modelling_zamba2.py alongside the model and use trust_remote_code. Maybe later. But it requires cherry-picking code from their fork until the model works.
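For reference, a minimal sketch of the convention the complaint is about: shipping the modelling file with the checkpoint so a plain trust_remote_code load works. The Zyphra/Zamba2-7B repo ID is real; the one-liner working this way is the assumption, not something the repo supported at the time of the thread.

```python
# Hypothetical: how loading would look if modelling_zamba2.py shipped with
# the checkpoint, instead of requiring Zyphra's transformers fork.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B"  # real repo; remote-code loading is the assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```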
MerePotato@reddit
Pop culture knowledge ain't a good test case for a 7B model.
Maykey@reddit
It's the perfect test case, as it's exactly the kind of content I use other 7B models for.
GirthusThiccus@reddit
The tradeoff smaller models make for their performance and coherency is that they can't store and represent as much world knowledge as larger models can; those 7-11B parameters are needed just to make them clever.
If you really need the model to know random trivia, and both it and your PC support sufficient context, building good world info is the way to go.
Putting a small high-performance model down for dedicating its mind space to genuine abilities rather than random knowledge is silly, SillyTavern world info aside.
Maykey@reddit
I don't compare it to larger models. I compare it to models of about the same size, quantized on top of that.
And no, I haven't needed world info for more than a year now. There's a reason I used such a short prompt: I expect it to work like it does with other models. Other 7-11B models.
MerePotato@reddit
Don't you encounter a tonne of hallucination using a model in that size class for pop culture knowledge?
Maykey@reddit
No, or I wouldn't use them for that
Specialist-Scene9391@reddit
Because it's really not that good! You need to beat Claude 3.5 and GPT-4o to be recognized!
Independent_Try_6891@reddit
No, it's because: wen GGUF?
Difficult_Face5166@reddit
Did not have time to check fully yet
Saym@reddit
Why is no one talking about mini lasagnas in a loaf pan?
Life-Baker7318@reddit
I'm dropping a video later, be sure to check out my course.
some1else42@reddit
And do you have a newsletter? I would like to subscribe.
whomthefuckisthat@reddit
I’m listening
Disastrous-Peak7040@reddit
C'mon, I'm getting tired of this LLM stuff, so don't tease us. Do they taste nice, and will they cook on top of 3000 watts of GPU?
Decaf_GT@reddit
"apparently it beats x, y, and z" is a thing that every single new model claims, or a thing that people claim about a new model.
On top of that, a lack of GGUF support means it's out of reach for a lot of us.
The last time I looked for an easy way to use safetensors files as the backend for Open WebUI, I found it too difficult and gave up.
Future_Might_8194@reddit
Bc it blows balloons. I won't reiterate everyone else's reasons, but the comments are spot on with it.
ArsNeph@reddit
Intriguing. One of the main issues with Mamba is that it didn't quantize well. Is this Zamba model quantizable? If it cannot be brought down to at least 8 bit, that may greatly hinder adoption.
Maykey@reddit
Some quantization should be possible, but it'll require tweaking: disable the memory-efficient path, then quantize (HQQ) mamba's in_proj and out_proj, which look to hold the majority of the weights. llama.cpp doesn't support mamba2 yet.
Also, there are a lot of mamba layers (68 of them), so better hope the non-efficient path isn't that much worse.
As is, the memory footprint hurts: it's barely usable even on 16GB.
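A minimal sketch of the tweak described above, assuming the `hqq` package's BaseQuantizeConfig/HQQLinear API. The in_proj/out_proj module names come from this comment and may not match Zamba2's actual code.

```python
# Hedged sketch: HQQ-quantize only the mamba in_proj/out_proj linears,
# which reportedly hold most of the weights. Not Zyphra's code.
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

def quantize_mamba_projections(model: nn.Module, nbits: int = 8) -> nn.Module:
    cfg = BaseQuantizeConfig(nbits=nbits, group_size=64)
    # Collect targets first so we don't mutate the tree while iterating it.
    targets = [
        (parent, name)
        for parent in model.modules()
        for name in ("in_proj", "out_proj")
        if isinstance(getattr(parent, name, None), nn.Linear)
    ]
    for parent, name in targets:
        # Swap the large projection for an HQQ-quantized equivalent.
        setattr(parent, name,
                HQQLinear(getattr(parent, name), cfg,
                          compute_dtype=torch.float16))
    return model
```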
ArsNeph@reddit
Huh, interesting. Unfortunately, if it can't fit in 8GB post-quantization, most people will just ignore this model, since anyone with 24GB is running Mistral Small 22B or Qwen 2.5 32B. I wonder if it has any other unique advantages, like better scaling at long context.
robberviet@reddit
GGUF yet? If not, then what is there to discuss?
teachersecret@reddit
In reality, it doesn't write well. I tested it out, and I wouldn't use it for my own writing-related work.
Not sure what it is good at, but I didn't like the "feel" of this model when I played with it. Maybe with a good tune?
XMasterDE@reddit
The model was only pre-trained on 2T tokens. I'm not saying it's a bad model, but I really don't think that, in reality, it's on par with Llama 3 8B, Gemma 9B, or the latest Mistral 7B.
Because of that, I don't think the benchmarks they've published accurately describe the model's real-world performance.
Co0lboii@reddit
Ah my friend shared this too. Was waiting for the discussion on it to appear
kryptkpr@reddit
It's a base model, not an instruction tune, and it's a mamba2-inspired architecture, so there are no quants and no support in any of the major inference backends.
Academic stuff.
ninjasaid13@reddit
so does https://hychiang.info/projects/quamba/ work on mamba2?
nabokovian@reddit
What are some examples of inference backends? (Google not super helpful)
ekaj@reddit
llama.cpp, vLLM, ExLlama
jd_3d@reddit
Instruct model: https://huggingface.co/Zyphra/Zamba2-7B-Instruct
wahnsinnwanscene@reddit
Other than the mamba, what additional tweaks were there?
Maykey@reddit
A shared MLP, with an additional LoRA for each layer.
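Roughly, the idea looks like this: a toy sketch, not Zyphra's code, with illustrative dimensions and names.

```python
# Toy sketch of a shared MLP with a per-layer LoRA: every layer reuses the
# same big projection, while a small low-rank delta gives each layer its
# own cheap specialization. Illustrative only; not Zyphra's implementation.
import torch.nn as nn

class SharedMLPWithLoRA(nn.Module):
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # the same nn.Linear object in every layer
        self.lora_a = nn.Linear(shared.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, shared.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # LoRA starts as a no-op

    def forward(self, x):
        return self.shared(x) + self.lora_b(self.lora_a(x))

# One set of shared weights, four layers with independent LoRA deltas.
shared = nn.Linear(1024, 1024)
layers = nn.ModuleList(SharedMLPWithLoRA(shared) for _ in range(4))
```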
AlgorithmicKing@reddit
Because it doesn't have vision? And, more probably, because it was released today and no one knows about it yet.
Scott_Tx@reddit
got a link for the ggufs? :P
bearbarebere@reddit
A simple Hugging Face search for "zamba gguf" turns up nothing as of now.
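For anyone who wants to re-check, the same search can be run with the huggingface_hub client; the empty result reflects the state at the time of the comment.

```python
# Programmatic version of the Hub search described above.
from huggingface_hub import HfApi

hits = HfApi().list_models(search="zamba gguf")
print([m.id for m in hits])  # [] as of this comment
```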
Scott_Tx@reddit
Thus answering the question of why no one is talking about it.
sunshinecheung@reddit
Can it beat Qwen 2.5 7B?
Ylsid@reddit
Are you operating on dog time? Does it feel like a whole week of zero discussion has happened for you?
xXWarMachineRoXx@reddit
Lol
Thistleknot@reddit
I tried to set up Zamba2 a while ago and found out rather rudely that it doesn't support compute capability 6.0.
MMAgeezer@reddit
Both Zyphra and this model look rather great. I look forward to trying this out!
ThinkExtension2328@reddit
Can it be run on Ollama? If not, there is your answer.
Feztopia@reddit
Probably because we are busy reading about it