Here's a [janky frankenMoE with 27.9b params.](https://huggingface.co/mradermacher/Knight-Miqu-27B-MoE-GGUF/tree/main) Should give you a good idea of file size at various bpw.
32GB or 24GB would very likely be plenty (assuming the model quantizes well to Q8/Q7/Q6).
20GB should be similarly usable with somewhat heavier quantization.
16GB plus maybe some RAM offload should be possible and still reasonably fast and useful, assuming the particular model holds up at around Q4.
They've been making slow but steady progress, though. Bard was clown shoes at launch, and never really got *good*, but it did steadily improve to the point where it wasn't a complete embarrassment. Gemini was a big step up from that, and Gemini Pro right now trades blows with GPT-4 in certain tasks.
You've got to remember, there's a bit of a desert for mid-size hobbyist models. There's Command-R 35b, but it lacks GQA so it's basically unusable if you want more than maybe 2-4k context. There are a few Chinese models like Yi, InternLM, and Qwen, but they're very... Chinese. Weird and janky implementation and trained so heavily on Chinese data that their English performance suffers quite a bit. Yi has proved more or less impossible to tame, even with extensive finetuning. Nobody wants to risk throwing more effort into Chinese models when the return is likely to be the same.
Currently, there's *nothing new or interesting* between ~7b models at the low end and 70b models at the high end except Mixtral, which still won't fit into 24GB of VRAM without resorting to a small, crappy quant. So when a big western company that has been doing some decent (if not spectacular) LLM work lately says they're about to publish weights for a mid-sized model that will fit into 24GB easily or even 16GB at a squeeze, people are right to get excited about it. Maybe it falls flat on its face, but realistically, what else does anybody have to look forward to in this category?
https://preview.redd.it/ph82x81aig0d1.png?width=1999&format=png&auto=webp&s=70e049e8c0b745b9e56d32167fa0d724ea2c752c
Not bad at all. Still pretraining according to the article.
[https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/](https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/)
If this benchmark translates into actual performance, that would be a great model for local use. Q6 at ≈ 20GB and Q4 at ≈ 13.5GB seem like a sweet spot for 24GB and 16GB GPUs respectively.
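For anyone who wants to redo that napkin math, here's a rough sketch. It assumes about 27B parameters (which is what the ≈ 20GB and ≈ 13.5GB figures imply) and nominal bits per weight; `quantized_size_gb` and `N_PARAMS` are just names made up for this example. Real quant files usually come out somewhat larger than the nominal figure, and you still need headroom for context.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1e9 bytes) at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~27B parameters, inferred from the sizes quoted in the comment.
N_PARAMS = 27e9

# Nominal bits per weight; actual quant formats carry some extra overhead.
for name, bpw in [("Q8", 8.0), ("Q6", 6.0), ("Q4", 4.0)]:
    print(f"{name}: ~{quantized_size_gb(N_PARAMS, bpw):.2f} GB")
```

At nominal bpw this gives roughly 27 GB, 20.25 GB, and 13.5 GB, which lines up with the numbers above.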
That's going to depend a lot on context efficiency. You *wouldn't think* any competent company would release a model without GQA in this day and age, but Command-R 35b shows that there's still plenty of bad decision-making going around.
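To put a rough number on why GQA matters for context, here's a sketch of the standard KV cache estimate: keys plus values, per layer, per KV head, per token. The layer count, head counts, and head size below are hypothetical round numbers picked for illustration, not the real config of Command-R or any other model, and `kv_cache_gib` is a made-up helper name.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence (fp16 by default)."""
    # Factor of 2 covers both the key and the value tensors.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30


# Hypothetical model shape: 40 layers, 64 attention heads, head_dim 128.
CONTEXT = 8192

# Without GQA, every attention head keeps its own K/V (64 KV heads).
mha = kv_cache_gib(n_layers=40, n_kv_heads=64, head_dim=128, context_len=CONTEXT)

# With GQA, groups of query heads share K/V (here 8 KV heads).
gqa = kv_cache_gib(n_layers=40, n_kv_heads=8, head_dim=128, context_len=CONTEXT)

print(f"no GQA: {mha:.2f} GiB, GQA: {gqa:.2f} GiB at {CONTEXT} tokens")
```

With these made-up dimensions the cache shrinks by the same factor as the KV head count (8x here), which is the difference between context fitting next to the weights on a 24GB card or not.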
Competition’s getting really stiff lately. I remember when Falcon 180b came out and everybody was impressed, it was the only thing ppl talked about, even though like 4 people could run it and it sucked. Now half the new releases don’t even get llama.cpp support (the mark of approval basically imo) because they just don’t stack up. It’s great.
There’s little space for Mistral to one-up Llama 3, considering that WizardLM 2 is still being detoxed. Basically OpenAI has to remain the MS flagship; open source was fun when the difference was great, or is fun with small models. But I don’t think we will see large models coming from MS until the gap is big enough to place some other open source models in there.
It does not inspire confidence for such a small model to be announced but not released.
Think back, how many of the open weight models got announced but not released immediately?
It's still in pretraining, like Llama 3's 400b model. The benchmarks are from the latest checkpoint. They announced it today because today was the I/O conference.
https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/
>Gemma 2 is still pretraining. This chart shows performance from the latest Gemma 2 checkpoint along with benchmark pretraining metrics.
CatalyticDragon@reddit
candre23@reddit
Calcidiol@reddit
rawednylme@reddit
candre23@reddit
rerri@reddit
CesarBR_@reddit
candre23@reddit
IndicationUnfair7961@reddit
Eralyon@reddit
lemon07r@reddit
AlgorithmicKing@reddit
Master-Meal-77@reddit
IndicationUnfair7961@reddit
sky-syrup@reddit
MoffKalast@reddit
Able-Locksmith-1979@reddit
Key_Run8379@reddit
Spindelhalla_xb@reddit
pseudonerv@reddit
AnticitizenPrime@reddit
toothpastespiders@reddit
OkQuietGuys@reddit