Google Gemma 2 27B will be released in June

CatalyticDragon@reddit

27B, what sort of VRAM usage are we looking at here?

candre23@reddit

Here's a [janky frankenMoE with 27.9b params.](https://huggingface.co/mradermacher/Knight-Miqu-27B-MoE-GGUF/tree/main) Should give you a good idea of file size at various bpw.

Calcidiol@reddit

32GB or 24GB would very likely be very nice (assuming the model quantizes well to Q8/Q7/Q6). 20GB should similarly be usefully good with somewhat heavier quantization. 16GB plus maybe some RAM offload should be possible and still reasonably fast and useful, assuming this particular model holds up at a ~Q4 quantization.

rawednylme@reddit

Is anyone really interested in anything from Google? They haven't exactly been hitting it out of the park lately.

candre23@reddit

They've been making slow but steady progress, though. Bard was clown shoes at launch, and never really got *good*, but it did steadily improve to the point where it wasn't a complete embarrassment. Gemini was a big step up from that, and Gemini Pro right now trades blows with GPT-4 in certain tasks.

You've got to remember, there's a bit of a desert for mid-size hobbyist models. There's Command-R 35b, but it lacks GQA so it's basically unusable if you want more than maybe 2-4k context. There are a few Chinese models like Yi, InternLM, and Qwen, but they're very... Chinese. Weird and janky implementations, and trained so heavily on Chinese data that their English performance suffers quite a bit. Yi has proved more or less impossible to tame, even with extensive finetuning. Nobody wants to risk throwing more effort into Chinese models when the return is likely to be the same.

Currently, there's *nothing new or interesting* between ~7b models at the low end and 70b models at the high end except Mixtral, which still won't fit into 24GB of VRAM without resorting to a small, crappy quant. So when a big western company that has been doing some decent (if not spectacular) LLM work lately says they're about to publish weights for a mid-sized model that will fit into 24GB easily or even 16GB at a squeeze, people are right to get excited about it. Maybe it falls flat on its face, but realistically, what else does anybody have to look forward to in this category?

rerri@reddit

https://preview.redd.it/ph82x81aig0d1.png?width=1999&format=png&auto=webp&s=70e049e8c0b745b9e56d32167fa0d724ea2c752c Not bad at all. Still pretraining according to the article. [https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/](https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/)

CesarBR_@reddit

If this benchmark translates into actual performance, that would be a great model for local use. Q6 at ≈ 20GB and Q4 at ≈ 13.5GB seem like a sweet spot for 24GB and 16GB GPUs respectively.
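
For anyone who wants to redo that arithmetic, here is a rough sketch of where those numbers come from. It assumes exactly 27 billion parameters and the nominal bit width of each quant; the function name `weight_size_gb` is made up for illustration. Real quant formats also store scales and other metadata, so actual files tend to come out somewhat larger, and this ignores context (KV cache) and runtime overhead entirely.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly bits_per_weight bits,
    with no per-block scales or other metadata.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Nominal bit widths only; effective bits per weight are usually higher.
    for label, bpw in [("Q8", 8), ("Q6", 6), ("Q4", 4)]:
        print(f"{label}: ~{weight_size_gb(27, bpw):.2f} GB of weights")
```

Whatever is left over on the card after the weights is what you have for context.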

candre23@reddit

That's going to depend a lot on context efficiency. You *wouldn't think* any competent company would release a model without GQA in this day and age, but Command-R 35b shows that there's still plenty of bad decision-making going around.
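
To put rough numbers on why GQA matters for context, here is a back-of-the-envelope sketch of KV cache size. The layer count, head count, and head size below are hypothetical values picked for illustration, not the real config of Command-R, Gemma 2, or any other model, and the helper `kv_cache_bytes` is made up for this example. It assumes an fp16 cache (2 bytes per value) and no cache quantization.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Approximate KV cache size in bytes for one sequence.

    The leading factor of 2 covers one K and one V tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


GIB = 2**30

# Hypothetical model shape, for illustration only.
shape = dict(n_layers=40, head_dim=128, context_len=8192)

# Without GQA every attention head keeps its own K/V (here 64 heads).
mha = kv_cache_bytes(n_kv_heads=64, **shape)
# With GQA several query heads share one K/V head (here 8 K/V heads).
gqa = kv_cache_bytes(n_kv_heads=8, **shape)

if __name__ == "__main__":
    print(f"no GQA: {mha / GIB:.2f} GiB, GQA: {gqa / GIB:.2f} GiB at 8k context")
```

The cache scales linearly with the number of K/V heads, so under these assumptions cutting them from 64 to 8 shrinks the cache by 8x at the same context length.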

IndicationUnfair7961@reddit

If it's not censored like crazy, it could be a good model for those without the hardware to run Llama 3 70B.

Eralyon@reddit

Just pray to the uncensoring gods to work their magic...

lemon07r@reddit

If it can beat Yi 34b and Qwen 1.5 32b (and Phi 14b when it's out) I'll be very very happy. 27b is an amazing size for consumer use.

AlgorithmicKing@reddit

Does it have vision capabilities like PaliGemma?

Master-Meal-77@reddit

Interesting. I hope it’s not garbage

IndicationUnfair7961@reddit

It will probably be great at some stuff, but refuse to answer simple questions because half a token seems offensive 😂

sky-syrup@reddit

Competition’s getting really stiff lately. I remember when Falcon 180b came out and everybody was impressed, it was the only thing ppl talked about, even though like 4 people could run it and it sucked. Now half the new releases don’t even get llama.cpp support (the mark of approval basically imo) because they just don’t stack up. It’s great.

MoffKalast@reddit

What's really worth looking forward to is when/if Mistral one-ups Llama 3.

Able-Locksmith-1979@reddit

There’s little space for Mistral to one-up Llama 3, considering that WizardLM 2 is still being detoxed. Basically OpenAI has to remain the MS flagship. Open source was fun when the difference was great, or is fun with small models. But I don’t think we will see large models coming from MS until the gap is big enough to place some other open source models in there.

Key_Run8379@reddit

Gemini is the worst paid AI model, so don't expect a lot from Gemma.

Spindelhalla_xb@reddit

I mean there’s not a great deal of room to go much lower at the moment with Gemma. The first release was not great.

pseudonerv@reddit

It does not inspire confidence for such a small model to be announced but not released. Think back, how many of the open weight models got announced but not released immediately?

AnticitizenPrime@reddit

It's still in pretraining, like Llama 3's 400b model. The benchmarks are from the latest checkpoint. They announced it today because today was the I/O conference.

https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/

> Gemma 2 is still pretraining. This chart shows performance from the latest Gemma 2 checkpoint along with benchmark pretraining metrics.

toothpastespiders@reddit

I'll give them some props for an actual date instead of the usual marketing blurbs of "the coming months".

OkQuietGuys@reddit

I'm expecting nothing but miracles from a zombie hedge fund packed with engineers whose time is 90% dedicated to avoiding HR.
