TheaterFire

Google Gemma 2 27B will be released in June

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 23 comments



CatalyticDragon@reddit

27B, what sort of VRAM usage are we looking at here?
View on Reddit #26489179

candre23@reddit

Here's a [janky frankenMoE with 27.9b params.](https://huggingface.co/mradermacher/Knight-Miqu-27B-MoE-GGUF/tree/main) Should give you a good idea of file size at various bpw.
View on Reddit #26506526

Calcidiol@reddit

32GB or 24GB would very likely be very nice (assuming one can quantize the model to Q8/Q7/Q6 nicely). 20GB should similarly be expected to be usefully good with somewhat more moderate quantization. 16GB + maybe some RAM offload should be possible and still reasonably fast / usefully good, assuming one can get useful quality with a ~Q4 quantization of the particular model.
View on Reddit #26491184

rawednylme@reddit

Is anyone really interested in anything from Google? They haven't exactly been hitting it out of the park lately.
View on Reddit #26478479

candre23@reddit

They've been making slow but steady progress, though. Bard was clown shoes at launch, and never really got *good*, but it did steadily improve to the point where it wasn't a complete embarrassment. Gemini was a big step up from that, and gemini pro right now trades blows with GPT4 in certain tasks.

You got to remember, there's a bit of a desert for mid-size hobbyist models. There's command-R 35b, but it lacks GQA so it's basically unusable if you want more than maybe 2-4k context. There are a few Chinese models like Yi, internLM, and Qwen, but they're very... Chinese. Weird and janky implementation and trained so heavily on Chinese data that their English performance suffers quite a bit. Yi has proved more or less impossible to tame, even with extensive finetuning. Nobody wants to risk throwing more effort into Chinese models when the return is likely to be the same. Currently, there's *nothing new or interesting* between ~7b models at the low end and 70b models at the high end except mixtral - which still won't fit into 24GB of VRAM without resorting to a small, crappy quant.

So when a big western company who has been doing some decent (if not spectacular) LLM work lately says they're about to publish weights for a mid-sized model that will fit into 24GB easily or even 16GB at a squeeze, people are right to get excited about it. Maybe it falls flat on its face, but realistically, what else does anybody have to look forward to in this category?
View on Reddit #26506044

rerri@reddit

https://preview.redd.it/ph82x81aig0d1.png?width=1999&format=png&auto=webp&s=70e049e8c0b745b9e56d32167fa0d724ea2c752c Not bad at all. Still pretraining according to the article. [https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/](https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/)
View on Reddit #26465881

CesarBR_@reddit

If this benchmark translates into actual performance, that would be a great model for local use. Q6 at ≈ 20GB and Q4 at ≈ 13.5GB seem like a sweet spot for 24GB and 16GB GPUs respectively.
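
For anyone who wants to check those figures, here is a rough back-of-the-envelope sketch. It assumes the size is simply parameters × nominal bits per weight, which is an approximation: real GGUF quants usually carry a bit more than their nominal bit count, and this ignores KV cache and runtime overhead. The function name `approx_weight_size_gb` is made up for illustration.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only size in GB (10^9 bytes).

    Assumption: every weight costs exactly `bits_per_weight` bits.
    Ignores KV cache, activations, and quantization metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 27B parameters at a few nominal bit widths
    for bits in (8, 6, 4):
        size = approx_weight_size_gb(27e9, bits)
        print(f"{bits} bpw: ~{size:.2f} GB")
```

At 6 and 4 nominal bits this gives about 20.25 GB and 13.5 GB, which lines up with the estimates above; whatever is left over on a 24GB or 16GB card has to cover context and overhead.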
View on Reddit #26479313

candre23@reddit

That's going to depend a lot on context efficiency. You *wouldn't think* any competent company would release a model without GQA in this day and age, but command-R 35b shows that there's still plenty of bad-decision-making going around.
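
To put a rough number on why GQA matters for context, here is a minimal sketch of the usual KV cache estimate: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The layer and head counts below are hypothetical values picked for illustration, not the published config of any model mentioned here, and `kv_cache_bytes` is a made-up helper name.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes for one sequence.

    The factor 2 covers keys and values; bytes_per_elem=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical mid-size model: 40 layers, 64 attention heads, head_dim 128
    no_gqa = kv_cache_bytes(40, n_kv_heads=64, head_dim=128, context_len=8192)
    # Same model with GQA sharing KV across groups: 8 KV heads (assumption)
    with_gqa = kv_cache_bytes(40, n_kv_heads=8, head_dim=128, context_len=8192)
    print(f"no GQA:   {no_gqa / GIB:.2f} GiB")
    print(f"with GQA: {with_gqa / GIB:.2f} GiB")
```

Under these assumed numbers the cache at 8k context drops by the ratio of KV heads (8x here), which is the difference between context fitting next to the weights on a 24GB card or not.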
View on Reddit #26505234

IndicationUnfair7961@reddit

If it's not censored like crazy it could be a good model for those without the requirements to run LLAMA3 70B.
View on Reddit #26472047

Eralyon@reddit

Just pray to the uncensoring gods to apply their magic...
View on Reddit #26475662

lemon07r@reddit

If it can beat yi34b and qwen 1.5 32b (and phi 14b when it's out) I'll be very very happy. 27b is an amazing size for consumer use
View on Reddit #26500940

AlgorithmicKing@reddit

Does it have vision capabilities like PaliGemma?
View on Reddit #26497738

Master-Meal-77@reddit

Interesting. I hope it’s not garbage
View on Reddit #26457475

IndicationUnfair7961@reddit

It will probably be great at some stuff, but refuse to answer simple questions because half a token seems offensive 😂
View on Reddit #26471951

sky-syrup@reddit

Competition’s getting really stiff lately. I remember when Falcon 180b came out and everybody was impressed, it was the only thing ppl talked about, even though like 4 people could run it and it sucked. Now half the new releases don’t even get llama.cpp support (the mark of approval basically imo) because they just don’t stack up. It’s great.
View on Reddit #26461513

MoffKalast@reddit

What's really worth looking forward to is when/if Mistral one ups llama-3.
View on Reddit #26461514

Able-Locksmith-1979@reddit

There’s little space for mistral to one up llama3, considering that wizardlm 2 is still being detoxed. Basically OpenAI has to remain the ms flagship, open source was fun when the difference was great, or is fun with small models. But I don’t think we will see large models coming from ms until the gap is big enough to place some other open source models in there.
View on Reddit #26464846

Key_Run8379@reddit

Gemini is the worst paid AI model, so don't expect a lot from Gemma.
View on Reddit #26460592

Spindelhalla_xb@reddit

I mean there’s not a great deal of room to go much lower at the moment with Gemma. The first release was not great.
View on Reddit #26459721

pseudonerv@reddit

It does not inspire confidence for such a small model to be announced but not released. Think back, how many of the open weight models got announced but not released immediately?
View on Reddit #26469261

AnticitizenPrime@reddit

It's still in pretraining, like Llama3's 400b model. The benchmarks are from the latest checkpoint. They announced it today because today was the I/O conference. https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/

> Gemma 2 is still pretraining. This chart shows performance from the latest Gemma 2 checkpoint along with benchmark pretraining metrics.
View on Reddit #26471066

toothpastespiders@reddit

I'll give them some props for an actual date instead of the usual marketing blurbs of "the coming months".
View on Reddit #26464908

OkQuietGuys@reddit

I'm expecting nothing but miracles from a zombie hedge fund packed with engineers whose time is 90% dedicated to avoiding HR.
View on Reddit #26464460