TheaterFire

Gemma time! What are your wishes ?

Posted by Specter_Origin@reddit | LocalLLaMA | View on Reddit | 145 comments

Gemma 4 most likely drops tomorrow! What will it take to make it a good release for you?

Another__one@reddit

1-bit-120B-sparse-CPU-friendly-continuous-learning-omni model that beats all the benchmarks imaginable. Also TurboQuant optimizations out of the box, obviously.
View on Reddit #82229095

SpicyWangz@reddit

Just ask opus 4.6 to make it for you with no mistakes
View on Reddit #82230324

Another__one@reddit

I tried, hit the token limits, and now the bank wants their money back. However, I have a great plan in mind, so a couple more AI spins and I am gonna pay off all my credits and make even more for sure!
View on Reddit #82230661

More-Curious816@reddit

That's on you bro. Why didn't you vibe code a SaaS with $10M in ARR?
View on Reddit #82286912

lolwutdo@reddit

Make 1 million dollars, don’t mess up.
View on Reddit #82232993

brown2green@reddit

- Less preachy tone than Gemma 3
- Less stubborn training data filtering; no anti-swearword brainwashing like Gemma 1/2/3
- No stonewalling refusals like some of the recent releases from other companies
- Quantization-aware training from the get-go
- Improved vision even in soft tasks, illustrations, etc
- Better long-context / multi-turn conversational capabilities
- Performance greater than Qwen 3 in general tasks
- Collaboration with character.AI for improving roleplay capabilities
- Less sloppy outputs (Gemma 3 was pretty bad in this regard)
- Not abandoning the consumer single-GPU segment with just either huge model sizes or tiny ones

That's about what would make it a good release for me, although I probably forgot something.
View on Reddit #82228511

the_mighty_skeetadon@reddit

Your wishes... Are granted (mostly)
View on Reddit #82280450

ELPascalito@reddit

Unfortunately Google is moving towards exactly the opposite of what you mentioned, they probably even need the new Gemma to have a better slzmm guard model for censorship, that's literally what GPT did with oss
View on Reddit #82231900

toothpastespiders@reddit

Yeah, a combination of the incident with Senator Blackburn and the recent success of heretic with Gemma 3 is my biggest concern about a possible Gemma 4. Wouldn't shock me if the combination of both made them double down on the guardrails. Which is a concern because I saw a significant amount of false positives with Gemma 3's alignment. That, but worse, is worrisome. Worst case scenario for me is Google pruning their training data rather than just trying to align the model away from wrongthink.
View on Reddit #82239910

Spara-Extreme@reddit

Gemini 3.1 pro could get pretty dark per the RP folks in ST, so I’m betting Gemma 4 is probably a bit looser in some regards than Gemma 3. That being said, it’s a western corpo model, so even so it’s still going to be pretty safe.
View on Reddit #82240822

redditorialy_retard@reddit

It will decide whether we use it or keep using Qwen 3.5
View on Reddit #82247991

brown2green@reddit

I saw this screenshot elsewhere. This sort of response would have been impossible for Gemma 3 without extensive prompting. https://i.imgur.com/j7c0CDO.png
View on Reddit #82232729

Weird-Field6128@reddit

Okay this was pretty funny! Can't believe an LLM can say this! 😂
View on Reddit #82236223

CryptoUsher@reddit

agreed on the preachy tone, it's wild how much it fights you on basic stuff. if they actually fixed the refusal rate without just removing safety entirely, would you tolerate slightly weaker coding performance in exchange?
View on Reddit #82231396

brown2green@reddit

I've never used Gemma for coding; only cloud models for that. Most (all?) of Gemma 3's safety (which is weak and mostly surface-level) can be easily defeated just with prompting, but what works for that puts it in a "roleplay mode", which degrades response quality noticeably compared to when it works as the default assistant. But when it acts like the default assistant, most requests that can be construed as even vaguely "unsafe" are enough to trigger disclaimers, crisis hotlines or (weak) refusals, and it's just annoying for serious and legitimate uses. Other than that, something was done to the weights (in addition to extensive training data filtering, another issue) to make it almost impossible for Gemma to generate dirty words or profanities if you don't fill the context with them first. I wish they quit doing this since Gemini has no issue with them (though from tests with `significant-otter` on LM Arena it seems it might finally be the case. Dunno if they've been more lax with training data filtering as well).
View on Reddit #82232621

CryptoUsher@reddit

fwiw, even in roleplay mode, i've seen gemma 3 drop from like 80% code accuracy to 60% on simple scripts. not sure if that's the safety or just the mode messing with context.
View on Reddit #82246262

CryptoUsher@reddit

fair, i mainly use it for coding too so the roleplay mode kinda defeats the purpose. fwiw latest 13b runs decent on a 4090 if you quantize it right, but yeah the safety dance still sucks
View on Reddit #82239912

WhoRoger@reddit

That's what we have Heretic for (some of these)
View on Reddit #82238203

tiffanytrashcan@reddit

You need to try https://huggingface.co/Aleteian/Storyteller-gemma3-27B It's still hands down my favorite writing model. Primarily based on Big Tiger from The Drummer. The insane merge tree shoved more knowledge into it, as well as removing refusals. With slight prompting, this thing can be dark and fucked up. It will curse you out and never preach at you. The slop is still there, but re-rolling often provides a better result. [Mradermacher iMatrix GGUFs](https://huggingface.co/mradermacher/Storyteller-gemma3-27B-i1-GGUF)
View on Reddit #82230899

carnyzzle@reddit

second that I just want a release that isn't one tiny model and one huge model nobody at home can run lol
View on Reddit #82229445

VoiceApprehensive893@reddit

a 15b ish model that is better or equal to qwen 3.5
View on Reddit #82278822

hackerllama@reddit

🍿
View on Reddit #82228673

Think-Ad389@reddit

120a15 please!
View on Reddit #82242145

_-_David@reddit

Okay, I'm really curious about this. What hardware are you running? Because I'm having a hard time figuring out, who actually wants 15 billion activated parameters on a 120b parameter model. My DDR4 RAM has a hard enough time running something like 6 billion active parameters.
View on Reddit #82264938

durden111111@reddit

It's really more about fitting as many parameters as possible in your VRAM+RAM capacity while leveraging the faster generation of the MoE component. 120B A15 would be perfect in Q4 for those of us with 32GB VRAM and 96GB DDR5 setups
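
The sizing argument here can be checked with rough arithmetic. A minimal sketch, assuming a Q4-style quant averages about 4.5 bits per weight (an assumption; real quant formats vary) and ignoring KV cache and runtime overhead. `weight_gb` is a hypothetical helper, not part of any library:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


# Assumption: ~4.5 bits per weight on average for a Q4-style quantization.
model_gb = weight_gb(120, 4.5)
budget_gb = 32 + 96  # VRAM + DDR5 from the setup described in this comment

print(f"weights: {model_gb:.1f} GB of {budget_gb} GB combined memory")
print(f"left for KV cache, OS, and overhead: {budget_gb - model_gb:.1f} GB")
```

Under those assumptions the weights take roughly half of the combined memory, which is why the size is attractive for this class of machine even though most of the experts would sit in system RAM.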
View on Reddit #82275921

_-_David@reddit

Aren't your speeds at 15b active running on the CPU pretty miserable though? We all have different opinions on what is usable of course.
View on Reddit #82276487

durden111111@reddit

25-30tks is fast enough for me. Maybe more is needed if you require very long context coding
View on Reddit #82277162

_-_David@reddit

Ah, I tend to use local models and "free" tokens for things I wouldn't spend millions of tokens on otherwise. For example, I like studying new languages. And it takes a *ton* of tokens to have a pipeline that writes a story framework, spends time thinking about adding twists, breaks it down at the sentence level and alters vocab and grammar for my ability level, writes image generator prompts, analyzes the batch and chooses a favorite, suggests edits, logs decisions and model reflections during the process, runs an analysis of the logs for the full pipeline... 25 toks/sec would turn an overnight job into an overweekend job for something like that lol
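
To put the throughput point in numbers, here is a small sketch of how many tokens a fixed generation speed yields over a time window. The 8-hour "overnight" and 48-hour "weekend" windows are assumptions for illustration, and `tokens_generated` is a hypothetical helper:

```python
def tokens_generated(tokens_per_second: float, hours: float) -> float:
    """Tokens produced at a constant generation speed over a time window."""
    return tokens_per_second * hours * 3600


overnight = tokens_generated(25, 8)   # assumption: "overnight" is 8 hours
weekend = tokens_generated(25, 48)    # assumption: "weekend" is 48 hours

print(f"25 tok/s overnight: {overnight:,.0f} tokens")
print(f"25 tok/s over a weekend: {weekend:,.0f} tokens")
```

At 25 tok/s a single night stays under a million tokens, so a pipeline that burns several million tokens would indeed spill into a multi-day run.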
View on Reddit #82277913

maschayana@reddit

128gb mac users want that
View on Reddit #82272920

_-_David@reddit

Interesting. It seems like there is a whole lot of overhead for something like that. I'd figure 128gb peeps would want something in the 180-220b range. Because these MoE's have pretty tiny, in relative terms, KV cache requirements. Seems like you'd load the model at something like 65gb-ish, then have more headroom than you'd want or need. Does it have to do with the Mac aspect? I can imagine the "shared memory" might mean you want that extra overhead for the OS, and other typical RAM-duty applications. Is that more or less it?
View on Reddit #82274235

Aaaaaaaaaeeeee@reddit

https://preview.redd.it/vodwoayrxosg1.jpeg?width=604&format=pjpg&auto=webp&s=1bbeb859a6ebcc83e06820caa0b062f48288d1b8 any hints, boss? What are you all working on?
View on Reddit #82237637

RandumbRedditor1000@reddit

I hope it's NOT a giant moe that the gpu poor cannot run. Hopefully we get another 27B dense model. I hope for better world knowledge and finetuneability.
View on Reddit #82234769

ElementNumber6@reddit

You're really going to let elite scalpers lobotomize all future LLMs? We need the absolute largest, most intelligent, most capable LLMs that are technically possible at any given time, but with distillations for more modest use cases.
View on Reddit #82260321

Emport1@reddit

This guy gets it, it's better to get more out of google that we can then distill
View on Reddit #82275753

Geritas@reddit

If you want absolute largest and smartest in hopes that HW will catch up then you shouldn't cheer for MoE, IMO. Why not 1T dense?
View on Reddit #82264737

ElementNumber6@reddit

Now we're talking
View on Reddit #82265913

2muchnet42day@reddit

This is Google. Either beat Qwen or prove America can't beat Gyna in the open model race.
View on Reddit #82275064

larrytheevilbunnie@reddit

I came
View on Reddit #82234893

FusionCow@reddit

I saw
View on Reddit #82274835

a_beautiful_rhind@reddit

They're gonna skimp on active parameters is my prediction, but that's definitely not what I want.
View on Reddit #82271664

LMTLS5@reddit

april 1 👀
View on Reddit #82227913

VoiceApprehensive893@reddit

hard to believe since "significant-otter" has been on [arena.ai](http://arena.ai) for a while
View on Reddit #82268474

Cereal_Grapeist@reddit

I will sign Logan's gmail up for so much weird shit if this is a prank
View on Reddit #82230816

Cool-Chemical-5629@reddit

Better yet, write a sexy girl AI character, all with believable background and story, have Gemma 3 Heretic AI agent to adopt the character and have it send him some naughty emails. As soon as he takes the bait, send him another email "April Fool! What does it feel like to fall for your dear old Gemma 3? We hope you had some fun with her! 😍🥵👉👌"
View on Reddit #82233653

PunnyPandora@reddit

least gooner localllama user
View on Reddit #82260344

RetiredApostle@reddit

https://preview.redd.it/ak4wedsncosg1.png?width=713&format=png&auto=webp&s=230ca0a39601a57fa871a7b63e49346d0a548a5c Apr 2.
View on Reddit #82231339

RuiRdA@reddit

4:44 AM. Crazy attention to detail with these hype posts
View on Reddit #82231958

ABLPHA@reddit

Wait... 04.04 is in 2 days...
View on Reddit #82250272

kvothe5688@reddit

they probably have cron schedules
View on Reddit #82237989

ResidentPositive4122@reddit

Gmail launched with 1GB of storage (something HUGE at the time, most e-mail providers were 10MB, some were 100MB) on 1st of April as well. A lot of people thought it was a joke.
View on Reddit #82242611

pinkyellowneon@reddit

they're really committing to the hype cycle on it, and it would feel a little strange for them to make fun of their own release as a joke. i would assume they're being genuine
View on Reddit #82230797

Far_Insurance4191@reddit

It says april 2 to me
View on Reddit #82228956

Prestigious-Use5483@reddit

My first thought
View on Reddit #82228735

Specter_Origin@reddit (OP)

Doubt! That does not seem like a joke, although it did cross my mind
View on Reddit #82227974

Inevitable-Name-1701@reddit

Gemma was too small for non-English languages. Pass
View on Reddit #82262562

Iory1998@reddit

I am afraid we all are gonna be disappointed. Maybe we will not see any medium-sized Gemma-4 model.
View on Reddit #82258469

Chaotic_Choila@reddit

Honestly my main wish is just better documentation and transparency around the training data. The models themselves are solid but figuring out what they're actually good at versus what they just appear to be good at takes forever. Better tooling for evaluation would be nice too. Right now it feels like everyone is reinventing the same benchmarking wheel. Some kind of standardized way to test against real world business scenarios would save so much time.
View on Reddit #82255880

brown2green@reddit

> and transparency around the training data

Why would you even want that? The moment the training data becomes "transparent" (especially for a model from a company as large as Google), it has to cater to the lowest common denominator, because anybody with an axe to grind could find an excuse to get offended or find something legally actionable in it.
View on Reddit #82256419

QuackerEnte@reddit

Architectural novelties. Something no other OSS model does yet. Because I know the models will be outdated pretty fast in terms of capabilities, so at least architecture novelties can be used in future models by everyone.
View on Reddit #82255516

Cubow@reddit

i desperately need a 1b model
View on Reddit #82246568

ComplexType568@reddit

have you tried Qwen3.5 0.6B or 1.7B?
View on Reddit #82249397

Cubow@reddit

Qwen3.5 2b is too big and 0.8b doesn't seem notably better than Gemma 3 1b
View on Reddit #82253218

Specialist_Golf8133@reddit

honestly just want them to not nerf it this time. gemma 2 was solid until they lobotomized it with safety tuning. like give us the raw model and let people choose their own guardrails? the base weights are always more useful for fine-tuning anyway. what safety features are you actually hoping for vs dreading lol
View on Reddit #82242200

brown2green@reddit

> the base weights are always more useful for fine-tuning anyway

This has not been the case for a good while (since early 2024?). As an individual you just don't have any chance anymore of competing with the post-training work done by the companies training the models: too much data/compute needed for an actually good finetune from scratch nowadays, unless you're training them on very narrow tasks.
View on Reddit #82252541

InsideElk6329@reddit

This guy is a joke now as a result of the gemini garbage
View on Reddit #82252414

Leflakk@reddit

Gemma are useless for coding so nothing from me
View on Reddit #82250946

spaceman_@reddit

Something that fits 8GB, something that fits 16GB, something that fits 32 and something that fits 64?
View on Reddit #82248669

Alone-Possibility398@reddit

april fool dude
View on Reddit #82248190

DeepOrangeSky@reddit

Well, they're not going to do it, but, if they put out a 70b dense model, I'd be pretty curious just how insanely strong it would be. I mean, Llama 70b came out before dinosaurs walked the earth, and the fine tunes/merges based on it are *still* considered some of the strongest writing models around to this day. So, given how strong Qwen3.5 27b was just now, and that this is Google, who are maybe the only crew that can put something out that punches even harder for its size, it makes me wonder just how strong a 70b dense model from them would be right now. Probably would be pretty crazy. Yea, "crazy slow", but still...

And of course they could still put out all the normal expected models that all the coders want and all the usual MoE type of stuff. But having at least *one* really sick dense model, instead of none, would be really nice. Not sure why these companies seem to be so anti-variety in that way. Like I get that MoE is the future and all, not saying it can't be 80/20 or 90/10 that way, but it would be nice if one of these heavy hitters released a 70b dense or 120b dense once in a blue moon instead of just literally never doing it ever again and years going by and the ancient ones still being the strongest ones at chatting/writing/RPG/etc years after they came out.
View on Reddit #82236029

Spara-Extreme@reddit

Aren’t they rumored to be doing a 120b dense model ?
View on Reddit #82241076

ambient_temp_xeno@reddit

That was just me trying to manifest it into reality. 70/80b dense would be great.
View on Reddit #82248067

DeepOrangeSky@reddit

Nah I think they're saying the 120b is going to be an MoE, albeit not a super-sparse one. Like 120b with 15b active. (Should hopefully still be pretty dang cool and strong, but, whole different ballgame from a 120b dense, which would be insane.) I use the Behemoth 123b dense fine-tune of Mistral 123b dense all the time btw, as my go-to model, pretty much every day, and it is easily the strongest local model I've ever used by a really big margin. And 123b is super old. If a really serious lab like Google made a dense model that big *now* it is crazy to think how strong it would be. It would be about as strong at writing as Claude, Gemini, Grok, GPT, etc. Might sound crazy, but, even those aren't dense models (huge MoEs, but fairly sparse). They might have significantly less than 120b active parameters, so, a current-times 120b dense from Google would actually be seriously strong at writing. Very slow, but, would be very, very cool.

I think if they actually do a big dense one, which I don't think they will, they won't go *that* big for dense though. Probably they'll do another dense in the 24-32b size range, and no bigger, but if they do go bigger, then 70b, not 120b dense. 120b dense would be considered too weird and dense and old fashioned or something, "a model made for nobody" or something (other than me, who would love to run it, lol). Anyway my posts tend to get too long so I'll stop rambling, but yea, their 120b is gonna be MoE sounds like.
View on Reddit #82241650

Spara-Extreme@reddit

Ahhh that's disappointing. I was hoping for a 120b dense, I've used Behemoth-X-Redux a ton in the past and it's one of my favorite models.
View on Reddit #82242818

power97992@reddit

Lol it won't be better than the upcoming gemini 3.1 flash and glm 5.1, probably even worse than gem 3 flash and minimax M2.7
View on Reddit #82247093

BelgianDramaLlama86@reddit

Better at RP/creative writing, mainly. Other things are icing on the cake, but the soft skills are what Gemma 3 was most known for, that's where the focus should be now too.
View on Reddit #82246581

vladlearns@reddit

4:44 AM https://preview.redd.it/32ykctcb2qsg1.png?width=713&format=png&auto=webp&s=3d2216c0a2b84c678e3886769c2ba55d75314efd YEEEEEEEEEEEESSSSSSSSSSSSSSSSSSSSSSSSSSSSS
View on Reddit #82244990

coder543@reddit

I want an extreme sparsity 175B A3B model in Q4 QAT with text+image+audio input and text+image+audio output.
View on Reddit #82228962

CardNorth7207@reddit

With 2 million context length
View on Reddit #82242905

JorG941@reddit

A man can only dream
View on Reddit #82235358

Far-Low-4705@reddit

Hopefully multimodal (vision + text), reasoning, and tool calling, again with QAT. That’s basically the minimum to compete against qwen…
View on Reddit #82241933

qwen_next_gguf_when@reddit

Less censored.
View on Reddit #82229347

FinBenton@reddit

We have super good uncensoring stuff now like the hauhau and heretic, wouldn't matter too much.
View on Reddit #82241846

pigeon57434@reddit

with how sophisticated heretic is these days it's honestly not a big deal but obviously it's better if it's just less censored out of the box
View on Reddit #82238542

TopChard1274@reddit

A 7b model to run at q4_k on my iPad. 8b is already a stretch. 7b is the most that wouldn’t crash the app upon importing. Right now I run a 4b qwen3.5 q6_k variant at 32,000 context size. The dev made a pocketpal update with better support for qwen3.5 and now the max context window I can run on iPad has basically doubled. So yeah, a 7b would be perfect for my needs.
View on Reddit #82241840

Orbiting_Monstrosity@reddit

To never see or hear the words "dust motes" again.
View on Reddit #82241728

m3kw@reddit

Gemma suck though
View on Reddit #82241361

Rich_Artist_8327@reddit

It needs to be a little larger, like 32B, and 20% better in every aspect than gemma3; then I'd love it.
View on Reddit #82241240

emteedub@reddit

omnipotence
View on Reddit #82241168

dobomex761604@reddit

1 million context and low (like Mistral 7b) censorship.
View on Reddit #82241167

chikengunya@reddit

120B model
View on Reddit #82228280

coder543@reddit

175B - 200B would be great. 120B is an awkward size when people are typically choosing between machines with <32GB of VRAM or 128GB of VRAM. 175B would make better use of 128GB of VRAM, and 120B isn't going to fit in <32GB anyways.
View on Reddit #82230621

Spara-Extreme@reddit

96GB users rejoice !
View on Reddit #82240950

gnnr25@reddit

That we would also get Gemma 4n so that smaller models can punch above their weight.
View on Reddit #82239501

Weird-Field6128@reddit

Okay! Idk about Gemma, but I find these models pretty useless. Sorry, but does anyone actually use them? And if so, where? Also, to be honest, when I was using ComfyUI I took the ready-made workflow and saw it was using Gemma models, and that is the only use I've seen: encoding user queries, aka prompts, for the image/video models. If people use these models for anything else, I would like to know. Maybe I am looking at them in the wrong way
View on Reddit #82236363

KageYume@reddit

Gemma3 is great at translation. Its 27B QAT was BiS for Japanese translation for 24GB VRAM class for a while.
View on Reddit #82236990

Weird-Field6128@reddit

Thank you so much, honestly I did not know about this
View on Reddit #82239033

ttkciar@reddit

I'd mainly like to see three things:

* A dense model in the 24B-to-32B range. Their traditional 27B is perfect. Whatever other sizes they release is just gravy.
* All the soft-skills competence we've come to love about Gemma3, but better than Gemma3,
* TheDrummer rolling out another Big Tiger anti-sycophancy fine-tune!

Some nice-to-haves:

* Less rapid long-context competence drop-off,
* Longer context limit,
* A larger model, like a 120B-A15B MoE or 72B dense,
* Documentation tweak admitting that system prompts are supported. Gemma2 and Gemma3 both work great with system prompts, but people keep insisting they don't because the Gemma documentation and official prompt template say so.
View on Reddit #82236303

DeepOrangeSky@reddit

> 24B-to-32B range

> 72B dense

Yes, please. Although, particularly the 70-72b dense, even more so than the 24-32b dense. (The reason I say this isn't out of selfishness that I have enough vram to run it, rather, it's that Qwen3.5 27b dense just came out and is super strong (meaning maybe Google's edge over it might not be that huge, although, then again, it's Google, so who knows). Whereas we haven't had a super strong 70b dense model in forever, so the quality jump over what currently exists for that would potentially be really big.)

I don't know, I'm curious, what do you think, regarding the ~27b dense model, do you think it would still somehow be a lot stronger than even Qwen3.5 27b, or only slightly stronger/similar strength? I mean, I'm pretty new here, but from what I gather, Gemma3 27b was like crazy strong for its size when it came out (more so than even Qwen3.5 27b was relative to the current crop, maybe?).
View on Reddit #82236894

ttkciar@reddit

> particularly the 70-72b dense, even more so than the 24-32b dense

Yeah :-) both would be great! But I prioritize the 27B for entirely selfish reasons, as that would fit on my 32GB MI50 for fast inference. A 72B dense would definitely be a nice-to-have, but I'm still figuring out the limitations of K2-V2-Instruct, LLM360's 72B dense. It's a **very** clever model, and if Google doesn't give us a large dense Gemma4, we might be able to distill the rumored 120B-A15B into K2-V2-Instruct to get a decent approximation.

> I don't know, I'm curious, what do you think, regarding the ~27b dense model, do you think it would still somehow be a lot stronger than even Qwen3.5 27b, or only slightly stronger/similar strength?

IMO the main strength of Gemma3 wasn't that it did any particular thing best, but rather that it did everything "well enough". Qwen3 had some excellent inference skills, but there were gaps in those skills where it was weak, like editing (rewriting), self-critique, geopolitics, RAG, Theory-of-Mind, and Evol-Instruct. That didn't impact its popularity much, though, because those are somewhat niche skills that most users don't care about.

That having been said, Qwen3.5 has closed those gaps. When I evaluated Qwen3.5-27B it exhibited all of those missing skills, and its competence at many of them surpassed Gemma3's. The question is, while Qwen has caught up with Gemma3's diversity of skills, has Google been sitting still? Or will Gemma4 exhibit as much improvement over Gemma3 as Gemma3 did over Gemma2?
View on Reddit #82238860

pigeon57434@reddit

omnimodal
View on Reddit #82238481

MerePotato@reddit

Omnimodality
View on Reddit #82237955

WhoRoger@reddit

r/skamtebord
View on Reddit #82237781

fyvehell@reddit

1. That this is not an April Fools joke
2. That if they also release a bigger model, they also keep the current sizes too so that more people can have a chance to run these models

That is all.
View on Reddit #82235431

llmentry@reddit

Yep. I think we're mostly hoping that Googs doesn't try to fix what isn't broken. Really hoping Gemma4 doesn't turn out to be nothing more than NicheSpecialityGemma_300M :/
View on Reddit #82237367

dtdisapointingresult@reddit

A 400B A20B model, natively trained at 4-bit like GPT-OSS was, that's basically perfect for people with 128GB memory.
View on Reddit #82237159

ForsookComparison@reddit

Something dense
View on Reddit #82236779

c--b@reddit

Unsloth support day one.
View on Reddit #82236761

RickyRickC137@reddit

RP. Gemma 3 has the best prose out of all the open source models (even till date). The creativity was its strength when it came out.
View on Reddit #82235541

Yu2sama@reddit

Better license for finetuners (though I doubt it's gonna happen). I would be happy if it just gets better at creative writing.
View on Reddit #82233871

MiyamotoMusashi7@reddit

If this is an april fools joke I will crash tf out
View on Reddit #82228295

Hans-Wermhatt@reddit

So excited, I’m ready to be let down. 
View on Reddit #82233863

celsowm@reddit

**Gemma 4 got 99% on ARC-AGI 3 !!!** >!April Fool!<
View on Reddit #82233815

KageYume@reddit

・27B dense or 35B MoE (can run on 24GB of VRAM)
・Reasoning can be turned on or off easily
・Better Japanese - English translation capability than equivalent size Qwen 3.5 even with reasoning turned off (Gemma3 was BiS for a long time).
・Better world knowledge than equivalent size Qwen 3.5
・Better tool calling and instruction-following than equivalent size Qwen 3.5
View on Reddit #82230586

nickm_27@reddit

Really hoping for a MOE model between 20B and 35B
View on Reddit #82232593

War3Z@reddit

Honestly my biggest wish is better long-context reliability without massive VRAM requirements. Gemma (and others) are getting really good, but once you push longer contexts or real-world workflows, things still get shaky or expensive fast.
View on Reddit #82232264

triynizzles1@reddit

A few Google models were available on LM Arena, one claiming to be an unnamed model made by Google and another claiming to be Gemma 4, under the names Colosseum-1p3 and significant-otter.

Colosseum-1p3 seemed very intelligent but refused to do any coding… which was odd. Based on the name I’m assuming it’s a small edge model. significant-otter self-identified as Gemma 4 and sounded quite smart. It was decent with coding.

Both appear to have an early 2025 knowledge cutoff (both models correctly said Trump was president). Both models responded right after pressing send, indicating they are not reasoning models. I don’t know if both models are still available to test on LM Arena, but it looks like the release is soon. I am most looking forward to an updated, recent knowledge cutoff.
View on Reddit #82229871

ELPascalito@reddit

That's Google's goal after all, Gemma is meant to be edge AI, lightweight and production ready, aka "guardrails"
View on Reddit #82232106

baseketball@reddit

Please be something good VRAM peasants can run.
View on Reddit #82231538

ab2377@reddit

if its 4b is better than qwen3.5 4b, that will be amazing & crazy.
View on Reddit #82231176

Full_Outcome_6289@reddit

https://preview.redd.it/wgqgxq7t3osg1.jpeg?width=800&format=pjpg&auto=webp&s=1658f12394e35b29cb0195aed26086b1fb27d2d0 yes pls 80b-20b moe
View on Reddit #82228918

rm-rf-rm@reddit

A20b for 80b total? That's not as sparse as the SOTA.. (see A17b-397b in Qwen3.5)
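
The sparsity comparison is easy to make concrete as the share of parameters active per token. A minimal sketch using only the two configurations named in this comment; `active_fraction` is a hypothetical helper:

```python
def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of a MoE model's parameters that are active for each token."""
    return active_billion / total_billion


# (active, total) in billions of parameters, as quoted in the thread.
configs = {
    "80B-A20B (wished for above)": (20, 80),
    "397B-A17B (Qwen3.5 figure cited above)": (17, 397),
}

for name, (active, total) in configs.items():
    share = active_fraction(active, total)
    print(f"{name}: {share:.1%} of parameters active per token")
```

By this measure the 80B-A20B wish activates a quarter of its weights per token, against a few percent for the cited Qwen3.5 configuration.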
View on Reddit #82229877

Full_Outcome_6289@reddit

My computer can run this model. 20b is a pretty smart model, and I think 80b is quite sophisticated. But I don't know about the standards for how many active parameters are typically used for MoE models relative to the general parameters.
View on Reddit #82230614

SpicyWangz@reddit

Yeah I think something like 80ba8b would be way more interesting to see
View on Reddit #82230282

Specter_Origin@reddit (OP)

and a 40b-A5b, maybe xD
View on Reddit #82228965

Opening-Ad6258@reddit

Just hope it runs well on my machine
View on Reddit #82230508

TheRealMasonMac@reddit

Only thing I want is a fucking base model. Am going to be seriously pissed if they got on the train of not releasing it.
View on Reddit #82229373

Recoil42@reddit

Improved agent/tool architectures would be a big one. This is an area where Google needs to focus for the SWE effort so I hope they do.
View on Reddit #82228421

EbbNorth7735@reddit

I think that's a guarantee to be honest. It's the one thing all of the latest models are targeting. 
View on Reddit #82228613

Recoil42@reddit

Yep, but it's also something Gemini Pro in particular is astonishingly bad at. Current 3.1 is a brilliant general-use model that acts like a bumbling Nobel laureate professor who frequently loses his glasses and forgets lesson plans. Amazing at doing complex one-shot work, terrible in long loops.
View on Reddit #82228870

masterlafontaine@reddit

Perfect description
View on Reddit #82229289

Mochila-Mochila@reddit

No censorship 😒
View on Reddit #82229201

random_boy8654@reddit

Any good dense model like 14B or moe 40b a3b type
View on Reddit #82229137

Kahvana@reddit

Not another meaningless tweet.
View on Reddit #82227935

Recoil42@reddit

The Logan tweets are pretty reliable. Pretty much always means a model release same-week.
View on Reddit #82228349

chikengunya@reddit

I think gemma2 and gemma3 were each released on a Wednesday/Thursday, so today or tomorrow would fit...
View on Reddit #82228948

Terminator857@reddit

Amazing how much this hype train works
View on Reddit #82228236

Technical-Earth-3254@reddit

Thinking, at least one large dense model (>100b) and ideally native 4 bit for all models.
View on Reddit #82228889

Specter_Origin@reddit (OP)

I will go first: I want to see a small diffusion based model for experimentation
View on Reddit #82228882

Specter_Origin@reddit (OP)

Mine is a model between 28-40b, dense and MoE
View on Reddit #82227942

Prestigious-Use5483@reddit

Can't recall if it was a rumor or real, but I think they had models up to 4B, then the next model after 4B was 120B.
View on Reddit #82228844

aeqri@reddit

Anything but another RNN/hybrid model that needs to reprocess the entire context when you edit or remove even a single token from the very end of it.
View on Reddit #82228830

Revolutionary_Loan13@reddit

Faster tps
View on Reddit #82228802

Double_Cause4609@reddit

Parscale or Loop Transformers on a dense backbone / shared expert, with a residual super low active parameter count MoE that can be offloaded to system RAM or even streamed from NVMe. Some extension of the weird residual contribution of Gemma 3N for even more sparse parameter loading. Engram (or equivalent sparse embedding contribution). Aggressive QAT, in the sub 3bit range.

Tbh, something like... A 400B A53B, where the first 50B activated parameters are Parscale/Looped Transformer, and the remaining 350B A3B is conditional MoE params, with a 2bit QAT would be ideal for my hardware, personally. It'd perform roughly like an ~80B dense in hard reasoning (with a parscale rate of around 8-12 parallel requests), while still having the MoE params for rare sequence memorization and general knowledge base. Plus it'd run on about 12.5GB of VRAM (for all the shared parameters), and the active count would be so low that a CPU would be perfectly comfortable to run it (even if one didn't have enough system RAM and had to stream the experts from NVMe).
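
The 12.5GB figure follows directly from the proposed split. A minimal sketch of the arithmetic, assuming exactly 2 bits per weight with no quantization metadata or KV cache (a simplification), and using a hypothetical helper `weight_gb`:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


BITS = 2  # the 2-bit QAT proposed in this comment

shared_gb = weight_gb(50, BITS)    # always-active shared parameters, kept in VRAM
expert_gb = weight_gb(350, BITS)   # conditional MoE experts, in RAM or on NVMe

print(f"shared parameters in VRAM: {shared_gb} GB")
print(f"MoE experts offloaded: {expert_gb} GB")
```

So the shared 50B would fit on a 16GB card under these assumptions, while the experts would need a large amount of system RAM or NVMe streaming, as the comment suggests.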
View on Reddit #82228629

5dtriangles201376@reddit

Good world awareness for the size and open license or at the bare minimum something like nvidia open where the outputs aren't Google's problem
View on Reddit #82227987

LMTLS5@reddit

120b moe
View on Reddit #82227875