Gemma time! What are your wishes ?

[-]

Another__one@reddit

1-bit-120B-sparse-CPU-friendly-continious-learning-omni model that beats all the benchmarks imaginable. Also TurboQuant optimizations from the box, obviously.

Reply

[-]

SpicyWangz@reddit

Just ask opus 4.6 to make it for you with no mistakes

Reply

[-]

Another__one@reddit

I tried, hitted token limits and now the bank wants their money back. However I have a great plan in mind, so a couple more AI spins and I am gonna pay off all my credits and make even more for sure!

Reply

[-]

More-Curious816@reddit

That's on you bro. Why you didn't vibe code a saas with $10m in arr.

Reply

[-]

lolwutdo@reddit

Make 1 million dollars, don’t mess up.

Reply

[-]

- Less preachy tone than Gemma 3 - Less stubborn training data filtering; no anti-swearword brainwashing like Gemma 1/2/3 - No stonewalling refusals like some of the recent releases from other companies - Quantization-aware training from the get-go - Improved vision even in soft tasks, illustrations, etc - Better long-context / multi-turn conversational capabilities - Performance greater than Qwen 3 in general tasks - Collaboration with character.AI for improving roleplay capabilities - Less sloppy outputs (Gemma 3 was pretty bad in this regard) - Not abandoning the consumer single-GPU segment with just either huge model sizes or tiny ones That's about what that would make it a good release for me, although I probably forgot something.

Reply

[-]

the_mighty_skeetadon@reddit

Your wishes... Are granted (mostly)

Reply

[-]

ELPascalito@reddit

Unfortunately Google is moving towards exactly the opposite of what you mentioned, they probably even need the new Gemma to have a better slzmm guard model for censorship, that's literally what GPT did with oss

Reply

[-]

toothpastespiders@reddit

Yeah, a combination of the incident with Senator Blackburn and the recent success of heretic with Gemma 3 is my biggest concern about a possible Gemma 4. Wouldn't shock me if the combination of both made them double down on the guardrails. Which is a concern because I saw a significant amount of false positives with Gemma 3's alignment. That, but worse, is worrisome. Worst case scenario for me is Google pruning their training data rather than just trying to align the model away from wrongthink.

Reply

[-]

Spara-Extreme@reddit

Gemini 3.1 pro could get pretty dark per the RP folks in ST so I’m betting gemma 4 is probably a bit looser in some regards then Gemma 3. That being said, it’s western corpo model so even though so it’s still going to be pretty safe.

Reply

[-]

redditorialy_retard@reddit

It will decide whether we use it or keep using Qwen 3.5

Reply

[-]

brown2green@reddit

I saw this screenshot elsewhere. This sort of response would have been impossible for Gemma 3 without extensive prompting. https://i.imgur.com/j7c0CDO.png

Reply

[-]

Weird-Field6128@reddit

Okay this was pretty funny! Can't believe and LLM can say this! 😂

Reply

[-]

CryptoUsher@reddit

agreed on the preachy tone, it's wild how much it fights you on basic stuff. if they actually fixed the refusal rate without just removing safety entirely, would you tolerate slightly weaker coding performance in exchange?

Reply

[-]

brown2green@reddit

I've never used Gemma for coding; only cloud models for that. Most (all?) of Gemma 3's safety (which is weak and mostly surface-level) can be easily defeated just with prompting, but what works for that puts it in a "roleplay mode", which degrades response quality noticeably compared to when it works as the default assistant. But when it acts like the default assistant, most requests that can be construed as even vaguely "unsafe" are enough to trigger disclaimers, crisis hotlines or (weak) refusals, and it's just annoying for serious and legitimate uses. Other than that, something was done to the weights (in addition to extensive training data filtering, another issue) to make it almost impossible for Gemma to generate dirty words or profanities if you don't fill the context with them first. I wish they quit doing this since Gemini has no issue with them (though from tests with `significant-otter` on LM Arena it seems it might finally be the case. Dunno if they've been more lax with training data filtering as well).

Reply

[-]

CryptoUsher@reddit

fwiw, even in roleplay mode, i've seen gemma 3 drop from like 80% code accuracy to 60% on simple scripts. not sure if that's the safety or just the mode messing with context.

Reply

[-]

CryptoUsher@reddit

fair, i mainly use it for coding too so the roleplay mode kinda defeats the purpose. fwiw latest 13b runs decent on a 4090 if you quantize it right, but yeah the safety dance still sucks

Reply

[-]

WhoRoger@reddit

That's what we have Heretic for (some of these)

Reply

[-]

tiffanytrashcan@reddit

You need to try https://huggingface.co/Aleteian/Storyteller-gemma3-27B It's still hands down my favorite writing model. Primarily based on Big Tiger from The Drummer. The insane merge tree shoved more knowledge into it, as well as removing refusals. With slight prompting, this thing can be dark and fucked up. It will curse you out and never preach at you. The slop is still there, but re-rolling often provides a better result. [Mradermacher iMatrix GGUFs](https://huggingface.co/mradermacher/Storyteller-gemma3-27B-i1-GGUF)

Reply

[-]

carnyzzle@reddit

second that I just want a release that isn't one tiny model and one huge model nobody at home can run lol

Reply

[-]

VoiceApprehensive893@reddit

a 15b ish model that is better or equal to qwen 3.5

Reply

[-]

hackerllama@reddit

🍿

Reply

[-]

Think-Ad389@reddit

120a15 please!

Reply

[-]

_-_David@reddit

Okay, I'm really curious about this. What hardware are you running? Because I'm having a hard time figuring out, who actually wants 15 billion activated parameters on a 120b parameter model. My DDR4 RAM has a hard enough time running something like 6 billion active parameters.

Reply

[-]

durden111111@reddit

Really more about fitting as many paramters as possible in your vram+ram capacity while leverage faster generation from moe component. 120B A15 would be perfect in Q4 for those of us with 32GB vram ans 96GB ddr5 setups

Reply

[-]

_-_David@reddit

Aren't your speeds at 15b active running on the CPU pretty miserable though? We all have different opinions on what is usable of course.

Reply

[-]

durden111111@reddit

25-30tks is fast enough for me. Maybe more is needed if you require very long context coding

Reply

[-]

_-_David@reddit

Ah, I tend to use local models and "free" tokens for things I wouldn't spend millions of tokens on otherwise. For example, I like studying new languages. And it takes a \*ton\* of tokens to have a pipeline that writes a story framework, spends time thinking about adding twists, breaks it down at the sentence level and alters vocab and grammar for my ability level, writes image generator prompts, analyzes the batch and chooses a favorite, suggests edits, logs decisions and model reflections during the process, runs an analysis of the logs for the full pipeline... 25 toks/sec would turn an overnight job into an overweekend job for something like that lol

Reply

[-]

maschayana@reddit

128gb mac users want that

Reply

[-]

_-_David@reddit

Interesting. It seems like there is a whole lot of overhead for something like that. I'd figure 128gb peeps would want something in the 180-220b range. Because these MoE's have pretty tiny, in relative terms, KV cache requirements. Seems like you'd load the model at something like 65gb-ish, then have more headroom than you'd want or need. Does it have to do with the Mac aspect? I can imagine the "shared memory" might mean you want that extra overhead for the OS, and other typical RAM-duty applications. Is that more or less it?

Reply

[-]

Aaaaaaaaaeeeee@reddit

https://preview.redd.it/vodwoayrxosg1.jpeg?width=604&format=pjpg&auto=webp&s=1bbeb859a6ebcc83e06820caa0b062f48288d1b8 any hints, boss? What are you all working on?

Reply

[-]

RandumbRedditor1000@reddit

I hope it's NOT a giant moe that the gpu poor cannot run. Hopefully we get another 27B dense model. I hope for better world knowledge and finetuneability.

Reply

[-]

ElementNumber6@reddit

You're really going to let elite scalpers lobotomize all future LLMs? We need the absolute largest, most intelligent, most capable LLMs that are technically possible at any given time, but with distillations for more modest use cases.

Reply

[-]

Emport1@reddit

This guy gets it, it's better to get more out of google that we can then distill

Reply

[-]

Geritas@reddit

If you want absolute largest and smartest in hopes that HW will catch up then you shouldn't cheer for MoE, IMO. Why not 1T dense?

Reply

[-]

ElementNumber6@reddit

Now we're talking

Reply

[-]

2muchnet42day@reddit

This is Google. Either beat Qwen or prove America can't beat Gyna in the open model race.

Reply

[-]

larrytheevilbunnie@reddit

I came

Reply

[-]

FusionCow@reddit

I saw

Reply

[-]

a_beautiful_rhind@reddit

They're gonna skimp on active parameters is my prediction but definitely now what I want.

Reply

[-]

LMTLS5@reddit

april 1 👀

Reply

[-]

VoiceApprehensive893@reddit

hard to believe since "significant-otter" has been on [arena.ai](http://arena.ai) for a while

Reply

[-]

Cereal_Grapeist@reddit

I will sign Logan's gmail up for so much weird shit if this is a prank

Reply

[-]

Cool-Chemical-5629@reddit

Better yet, write a sexy girl AI character, all with believable background and story, have Gemma 3 Heretic AI agent to adopt the character and have it send him some naughty emails. As soon as he takes the bait, send him another email "April Fool! What does it feel like to fall for your dear old Gemma 3? We hope you had some fun with her! 😍🥵👉👌"

Reply

[-]

PunnyPandora@reddit

least gooner localllama user

Reply

[-]

RetiredApostle@reddit

https://preview.redd.it/ak4wedsncosg1.png?width=713&format=png&auto=webp&s=230ca0a39601a57fa871a7b63e49346d0a548a5c Apr 2.

Reply

[-]

RuiRdA@reddit

4:44 AM Crazy attention to details with this hype posts

Reply

[-]

ABLPHA@reddit

Wait... 04.04 is in 2 days...

Reply

[-]

kvothe5688@reddit

they probably have cron schedules

Reply

[-]

ResidentPositive4122@reddit

Gmail launched with 1GB of storage (something HUGE at the time, most e-mail providers were 10MB, some were 100MB) on 1st of April as well. A lot of people thought it was a joke.

Reply

[-]

pinkyellowneon@reddit

they're really committing to the hype cycle on it, and it would feel a little strange for them to make fun of their own release as a joke. i would assume they're being genuine

Reply

[-]

Far_Insurance4191@reddit

It says april 2 to me

Reply

[-]

Prestigious-Use5483@reddit

My first thought

Reply

[-]

Specter_Origin@reddit (OP)

Doubt! That does not seem like a joke, although it did cross my mind

Reply

[-]

Inevitable-Name-1701@reddit

Gemma was too small for non english languages. Pass

Reply

[-]

Iory1998@reddit

I am afraid we all are gonna be disappointed. Maybe we will not see any medium-sized Gemma-4 model.

Reply

[-]

Chaotic_Choila@reddit

Honestly my main wish is just better documentation and transparency around the training data. The models themselves are solid but figuring out what they're actually good at versus what they just appear to be good at takes forever. Better tooling for evaluation would be nice too. Right now it feels like everyone is reinventing the same benchmarking wheel. Some kind of standardized way to test against real world business scenarios would save so much time.

Reply

[-]

brown2green@reddit

> and transparency around the training data Why would you even want that? The moment the training data becomes "transparent" (especially for a model from a company as large as Google), it has to cater to the lowest common denominator, because anybody with an axe to grind could find an excuse to get offended or find something legally actionable in it.

Reply

[-]

QuackerEnte@reddit

Architectural novelties. Something no other OSS model does yet. Because I know the models will be outdated pretty fast in terms of capabilities, so at least architecture novelties can be used in future models by everyone.

Reply

[-]

Cubow@reddit

i desperately need a 1b model

Reply

[-]

ComplexType568@reddit

have you tried Qwen3.5 0.6B or 1.7B?

Reply

[-]

Cubow@reddit

Qwen3.5 2b is too big and 0.8b doesnt seem notably better than Gemma 3 1b

Reply

[-]

Specialist_Golf8133@reddit

honestly just want them to not nerf it this time. gemma 2 was solid until they lobotomized it with safety tuning. like give us the raw model and let people choose their own guardrails? the base weights are always more useful for fine-tuning anyway. what safety features are you actually hoping for vs dreading lol

Reply

[-]

brown2green@reddit

> the base weights are always more useful for fine-tuning anyway This has not been the case for a good while (since early 2024?). As an individual you just don't have any chance anymore of competing with the post-training work done by the companies training the models: too much data/compute needed for an actually good finetune from scratch nowadays, unless you're training them on very narrow tasks.

Reply

[-]

InsideElk6329@reddit

This guy is a joke now as a result of the gemini garbage

Reply

[-]

Leflakk@reddit

Gemma are useless for coding so nothing from me

Reply

[-]

spaceman_@reddit

Something that fits 8GB, something that fits 16GB, something that fits 32 and something that fits 64?

Reply

[-]

Alone-Possibility398@reddit

april fool dude

Reply

[-]

DeepOrangeSky@reddit

Well, they're not going to do it, but, if they put out a 70b dense model, I'd be pretty curious just how insanely strong it would be. I mean, Llama 70b came out before dinosaurs walked the earth, and the fine tunes/merges based on it are *still* considered some of the strongest writing models around to this day. So, given how strong Qwen3.5 27b was just now, and that this is Google, who are maybe the only crew that can put something out that punches even harder for its size, it makes me wonder just how strong a 70b dense model from them would be right now. Probably would be pretty crazy. Yea, "crazy slow", but still... And of course they could still put out all the normal expected models that all the coders want and all the usual MoE type of stuff. But having at least *one* really sick dense model, instead of none, would be really nice. Not sure why these companies seem to be so anti-variety in that way. Like I get that MoE is the future and all, not saying the it can't be 80/20 or 90/10 that way, but would be nice if one of these heavy hitters released a 70b dense or 120b dense once in a blue moon instead of just literally never doing it ever again and years going by and the ancient ones still being the strongest ones at chatting/writing/RPG/etc years after they came out.

Reply

[-]

Spara-Extreme@reddit

Aren’t they rumored to be doing a 120b dense model ?

Reply

[-]

ambient_temp_xeno@reddit

That was just me trying to manifest it into reality. 70/80b dense would be great.

Reply

[-]

DeepOrangeSky@reddit

Nah I think they're saying the 120b is going to be an MoE, albeit not a super-sparse one. Like 120b with 15b active. (Should hopefully still be pretty dang cool and strong, but, whole different ballgame from a 120b dense, which would be insane. I use the Behemoth 123b dense fine-tune of Mistral 123b dense all the time btw, as my go-to model, pretty much every day, and it is easily the strongest local model I've ever used by a really big margin. And 123b is super old. If a really serious lab like Google made a dense model that big *now* it is crazy to think how strong it would be. It would be about as strong as writing as Claude, Gemini, Grok, GPT, etc. Might sound crazy, but, even those aren't dense models (huge MoEs, but fairly sparse. They might have significantly less than 120b active parameters, so, a current-times 120b dense from Google would actually be seriously strong at writing. Very slow, but, would be very, very cool. I think if they actually do a big dense one, which I don't think they will, they won't go *that* big for dense though. Probably they'll do another dense in the 24-32b size range, and no bigger, but if they do go bigger, than 70b, not 120b dense. 120b dense would be considered too weird and dense and old fashioned or something "a model made for nobody" or something (other than me, who would love to run it, lol). Anyway my posts tend to get too long so I'll stop rambling, but yea, their 120b is gonna be MoE sounds like.

Reply

[-]

Spara-Extreme@reddit

Ahhh thats disappointing. I was hoping for a 120b dense, I've used Behemoth-X-Redux a ton in the past and its one of my favorite models.

Reply

[-]

power97992@reddit

Lol it wont be better than the upcoming gemini 3.1 flash and glm 5.1, probably even worse than gem 3 flash and minimax M2.7

Reply

[-]

BelgianDramaLlama86@reddit

Better at RP/creative writing, mainly. Other things are icing on the cake, but the soft skills are what Gemma 3 was most known for, that's where the focus should be now too.

Reply

[-]

vladlearns@reddit

4:44 AM https://preview.redd.it/32ykctcb2qsg1.png?width=713&format=png&auto=webp&s=3d2216c0a2b84c678e3886769c2ba55d75314efd YEEEEEEEEEEEESSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Reply

[-]

coder543@reddit

I want an extreme sparsity 175B A3B model in Q4 QAT with text+image+audio input and text+image+audio output.

Reply

[-]

CardNorth7207@reddit

With 2 million context length

Reply

[-]

JorG941@reddit

A man can only dream

Reply

[-]

Far-Low-4705@reddit

Hopefully multimodal (vision + text), reasoning, and tool calling, again with QAT. That’s basically the minimum to compete against qwen…

Reply

[-]

qwen_next_gguf_when@reddit

Less censored.

Reply

[-]

FinBenton@reddit

We have super good uncensoring stuff now lile the hauhau and heretic, wouldnt matter too much.

Reply

[-]

pigeon57434@reddit

with how sophisticated heretic is these days its honestly not a big deal but obviously its better if its just out of the box less censored

Reply

[-]

TopChard1274@reddit

A 7b model to run a q4\_k on my iPad. 8b is already a stretch. 7b is the most that wouldn’t crash the app upon importing. Right now I run a 4b qwe3.5 q6\_k variant on 32,000 context size. The dev made a pocketpal update with better suport for qwen3.5 and now the max context window I can run on iPad has basically doubled. So yeah, a 7b would be perfect for my needs.

Reply

[-]

Orbiting_Monstrosity@reddit

To never see or hear the words "dust motes" again.

Reply

[-]

m3kw@reddit

Gemma suck though

Reply

[-]

Rich_Artist_8327@reddit

It needs to be little larger like 32B and 20%,better in every aspect as gemma3 then I love it.

Reply

[-]

emteedub@reddit

omnipotence

Reply

[-]

dobomex761604@reddit

1 million context and low (like Mistral 7b) censorship.

Reply

[-]

chikengunya@reddit

120B model

Reply

[-]

coder543@reddit

175B - 200B would be great. 120B is an awkward size when people are typically choosing between machines with <32GB of VRAM or 128GB of VRAM. 175B would make better use of 128GB of VRAM, and 120B isn't going to fit in <32GB anyways.

Reply

[-]

Spara-Extreme@reddit

96GB users rejoice !

Reply

[-]

gnnr25@reddit

That we would also get Gemma 4n so that smaller models can punch above their weight.

Reply

[-]

Weird-Field6128@reddit

Okay! Idk about Gemma, but i find these models pretty useless, sorry but does anyone actually use them? And if so where? Also to be honest when I was using comfyUi i took the readymade workflow and in that I saw it was using gemma models and that is the only use i saw in encoding user queries aka prompts for the image/ video models Anything else people use these models for I would like to know. Maybe i am looking at them in the wrong way

Reply

[-]

KageYume@reddit

Gemma3 is great at translation. Its 27B QAT was BiS for Japanese translation for 24GB VRAM class for a while.

Reply

[-]

Weird-Field6128@reddit

Thank you so much honestly i did not know about this

Reply

[-]

ttkciar@reddit

I'd mainly like to see three things: * A dense model in the 24B-to-32B range. Their traditional 27B is perfect. Whatever other sizes they release is just gravy. * All the soft-skills competence we've come to love about Gemma3, but better than Gemma3, * TheDrummer rolling out another Big Tiger anti-sycophancy fine-tune! Some nice-to-haves: * Less rapid long-context competence drop-off, * Longer context limit, * A larger model, like a 120B-A15B MoE or 72B dense, * Documentation tweak admitting that system prompts are supported. Gemma2 and Gemma3 both work great with system prompts, but people keep insisting they don't because the Gemma documentation and official prompt template say so.

Reply

[-]

DeepOrangeSky@reddit

>24B-to-32B range >72B dense Yes, please. Although, particularly the 70-72b dense, even more so than the 24-32b dense. (The reason I say this isn't out of selfishness that I have enough vram to run it, rather, it's that Qwen3.5 27b dense just came out and is super strong (meaning maybe Google's edge over it might not be that huge, although, then again, it's Google, so who knows). Whereas we haven't had a super strong 70b dense model in forever, so the quality jump over what currently exists for that would potentially be really big. I don't know, I'm curious, what do you think, regarding the ~27b dense model, do you think it would still somehow be a lot stronger than even Qwen3.5 27b, or only slightly stronger/similar strength? I mean, I'm pretty new here, but from what I gather, Gemma3 27b was like crazy strong for its size when it came out (more so than even Qwen3.5 27b was relative to the current crop, maybe?).

Reply

[-]

ttkciar@reddit

> \> particularly the 70-72b dense, even more so than the 24-32b dense Yeah :-) both would be great! But I prioritize the 27B for entirely selfish reasons, as that would fit on my 32GB MI50 for fast inference. A 72B dense would definitely be a nice-to-have, but I'm still figuring out the limitations of K2-V2-Instruct, LLM360's 72B dense. It's a **very** clever model, and if Google doesn't give us a large dense Gemma4, we might be able to distill the rumored 120B-A15B into K2-V2-Instruct to get a decent approximation. > \> I don't know, I'm curious, what do you think, regarding the ~27b dense model, do you think it would still somehow be a lot stronger than even Qwen3.5 27b, or only slightly stronger/similar strength? IMO the main strength of Gemma3 wasn't that it did any particular thing best, but rather that it did everything "well enough". Qwen3 had some excellent inference skills, but there were gaps in those skills where it was weak, like editing (rewriting), self-critique, geopolitics, RAG, Theory-of-Mind, and Evol-Instruct. That didn't impact its popularity much, though, because those are somewhat niche skills that most users don't care about. That having been said, Qwen3.5 has closed those gaps. When I evaluated Qwen3.5-27B it exhibited all of those missing skills, and its competence at many of them surpassed Gemma3's. The question is, while Qwen has caught up with Gemma3's diversity of skills, has Google been sitting still? Or will Gemma4 exhibit as much improvement over Gemma3 as Gemma3 did over Gemma2?

Reply

[-]

pigeon57434@reddit

omnimodal

Reply

[-]

MerePotato@reddit

Omnimodality

Reply

[-]

WhoRoger@reddit

r/skamtebord

Reply

[-]

fyvehell@reddit

1. That this is not an April Fools joke 2. That if they also release a bigger model, they also keep the current sizes too so that more people can have a chance to run these models That is all.

Reply

[-]

llmentry@reddit

Yep. I think we're mostly hoping that Googs doesn't try to fix what isn't broken. Really hoping Gemma4 doesn't turn out to be nothing more than NicheSpecialityGemma\_300M :/

Reply

[-]

dtdisapointingresult@reddit

A 400B A20B model, natively trained at 4-bit like GPT-OSS was, that's basically perfect for people with 128GB memory.

Reply

[-]

ForsookComparison@reddit

Something dense

Reply

[-]

c--b@reddit

Unsloth support day one.

Reply

[-]

RickyRickC137@reddit

RP. Gemma 3 has the best prose out of all the open source models (even till date). The creativity was its strength when it came out.

Reply

[-]

Yu2sama@reddit

Better license for finetuners ( though I doubt is gonna happen) I would be happy if it just gets better at creative writing.

Reply

[-]

MiyamotoMusashi7@reddit

If this is an april fools joke I will crash tf out

Reply

[-]

Hans-Wermhatt@reddit

So excited, I’m ready to be let down.

Reply

[-]

celsowm@reddit

**Gemma 4 got 99% on ARC-AGI 3 !!!** >!April Fool!<

Reply

[-]

KageYume@reddit

・27B dense or 35B MoE (can run on 24GB of VRAM) ・Reasoning can be turned on or off easily ・Better Japanese - English translation capability than equivalent size Qwen 3.5 even with reasoning turned off (Gemm3 was BiS for a long time). ・Better world knowledge than equivalent size Qwen 3.5 ・Better tool calling and instruction-following than equivalent size Qwen 3.5

Reply

[-]

nickm_27@reddit

Really hoping for a MOE model between 20B and 35B

Reply

[-]

War3Z@reddit

Honestly my biggest wish is better long-context reliability without massive VRAM requirements. Gemma (and others) are getting really good, but once you push longer contexts or real-world workflows, things still get shaky or expensive fast.

Reply

[-]

triynizzles1@reddit

A few google models we’re available on LM Arena, one claiming to be unnamed made by Google and another claiming to be Gemma 4. Under the names Colosseum-1p3 and significant-otter. Colosseum-1p3 seemed very intelligent but refused to do any coding… which was odd. Based on the name I’m assuming it’s a small edge model. significant-otter self identified as Gemma 4 and sounded quite smart. It was decent with coding. Both appear to have an early 2025 knowledge cutoff (both models correctly said trump was president.) Both models responded right after pressing send, indicating they are not reasoning models. I don’t know if both models are still available to text on lm arena but it looks like the release is soon. I am most looking forward to an updated, recent knowledge cutoff.

Reply

[-]

ELPascalito@reddit

That's Google goal after all, Gemma is meant to be on edge AI, lightweight and production ready aka "guardrails"

Reply

[-]

baseketball@reddit

Please be something good VRAM peasants can run.

Reply

[-]

ab2377@reddit

if its 4b is better than qwen3.5 4b, that will be amazing & crazy.

Reply

[-]

Full_Outcome_6289@reddit

https://preview.redd.it/wgqgxq7t3osg1.jpeg?width=800&format=pjpg&auto=webp&s=1658f12394e35b29cb0195aed26086b1fb27d2d0 yes pls 80b-20b moe

Reply

[-]

rm-rf-rm@reddit

A20b for 80b total? Thats not as sparse as the SOTA.. (see A17b-397b in Qwen3.5)

Reply

[-]

Full_Outcome_6289@reddit

My computer can run this model. 20b is a pretty smart model, and I think 80b is quite sophisticated. But I don't know about the standards for how many active parameters are typically used for MoE models relative to the general parameters.

Reply

[-]

SpicyWangz@reddit

Yeah I think something like 80ba8b would be way more interesting to see

Reply

[-]

Specter_Origin@reddit (OP)

and a 40b-A5b, may be xD

Reply

[-]

Opening-Ad6258@reddit

Jost hope it runs well on my machine

Reply

[-]

TheRealMasonMac@reddit

Only thing I want is a fucking base model. Am going to be seriously pissed if they got on the train of not releasing it.

Reply

[-]

Recoil42@reddit

Improved agent/tool architectures would be a big one. This is an area where Google needs to focus for the SWE effort so I hope they do.

Reply

[-]

EbbNorth7735@reddit

I think that's a guarantee to be honest. It's the one thing all of the latest models are targeting.

Reply

[-]

Recoil42@reddit

Yep, but it's also something Gemini Pro in particular is astonishingly bad at. Current 3.1 is a brilliant general-use model that acts like a bumbling novel laureate professor who frequently loses his glasses and forgets lesson plans. Amazing at doing complex one-shot work, terrible in long loops.

Reply

[-]

masterlafontaine@reddit

Perfect description

Reply

[-]

Mochila-Mochila@reddit

No censorship 😒

Reply

[-]

random_boy8654@reddit

Any good dense model like 14B or moe 40b a3b type

Reply

[-]

Kahvana@reddit

Not another meaningless tweet.

Reply

[-]

Recoil42@reddit

The Logan tweets are pretty reliable. Pretty much always means a model release same-week.

Reply

[-]

chikengunya@reddit

I think gemma2 and gemma3 were each released on a Wednesday/Thursday, so today or tomorrow would fit...

Reply

[-]

Terminator857@reddit

Amazing how much this hype train works

Reply

[-]

Technical-Earth-3254@reddit

Thinking, at least one large dense model (>100b) and ideally native 4 bit for all models.

Reply

[-]

Specter_Origin@reddit (OP)

I will go first: I want to see a small diffusion based model for experimentation

Reply

[-]

Specter_Origin@reddit (OP)

Mine is model between: 28-40b dense adn moe

Reply

[-]

Prestigious-Use5483@reddit

Can't recall if it was a rumor or real, but I think they had models up to 4B, then the next model after 4B was 120B.

Reply

[-]

aeqri@reddit

Anything but another RNN/hybrid model that needs to reprocess the entire context when you edit or remove even a single token from the very end of it.

Reply

[-]

Revolutionary_Loan13@reddit

Faster tps

Reply

[-]

Double_Cause4609@reddit

Parscale or Loop Transformers on a dense backbone / shared expert, with a residual super low active parameter count MoE that can be offloaded to system RAM or even streamed from NVMe. Some extension of the weird residual contribution of Gemma 3N for even more sparse parameter loading. Engram (or equivalent sparse embedding contribution). Aggressive QAT, in the sub 3bit range. Tbh, something like... A 400B A53B, where the first 50B activated parameters are Parscale/Looped Transformer, and the remaining conditional 350B A3B is conditional MoE params, with a 2bit QAT would be ideal for my hardware, personally. It'd perform roughly like an \~80B dense in hard reasoning (with a parscale rate of around 8-12 parallel requests), while still having the MoE params for rare sequence memorization and general knowledge base. Plus it'd run on about 12.5GB of VRAM (for all the shared parameters), and the active count would be so low that a CPU would be perfectly comfortable to run it (even if one didn't have enough system RAM and had to stream the experts from NVMe.

Reply

[-]

5dtriangles201376@reddit

Good world awareness for the size and open license or at the bare minimum something like nvidia open where the outputs aren't Google's problem

Reply to Post

145 Comments