Gemini 3 Pro vs Kimi K2 Thinking
Posted by SlowFail2433@reddit | LocalLLaMA | View on Reddit | 63 comments
Has anyone done some initial comparisons between the new Gemini 3 Pro and Kimi K2 Thinking?
What are their strengths/weaknesses relative to each other?
hawk-ist@reddit
Kimi - open source. 🌝🌝
Repsol_Honda_PL@reddit
How much VRAM does it need to run locally?
nomorebuttsplz@reddit
About 512 GB for near-top quality
Repsol_Honda_PL@reddit
512 GB of RAM I could get, but 512 GB of VRAM - that will be hard ;)
_murb@reddit
Have you checked RAM prices lately? It's been rough the last few weeks
Repsol_Honda_PL@reddit
Yes, prices have gone up lately. But compared to what we had a few years ago, it's still cheap IMO.
MidnightProgrammer@reddit
You know what isn't hard? Saving 15% on your car insurance.
No_Afternoon_4260@reddit
The weights for the Q4 are 580 GB+ without context
nomorebuttsplz@reddit
Yes. Q3_K_XL is very good though.
SlowFail2433@reddit (OP)
Yeah, it's in this region in terms of hardware. Higher-parameter-count models quantise better than smaller ones, so you can push the size down a bit more than usual.
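Back-of-envelope only: the bits-per-weight figures below are rough averages I'm assuming for common llama.cpp quant mixes (not measured numbers), applied to a ~1T-parameter model like K2:

```python
# Rough weight-file size for a ~1T-parameter model at common
# llama.cpp quant levels. Bits-per-weight are approximate averages,
# since K-quants mix precisions across layers.
PARAMS = 1.0e12  # Kimi K2 is ~1T total parameters

bits_per_weight = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "Q3_K_XL": 4.1,  # assumed average; XL keeps some layers at higher bits
    "Q2_K": 2.6,
}

def size_gb(bpw: float, params: float = PARAMS) -> float:
    """Weight size in GB (1 GB = 1e9 bytes), ignoring context/KV cache."""
    return params * bpw / 8 / 1e9

for name, bpw in bits_per_weight.items():
    print(f"{name:8s} ~{size_gb(bpw):5.0f} GB")
```

At ~4.8 bpw that lands around 600 GB, which lines up with the 580 GB+ figure mentioned above for Q4.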
minhquan3105@reddit
The cheapest option would be a 12-channel server system with 512 GB of RAM (~$10-15k); run Q3 and you should be able to do 10-15 tps
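That 10-15 tps figure falls out of memory bandwidth, since batch-1 decode is bandwidth-bound and reads every active weight once per token. A sketch under my own assumptions (12-channel DDR5-4800 at ~460 GB/s peak, K2's ~32B active MoE parameters per token, ~3.5 bits/weight at Q3, 40-50% of peak bandwidth actually achieved):

```python
# Rough decode-speed estimate for a bandwidth-bound MoE model.
# All constants here are assumptions, not numbers from the thread.
ACTIVE_PARAMS = 32e9     # Kimi K2 active parameters per token (MoE)
BPW = 3.5                # approx bits per weight at Q3
PEAK_BW = 460e9          # bytes/s peak, 12 channels of DDR5-4800
EFFICIENCY = (0.4, 0.5)  # fraction of peak bandwidth realistically achieved

bytes_per_token = ACTIVE_PARAMS * BPW / 8
lo, hi = (PEAK_BW * e / bytes_per_token for e in EFFICIENCY)
print(f"~{lo:.0f}-{hi:.0f} tokens/s")
```

Which gives roughly 13-16 tokens/s, consistent with the 10-15 tps estimate.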
john0201@reddit
Or a Mac Studio
Serprotease@reddit
I wanted to highlight that he said cheapest, then I remembered the cost of DDR5 ECC RAM.
What a bizarre world where Apple is the cheapest option.
ShengrenR@reddit
Yes.
RevolutionaryLime758@reddit
Really? Where's the training data? Oh, it's one of those people who doesn't know what open source means
Orolol@reddit
Nice can you show me the training code and the dataset ?
SlowFail2433@reddit (OP)
Ye but maybe we can steal some ideas
lemon07r@reddit
Gemini will be better at most things, and it probably won't be close, but K2T should be cheaper, and possibly faster if you can find a good provider.
fairydreaming@reddit
Lech Mazur tested both Gemini 3 Pro Preview and Kimi K2 Thinking in his nyt-connections benchmark, Kimi got 56.7% (almost the same score as DeepSeek V3.2 Exp), Gemini 96.8%.
It seems that open models are far behind the closed ones - at least in logical reasoning.
LoveMind_AI@reddit
I'm sure someone will be in here to say "uh, dude this is LocalLLaMA, this isn't local" and I agree, but since I happen to have put both under the knife (yeah, I've been absolutely cramming tokens through Gemini 3 since the moment it dropped) I can give you my quick take. Kimi K2 Thinking is a really unique reasoner. Its reasoning traces are about as complicated and dynamic as any I've ever seen. It's smart, it absolutely turns over a problem in an interesting, evolving way, but it's not really sensitive or intuitive. And it lapses into adversarial know-it-all territory at least on my use cases even worse than Claude. I think K2 Thinking is much less exciting than Kimi Linear, which is a genuine sea change. A smarter version of Kimi Linear would be a new era.
Gemini 3 Pro on the other hand doesn't seem like a huge upgrade from 2.5 Pro at first blush, but from what I can tell from just 5 hours or so of hardcore use, it's MUCH less prone to the sort of self-flagellating behavior 2.5 exhibited frequently. Definitely a more open-minded model that is incredibly good at inferring context, not getting stuck fawning over the latest input (2.5 Pro seemed to think absolutely everything that was most recent was the true god of the universe in a frustrating way), great at modulating its writing style, and overall a really solid upgrade. I'll be using 3 Pro as my main workhorse.
Truth be told though, of all the models to get an upgrade recently, I'm actually most impressed with the jump from Grok 4 to 4.1. 4.1 is actually a very good model, whereas I did not like 4. GPT-5.1 is also usable, whereas I didn't like 5 except in Pro Mode.
I don't like any of these models nearly as much as I like GLM-4.6, which I would use all day every day if it was a little more stable. GLM-4.5 Air is still what I use for offline work, and I try to do as much of that as possible!
triple_og_way@reddit
GLM 4.6? that's a wildcard entry.
LoveMind_AI@reddit
I think GLM-5 is going to be really special. Plus, MIT License!
triple_og_way@reddit
Looking forward to it. :)
I have a very specific question that maybe you can answer, I wanna use ai as a life coach of sort, Accountability partner perhaps...
Which model do you think is best for this? I was thinking of going for gemini 3.0 on the gemini app as I have a student access account with more limits.
SaintlyDeamon@reddit
Gemini 3 is sooo good, I use it for coding and from my testing it is better than GPT-5 and Claude Sonnet 4.5
dubesor86@reddit
They play in different leagues. Kimi always had very unique writing skills, which got somewhat neutered by the long-CoT thinking, so now it's more of a generic smart open model.
It's not quite as smart as Gemini 2.5 Pro let alone 3. Still good model, but as stated, different leagues.
Round_Ad_5832@reddit
The benchmark is running; it will update in 5 mins. It includes Gemini 3 and Kimi.
SlowFail2433@reddit (OP)
Oh no
I watched it live and Gemini 3 did not do well
Round_Ad_5832@reddit
nooo it's still running
it shows as failed because it's not done
SlowFail2433@reddit (OP)
Thanks I got scared 😳
Round_Ad_5832@reddit
sonnet 4.5 beat it
but I'm going to do some more testing; maybe I need to use a different temperature for best results
SlowFail2433@reddit (OP)
Thanks, okay. Sonnet is really strong so it's a hard one to beat. Sonnet did still beat Gemini 3 on SWE-bench after all
Round_Ad_5832@reddit
So I ran the same benchmark 22 times over 2 hours and found the median optimal temperature for Gemini 3 Pro on code to be 0.35.
I reran the benchmark with the new temp, and Gemini 3 Pro is the first model ever to get a 100% A+ grade.
According to my own benchmark, Gemini 3 Pro at temp 0.35 is the best coding model in the world.
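A temperature sweep like this is just a grid search with repeated runs per candidate. This is a hypothetical sketch, not the actual harness: `run_benchmark` is a stub that peaks near 0.35; a real version would call the model API at that temperature and grade the outputs.

```python
import statistics

def run_benchmark(temperature: float) -> float:
    """Stand-in for a real eval run, returning a score in [0, 1].
    This stub deterministically peaks at t=0.35; a real harness
    would sample the model and grade its answers."""
    return max(0.0, 1.0 - abs(temperature - 0.35))

def best_temperature(temps, runs_per_temp=3):
    """Score each candidate temperature several times and keep the
    one with the best median, smoothing out sampling noise."""
    scores = {}
    for t in temps:
        runs = [run_benchmark(t) for _ in range(runs_per_temp)]
        scores[t] = statistics.median(runs)
    return max(scores, key=scores.get)

candidates = [round(0.05 * i, 2) for i in range(21)]  # 0.0 .. 1.0
print(best_temperature(candidates))  # → 0.35 with this stub
```

Using the median over repeated runs is what makes the sweep robust to run-to-run variance, which matters when a single eval run is noisy.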
SlowFail2433@reddit (OP)
Thanks I am so relieved. Will do param searches per task for sure
nanotothemoon@reddit
barrrrely
Round_Ad_5832@reddit
OK now its up
Round_Ad_5832@reddit
check again in ~7 mins
DekuTheHatchback@reddit
Would you be able to add Grok to this? Really great work!
Round_Ad_5832@reddit
whenever grok 4.1 is available via api
currently we only have access to sherlock-think-alpha which is rumored to be grok 4.1 in stealth
freesnackz@reddit
Kimi K2 Thinking is not even in the same universe as Gemini 3
fractal_yogi@reddit
so k2 is better? or gemini 3 is better?
freesnackz@reddit
Gemini is the SOTA model now by a big margin
Federal_Spend2412@reddit
For coding, Claude Sonnet 4.5 or Gemini 3 Pro, which is better?
kev_11_1@reddit
bro it's only been 15 minutes since it came to AI Studio.
Yes_but_I_think@reddit
Try it in Antigravity. It's good
OGRITHIK@reddit
It's been very mid for me in Antigravity
SlowFail2433@reddit (OP)
I don’t have attention span
BlueSwordM@reddit
Tried it myself.
Not impressive at all for writing vs Kimi K2 Thinking.
Scientific writing is a bit better with Kimi K2T.
In my other tests, Gemini 3 Pro is a bit better, but not enough to matter in my tests.
For multimodal tests though, it mauled practically everything, including Intern-S1, which was my best model for anything multimodal until Gemini 3 Pro.
AnticitizenPrime@reddit
I uploaded a screenshot of the user interface from the TV show Severance and asked it to recreate it in HTML (which it did perfectly).
From its thinking:
The kicker is that I did NOT tell it the screenshot was from Severance or say anything about 'scary numbers'... yet it recognized it and made the scary numbers reference. Which means Gemini has watched Severance as part of its training, lol.
They are training these models on EVERYTHING. I have a feeling its world knowledge is going to be insane.
The result btw: https://codepen.io/Madvulcan/pen/QwNggyg
NinduTheWise@reddit
Ask it to make it so that when you hover your mouse over the numbers they get bigger, and when your mouse moves away they go back to normal size
AnticitizenPrime@reddit
https://codepen.io/Madvulcan/pen/xbVrrZm
SlowFail2433@reddit (OP)
Wow
SlowFail2433@reddit (OP)
Hmm nice I really liked this interface it is very aesthetic.
It's really funny that it knew the Severance reference. It could in some ways be an advantage, I guess, that it's a video-watching model, which is rare among LLMs (there are a few research ones that do it too)
AnticitizenPrime@reddit
I wonder if they have it watch all the trending videos on YouTube...
SlowFail2433@reddit (OP)
Ye probably
TheRealMasonMac@reddit
From my testing so far, Gemini 3 Pro is fairly dumb and poor at instruction following. Maybe it's just day 1 configuration issues on their end, but worse than even the heavily quantized 2.5 Pro.
abdouhlili@reddit
I gave Gemini 3 Pro the 2 prompts I use to test every new model, and my jaw dropped.
WolfeheartGames@reddit
Please elaborate
SlowFail2433@reddit (OP)
Sounds good but what did you see LOL
Hoping it wasn’t just the SVG of a Pelican test
Repsol_Honda_PL@reddit
Why?
Emergency-Pomelo-256@reddit
Kimi K2 Thinking was worse for me than non-thinking
Pink_da_Web@reddit
There's no way around it, Gemini 3 is superior in every way.
dadidutdut@reddit
I did some tests and it's miles ahead on the complex prompts I use for testing. Let's wait and see the benchmarks
SlowFail2433@reddit (OP)
Sounds good, I'm interested in the longer tasks, yeah