Gemini 3 Pro vs Kimi K2 Thinking
Posted by SlowFail2433@reddit | LocalLLaMA | View on Reddit | 63 comments
Has anyone done some initial comparisons between the new Gemini 3 Pro and Kimi K2 Thinking?
What are their strengths/weaknesses relative to each other?
hawk-ist@reddit
Kimi - open source. 🌝🌝
Repsol_Honda_PL@reddit
How much VRAM does it need to run locally?
nomorebuttsplz@reddit
About 512 GB for near-top quality
Repsol_Honda_PL@reddit
512 GB of RAM I could get, but 512 GB of VRAM - that will be hard ;)
_murb@reddit
Have you checked RAM prices lately? It's been rough the last few weeks
Repsol_Honda_PL@reddit
Yes, prices have gone up lately. But compared to what we had a few years ago, it's still cheap IMO.
MidnightProgrammer@reddit
You know what isn't hard? Saving 15% on your car insurance.
No_Afternoon_4260@reddit
The weights for the Q4 are 580 GB+ without context
nomorebuttsplz@reddit
Yes. Q3_K_XL is very good though.
SlowFail2433@reddit (OP)
Yeah, it's in this region in terms of hardware. Higher-parameter-count models quantise better than smaller ones, so you can push the size down a bit more than usual.
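Back-of-envelope only: the bits-per-weight figures below are rough averages I'm assuming for common llama.cpp quant mixes (not measured numbers), applied to a ~1T-parameter model like K2:

```python
# Rough weight-file size for a ~1T-parameter model at common
# llama.cpp quant levels. Bits-per-weight are approximate averages,
# since K-quants mix precisions across layers.
PARAMS = 1.0e12  # Kimi K2 is ~1T total parameters

bits_per_weight = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "Q3_K_XL": 4.1,  # assumed average; XL keeps some layers at higher bits
    "Q2_K": 2.6,
}

def size_gb(bpw: float, params: float = PARAMS) -> float:
    """Weight size in GB (1 GB = 1e9 bytes), ignoring context/KV cache."""
    return params * bpw / 8 / 1e9

for name, bpw in bits_per_weight.items():
    print(f"{name:8s} ~{size_gb(bpw):5.0f} GB")
```

At ~4.8 bpw that lands around 600 GB, which lines up with the 580 GB+ figure mentioned above for Q4.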
minhquan3105@reddit
The cheapest option would be a 12-channel server system with 512 GB of RAM (~$10-15k); run Q3 and you should be able to do 10-15 tps
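That 10-15 tps figure falls out of memory bandwidth, since batch-1 decode is bandwidth-bound and reads every active weight once per token. A sketch under my own assumptions (12-channel DDR5-4800 at ~460 GB/s peak, K2's ~32B active MoE parameters per token, ~3.5 bits/weight at Q3, 40-50% of peak bandwidth actually achieved):

```python
# Rough decode-speed estimate for a bandwidth-bound MoE model.
# All constants here are assumptions, not numbers from the thread.
ACTIVE_PARAMS = 32e9     # Kimi K2 active parameters per token (MoE)
BPW = 3.5                # approx bits per weight at Q3
PEAK_BW = 460e9          # bytes/s peak, 12 channels of DDR5-4800
EFFICIENCY = (0.4, 0.5)  # fraction of peak bandwidth realistically achieved

bytes_per_token = ACTIVE_PARAMS * BPW / 8
lo, hi = (PEAK_BW * e / bytes_per_token for e in EFFICIENCY)
print(f"~{lo:.0f}-{hi:.0f} tokens/s")
```

Which gives roughly 13-16 tokens/s, consistent with the 10-15 tps estimate.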
john0201@reddit
Or a Mac Studio
Serprotease@reddit
I wanted to highlight that he said cheapest, then I remembered the cost of DDR5 ECC RAM.
What a bizarre world where Apple is the cheapest option.
ShengrenR@reddit
Yes.
RevolutionaryLime758@reddit
Really? Where's the training data? Oh, it's one of those people who doesn't know what open source means
Orolol@reddit
Nice can you show me the training code and the dataset ?
SlowFail2433@reddit (OP)
Ye but maybe we can steal some ideas
lemon07r@reddit
Gemini will be better at most things, and it probably won't be close, but K2T should be cheaper, and possibly faster if you can find a good provider.
fairydreaming@reddit
Lech Mazur tested both Gemini 3 Pro Preview and Kimi K2 Thinking in his nyt-connections benchmark, Kimi got 56.7% (almost the same score as DeepSeek V3.2 Exp), Gemini 96.8%.
It seems that open models are far behind the closed ones - at least in logical reasoning.
LoveMind_AI@reddit
I'm sure someone will be in here to say "uh, dude this is LocalLLaMA, this isn't local" and I agree, but since I happen to have put both under the knife (yeah, I've been absolutely cramming tokens through Gemini 3 since the moment it dropped) I can give you my quick take. Kimi K2 Thinking is a really unique reasoner. Its reasoning traces are about as complicated and dynamic as any I've ever seen. It's smart, it absolutely turns over a problem in an interesting, evolving way, but it's not really sensitive or intuitive. And it lapses into adversarial know-it-all territory at least on my use cases even worse than Claude. I think K2 Thinking is much less exciting than Kimi Linear, which is a genuine sea change. A smarter version of Kimi Linear would be a new era.
Gemini 3 Pro on the other hand doesn't seem like a huge upgrade from 2.5 Pro at first blush, but from what I can tell from just 5 hours or so of hardcore use, it's MUCH less prone to the sort of self-flagellating behavior 2.5 exhibited frequently. Definitely a more open-minded model that is incredibly good at inferring context, not getting stuck fawning over the latest input (2.5 Pro seemed to think absolutely everything that was most recent was the true god of the universe in a frustrating way), great at modulating its writing style, and overall a really solid upgrade. I'll be using 3 Pro as my main workhorse.
Truth be told though, of all the models to get an upgrade recently, I'm actually most impressed with the jump from Grok 4 to 4.1. 4.1 is actually a very good model, whereas I did not like 4. GPT-5.1 is also usable, whereas I didn't like 5 except in Pro Mode.
I don't like any of these models nearly as much as I like GLM-4.6, which I would use all day every day if it was a little more stable. GLM-4.5 Air is still what I use for offline work, and I try to do as much of that as possible!
triple_og_way@reddit
GLM 4.6? that's a wildcard entry.
LoveMind_AI@reddit
I think GLM-5 is going to be really special. Plus, MIT License!
triple_og_way@reddit
Looking forward to it. :)
I have a very specific question that maybe you can answer, I wanna use ai as a life coach of sort, Accountability partner perhaps...
Which model do you think is best for this? I was thinking of going for gemini 3.0 on the gemini app as I have a student access account with more limits.
SaintlyDeamon@reddit
Gemini 3 is sooo good, I use it for coding and from my testing it is better than GPT-5 and Claude Sonnet 4.5
dubesor86@reddit
They play in different leagues. Kimi always had very unique writing skills, which got somewhat neutered by the long-CoT thinking, so now it's more of a generic smart open model.
It's not quite as smart as Gemini 2.5 Pro let alone 3. Still good model, but as stated, different leagues.
Round_Ad_5832@reddit
The benchmark is running; it will update in 5 mins. It includes Gemini 3 and Kimi.
SlowFail2433@reddit (OP)
Oh no
I watched it live and Gemini 3 did not do well
Round_Ad_5832@reddit
nooo it's still running
it shows as failed because it's not done
SlowFail2433@reddit (OP)
Thanks I got scared 😳
Round_Ad_5832@reddit
sonnet 4.5 beat it
but I'm going to do some more testing; maybe I need to use a different temperature for best results
SlowFail2433@reddit (OP)
Thanks, okay. Sonnet is really strong so it's a hard one to beat. Sonnet did still beat Gemini 3 on SWE-bench after all
Round_Ad_5832@reddit
So I ran the same benchmark 22 times over 2 hours and found the median optimal temperature for Gemini 3 Pro on code to be 0.35.
I reran the benchmark with the new temp, and Gemini 3 Pro is the first model ever to get a 100% A+ grade.
According to my own benchmark, Gemini 3 Pro at temp 0.35 is the best coding model in the world.
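A temperature sweep like this is just a grid search with repeated runs per candidate. This is a hypothetical sketch, not the actual harness: `run_benchmark` is a stub that peaks near 0.35; a real version would call the model API at that temperature and grade the outputs.

```python
import statistics

def run_benchmark(temperature: float) -> float:
    """Stand-in for a real eval run, returning a score in [0, 1].
    This stub deterministically peaks at t=0.35; a real harness
    would sample the model and grade its answers."""
    return max(0.0, 1.0 - abs(temperature - 0.35))

def best_temperature(temps, runs_per_temp=3):
    """Score each candidate temperature several times and keep the
    one with the best median, smoothing out sampling noise."""
    scores = {}
    for t in temps:
        runs = [run_benchmark(t) for _ in range(runs_per_temp)]
        scores[t] = statistics.median(runs)
    return max(scores, key=scores.get)

candidates = [round(0.05 * i, 2) for i in range(21)]  # 0.0 .. 1.0
print(best_temperature(candidates))  # → 0.35 with this stub
```

Using the median over repeated runs is what makes the sweep robust to run-to-run variance, which matters when a single eval run is noisy.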
SlowFail2433@reddit (OP)
Thanks I am so relieved. Will do param searches per task for sure
nanotothemoon@reddit
barrrrely
Round_Ad_5832@reddit
OK now its up
Round_Ad_5832@reddit
check again in ~7 mins
DekuTheHatchback@reddit
Would you be able to add Grok to this? Really great work!
Round_Ad_5832@reddit
whenever grok 4.1 is available via api
currently we only have access to sherlock-think-alpha which is rumored to be grok 4.1 in stealth
freesnackz@reddit
Kimi K2 Thinking is not even in the same universe as Gemini 3
fractal_yogi@reddit
so k2 is better? or gemini 3 is better?
freesnackz@reddit
Gemini is the SOTA model now by a big margin
Federal_Spend2412@reddit
For coding, Claude Sonnet 4.5 or Gemini 3 Pro, which is better?
kev_11_1@reddit
bro it's only been 15 minutes since it came to AI Studio.
Yes_but_I_think@reddit
Try it in Antigravity. It's good
OGRITHIK@reddit
It's been very mid for me in Antigravity
SlowFail2433@reddit (OP)
I don’t have attention span
BlueSwordM@reddit
Tried it myself.
Not impressive at all for writing vs Kimi K2 Thinking.
Scientific writing is a bit better with Kimi K2T.
In my other tests, Gemini 3 Pro is a bit better, but not enough to matter in my tests.
For multimodal tests though, it mauled practically everything, including Intern-S1, which was my best model for anything multimodal until Gemini 3 Pro.
AnticitizenPrime@reddit
I uploaded a screenshot of the user interface from the TV show Severance and asked it to recreate it in HTML (which it did perfectly).
From its thinking:
The kicker is that I did NOT tell it the screenshot was from Severance or say anything about 'scary numbers'... yet it recognized it and made the scary numbers reference. Which means Gemini has watched Severance as part of its training, lol.
They are training these models on EVERYTHING. I have a feeling its world knowledge is going to be insane.
The result btw: https://codepen.io/Madvulcan/pen/QwNggyg
NinduTheWise@reddit
Ask it to make it so that when you hover your mouse over the numbers they get bigger, and when your mouse moves away they go back to normal size
AnticitizenPrime@reddit
https://codepen.io/Madvulcan/pen/xbVrrZm
SlowFail2433@reddit (OP)
Wow
SlowFail2433@reddit (OP)
Hmm nice I really liked this interface it is very aesthetic.
It's really funny that it knew the Severance reference. It could in some ways be an advantage, I guess, that it's a video-watching model, which is rare among LLMs (there are a few research ones that do it too)
AnticitizenPrime@reddit
I wonder if they have it watch all the trending videos on YouTube...
SlowFail2433@reddit (OP)
Ye probably
TheRealMasonMac@reddit
From my testing so far, Gemini 3 Pro is fairly dumb and poor at instruction following. Maybe it's just day 1 configuration issues on their end, but worse than even the heavily quantized 2.5 Pro.
abdouhlili@reddit
I gave Gemini 3 Pro the 2 prompts I use to test every new model, and my jaw dropped.
WolfeheartGames@reddit
Please elaborate
SlowFail2433@reddit (OP)
Sounds good but what did you see LOL
Hoping it wasn’t just the SVG of a Pelican test
Repsol_Honda_PL@reddit
Why?
Emergency-Pomelo-256@reddit
Kimi K2 Thinking was worse for me than non-thinking
Pink_da_Web@reddit
There's no way around it, Gemini 3 is superior in every way.
dadidutdut@reddit
I did some tests and it's miles ahead on the complex prompts I use for testing. Let's wait and see the benchmarks
SlowFail2433@reddit (OP)
Sounds good, I'm interested in the longer tasks, yeah