Kimi K2.6 now leads all models in 3D Design
Posted by Repulsive-Mall-2665@reddit | LocalLLaMA | 21 comments
One of the best benchmarks
ttkciar@reddit
Violates Rule Three: Low-effort post
The moderator team is trying to raise the bar on benchmark posts, to avoid inundating the sub. It is no longer sufficient to provide a screenshot of benchmark results. Benchmarks should be accompanied by insightful analysis or on-topic points which bring new understanding to the community.
CheatCodesOfLife@reddit
No opus 4.7. Also, arsehole of a website let me type in a prompt then gave me a login wall upon submission.
annodomini@reddit
I see Opus 4.7 on the leaderboard; it actually scores worse than 4.6 on these, it's down a few.
CheatCodesOfLife@reddit
That's so unexpected I didn't even think to look there lmao
Dubaqe@reddit
Opus 4.7 is the lowest of the listed LLMs, you can see it on the right side.
EfficientSurprise954@reddit
Opus 4.7 is the lowest of the shown LLMs, you can see it on the right side.
Alex_1729@reddit
This bench doesn't test 5.4 on anything higher than medium reasoning? Major gap there.
jld1532@reddit
A lot of mad people in here that probably don't have access to Kimi. I do. It's legit. More people are going to figure out how to run this locally somehow, I'd bet money on it.
horeaper@reddit
How to do 3D design with kimi?
Glittering-Call8746@reddit
This
deejeycris@reddit
Trust me bro benchmark
Effective_Head_5020@reddit
Now that Claude/OpenAI are starting to lose at their own game, there are a lot of comments disbelieving benchmarks. Interesting.
Zulfiqaar@reddit
But designarena is literally blind comparison?
hajime-owari@reddit
Crowdsourced benchmarks are so outdated man.
These arenas belong in 2024, not 2026.
Eyelbee@reddit
They certainly aren't. In fact, they remain the most reliable benchmarking tool. No other benchmarks reflect real usage capabilities like they do.
BlacksmithLittle7005@reddit
That's pretty cool! Thank you for sharing :)
Worried-Squirrel2023@reddit
trust me bro benchmark is doing a lot of work in this thread. without the eval methodology this is just a screenshot of a number.
Repulsive-Mall-2665@reddit (OP)
It's based on user voting. Probably better that morons like you don't ruin it.
Only_Response_3083@reddit
what is that benchmark? where is the source?
Repulsive-Mall-2665@reddit (OP)
https://www.designarena.ai/
https://www.designarena.ai/leaderboard
Only_Response_3083@reddit
thank you! tried searching but there were a lot of other "design leaderboards"