Comparison of latest reasoning models on the most recent LeetCode questions (Qwen-32B vs Qwen-235B vs nvidia-OpenCodeReasoning-32B vs Hunyuan-A13B)

Posted by kyazoglu@reddit | LocalLLaMA | View on Reddit | 30 comments

Comparison of latest reasoning models on the most recent LeetCode questions (Qwen-32B vs Qwen-235B vs nvidia-OpenCodeReasoning-32B vs Hunyuan-A13B)

Testing method

Coloring strategy:

A few observations:

Hardware: 2x H100

Backend: vLLM (for hunyuan, use 0.9.2 and for others 0.9.1)

Feel free to recommend another reasoning model for me to test but it must have a vLLM compatible quantized version that fits within 160 GB.

Keep in mind that strong performance on LeetCode doesn't automatically reflect real world coding skills, since everyday programming tasks faced by typical users are usually far less complex.

All questions are recent, with no data leakage involved. So don’t come back saying “LeetCode problems are easy for models, this test isn’t meaningful”. It's just your test questions have been seen by the model before.