Qwen3 on Dubesor Benchmark

Posted by AaronFeng47@reddit | LocalLLaMA | 10 comments

[https://dubesor.de/benchtable.html](https://dubesor.de/benchtable.html)

One of the few benchmarks that tested Qwen with thinking both on and off.

https://preview.redd.it/eim5m35nxqye1.png?width=1265&format=png&auto=webp&s=cd814d571735444429331c73b4cd17a066497907

>Small-scale manual performance comparison benchmark I made for myself. This table showcases the results I recorded of various AI models across different personal tasks I encountered over time (currently 83). I use a **weighted rating system** and calculate the difficulty for each task by incorporating the results of all models. This is particularly relevant in scoring when failing easy questions or passing hard ones.

>**NOTE THAT THIS IS JUST ME SHARING THE RESULTS FROM MY OWN SMALL-SCALE PERSONAL TESTING. YMMV! OBVIOUSLY THE SCORES ARE JUST THAT AND MIGHT NOT REFLECT YOUR OWN PERSONAL EXPERIENCES OR OTHER WELL-KNOWN BENCHMARKS.**
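
For anyone wondering what a "weighted rating system" with difficulty derived from all models' results could look like, here is a rough sketch. To be clear, this is NOT Dubesor's actual formula, which the quote does not spell out. It assumes difficulty is simply the fraction of models that failed a task, and that a pass earns credit proportional to that difficulty plus a small floor. The function names, the `floor` parameter, and the example data are all made up for illustration, and the sketch only models the "passing hard ones" side, not any extra penalty for failing easy tasks.

```python
from typing import Dict

# results[model][task] = True if the model passed the task (hypothetical layout)
Results = Dict[str, Dict[str, bool]]


def task_difficulty(results: Results) -> Dict[str, float]:
    """Assumed definition: difficulty = share of models that failed the task."""
    models = list(results)
    tasks = results[models[0]].keys()
    return {
        task: sum(not results[m][task] for m in models) / len(models)
        for task in tasks
    }


def weighted_score(model: str, results: Results, floor: float = 0.1) -> float:
    """Score in [0, 1]: passes on harder tasks count for more.

    `floor` (an assumption) keeps tasks that every model passes from
    being worth nothing at all.
    """
    difficulty = task_difficulty(results)
    weights = {task: floor + d for task, d in difficulty.items()}
    earned = sum(weights[t] for t, passed in results[model].items() if passed)
    return earned / sum(weights.values())


if __name__ == "__main__":
    # Invented toy data, not taken from the benchmark table.
    demo: Results = {
        "model_a": {"t1": True, "t2": True, "t3": False},
        "model_b": {"t1": True, "t2": False, "t3": False},
        "model_c": {"t1": True, "t2": False, "t3": True},
    }
    for name in demo:
        print(name, round(weighted_score(name, demo), 3))
```

In this toy setup a model that only passes the task everyone passes ends up with a much lower score than one that also passes a task most others fail, which matches the spirit of the description even if the real weighting differs.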