R1+Sonnet set a new SOTA on the aider polyglot benchmark, at 14X less cost compared to o1

Posted by Xhehab_@reddit | LocalLLaMA

https://preview.redd.it/zub2yfarfzee1.jpg?width=1656&format=pjpg&auto=webp&s=b92fd272248cd2290b56236ab40716acd51979aa

**64% R1+Sonnet**
62% o1
**57% R1**
52% Sonnet
48% DeepSeek V3

> "There has been some recent discussion about extracting the `<think>` tokens from R1 and feeding them to Sonnet. To be clear, the results above are not using R1’s thinking tokens. Using the thinking tokens appears to produce worse benchmark results. o1 paired with Sonnet didn’t produce better results than just using o1 alone. Using various other models as editor didn’t seem to improve o1 or R1 versus their solo scores.
>
> ---
>
> Aider supports using a pair of models for coding:
>
> - An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
> - An Editor model is given the Architect’s solution and asked to produce specific code editing instructions to apply those changes to existing source files.
>
> **R1 as architect with Sonnet as editor has set a new SOTA of 64.0%** on the aider polyglot benchmark. They achieve this at **14X less cost** compared to the previous o1 SOTA result."

[*https://aider.chat/2025/01/24/r1-sonnet.html*](https://aider.chat/2025/01/24/r1-sonnet.html)
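
For anyone wondering what the Architect/Editor split looks like in practice, here is a rough sketch of the two-step flow described in the quote. This is not aider's actual code or prompts. The `Model` type, the `architect_then_editor` function, the prompt wording, and the stub models are all made up for illustration. A "model" here is just any function that takes a prompt string and returns a reply string.

```python
from typing import Callable

# Hypothetical abstraction: any callable mapping a prompt to a reply.
# In real use this would wrap an API call to R1, Sonnet, etc.
Model = Callable[[str], str]


def architect_then_editor(task: str, source: str,
                          architect: Model, editor: Model) -> str:
    """Run the two-step flow: architect describes, editor produces edits."""
    # Step 1: the architect only describes how to solve the problem.
    plan = architect(
        "Describe how to solve this coding problem.\n\n"
        f"Problem:\n{task}\n\nExisting source:\n{source}"
    )
    # Step 2: the editor receives the architect's solution and turns it
    # into concrete editing instructions for the existing file.
    return editor(
        "Produce specific code editing instructions that apply this "
        "solution to the existing source file.\n\n"
        f"Solution:\n{plan}\n\nExisting source:\n{source}"
    )


# Stub models so the sketch runs without any API access (illustrative only).
def stub_architect(prompt: str) -> str:
    return "Rename function foo to bar."


def stub_editor(prompt: str) -> str:
    # Echo back the solution text that was passed in by the pipeline.
    solution = prompt.split("Solution:\n")[1].split("\n\n")[0]
    return "EDIT: " + solution


if __name__ == "__main__":
    print(architect_then_editor("rename foo", "def foo(): pass",
                                stub_architect, stub_editor))
```

The point of the sketch is that the editor only ever sees the architect's final solution text. That matches the quote's note that the benchmark results were not produced by passing R1's thinking tokens along to Sonnet.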