R1+Sonnet set a new SOTA on the aider polyglot benchmark, at 14X less cost compared to o1
Posted by Xhehab_@reddit | LocalLLaMA | 46 comments
https://preview.redd.it/zub2yfarfzee1.jpg?width=1656&format=pjpg&auto=webp&s=b92fd272248cd2290b56236ab40716acd51979aa
**64% R1+Sonnet**
62% o1
**57% R1**
52% Sonnet
48% DeepSeek V3
>"There has been some recent discussion about extracting the <think> tokens from R1 and feeding them to Sonnet.
> To be clear, the results above are not using R1’s thinking tokens. Using the thinking tokens appears to produce worse benchmark results.
>
> o1 paired with Sonnet didn’t produce better results than just using o1 alone. Using various other models as editor didn’t seem to improve o1 or R1 versus their solo scores.
>
> ---
>
> Aider supports using a pair of models for coding:
>
> - An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
> - An Editor model is given the Architect’s solution and asked to produce specific code editing instructions to apply those changes to existing source files.
>
> **R1 as architect with Sonnet as editor has set a new SOTA of 64.0%** on the aider polyglot benchmark. They achieve this at **14X less cost** compared to the previous o1 SOTA result."
[*https://aider.chat/2025/01/24/r1-sonnet.html*](https://aider.chat/2025/01/24/r1-sonnet.html)
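For anyone who wants to picture the architect/editor split from the quote, here is a rough sketch of the flow. It is not aider's actual code: the `Model` type, the function names, and the prompt wording are all made up for illustration, and it assumes the architect's reasoning arrives inline wrapped in `<think>` tags (which depends on how the model is served). The thinking block is dropped before the hand-off, matching the quote's point that the benchmark run did not feed R1's thinking tokens to Sonnet.

```python
import re
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]

# Assumption: reasoning is emitted inline as <think>...</think> in the response text.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Drop <think>...</think> spans so only the final answer remains."""
    return THINK_BLOCK.sub("", text).strip()


def architect_then_edit(task: str, source: str, architect: Model, editor: Model) -> str:
    """Two-step flow: the architect describes a solution, the editor turns it into edits."""
    # Step 1: ask the architect how to solve the problem (illustrative prompt).
    raw_plan = architect(f"Describe how to solve this task:\n{task}\n\nCode:\n{source}")
    plan = strip_thinking(raw_plan)  # pass on the answer, not the reasoning

    # Step 2: ask the editor for concrete edits to the existing file (illustrative prompt).
    return editor(
        "Produce specific code editing instructions that apply this solution "
        f"to the existing file.\n\nSolution:\n{plan}\n\nFile:\n{source}"
    )
```

Any two callables can be plugged in as `architect` and `editor`, so the same wrapper works for stub functions in a test or for thin wrappers around real API clients.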