Study: 2x+ coding performance of 7B model without touching the coding agent
Posted by 9gxa05s8fa8sh@reddit | LocalLLaMA | View on Reddit | 13 comments
9gxa05s8fa8sh@reddit (OP)
https://arxiv.org/abs/2509.17489
akd_io@reddit
Fascinating paper. I'm surprised, though, that they saw these stats and concluded 32/32/32 was the best config, rather than concluding they needed to do more testing.
Or maybe I'm in way over my head and this makes sense somehow? Can somebody explain?
9gxa05s8fa8sh@reddit (OP)
I THINK the reason people don't just LoRA everything is because it's touchy voodoo, and those ups and downs show it. All the details from the different test sets become a magic stew that then gets poured over different parts of the model. It's entirely unpredictable... I THINK, so someone can correct me.
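To make the "touchy voodoo" concrete, here's a minimal NumPy sketch of what a LoRA adapter actually does to a weight matrix (not the paper's code, and the dimensions are made up): the adapted weight is W' = W + (alpha/r) * B @ A, so merging several adapters just sums their low-rank deltas into the same matrix, which is exactly why their interactions are hard to predict.

```python
import numpy as np

# Hedged sketch of a LoRA update (illustrative dimensions, not from the paper).
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d, k))          # frozen base weight
A = rng.normal(size=(r, k)) * 0.01   # adapter "down" projection
B = np.zeros((d, r))                 # adapter "up" projection, zero-init

delta = (alpha / r) * (B @ A)        # rank-<=r update
W_adapted = W + delta

# With B zero-initialized, the adapter starts as a no-op on the base model.
assert np.allclose(W_adapted, W)
print(delta.shape)  # (8, 8)
```

Training moves B and A away from this no-op; every merged adapter then contributes its own delta to the same W, with no guarantee the sum behaves like either adapter alone.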
TomLucidor@reddit
Seconding this: debugging is harder than coding, and reflection is harder than structured work.
joexner@reddit
Does the debugging agent have access to more information, like whether the code compiles and passes the tests? Can it do more stuff?
SnooPaintings8639@reddit
So it begins. The return of the LoRA.
DinoAmino@reddit
Return? It was gone?
TomLucidor@reddit
'Cause people mostly do finetunes for RP, not skills. There was also a storm of people trying to top the open leaderboard with evolutionary merging.
DinoAmino@reddit
Ah, yep. Forgot the /s
TomLucidor@reddit
The absolutely unironic sad state of affairs. We need leaderboards again, but with live benchmarks to mess with the benchmaxxers.
kmouratidis@reddit
This is somewhat similar to what I wanted to do with Mistral/Devstral/Magistral about a year ago, but after stitching, unmerging was a pain and I gave up. Nice to see someone with a functioning brain try this in a more formal way.
Pro-Row-335@reddit
Merging LoRAs on the fly to adapt to a task... just like what people have been doing with image models.
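The image-model practice being referenced can be sketched in a few lines (adapter names and mixing weights below are hypothetical, purely for illustration): each adapter's low-rank delta is scaled by a task-dependent weight and summed into the base matrix at load time.

```python
import numpy as np

# Hedged sketch of on-the-fly weighted LoRA merging (all names/weights invented).
rng = np.random.default_rng(1)
d, k, r = 8, 8, 2

W = rng.normal(size=(d, k))  # frozen base weight
adapters = {
    "python": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
    "debug":  (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
}
mix = {"python": 0.7, "debug": 0.3}  # hypothetical per-task blend

# Fold the weighted sum of low-rank deltas into the base weight.
W_task = W + sum(w * (B @ A)
                 for name, w in mix.items()
                 for B, A in [adapters[name]])
print(W_task.shape)  # (8, 8)
```

Changing the `mix` dict re-blends the same adapters for a different task without retraining anything, which is the appeal for both image and code models.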
HornyGooner4402@reddit
This is what I've been thinking: like MoE, but the area of expertise of each "expert" is clearly defined, e.g. tied to specific tools or tasks.
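The contrast with a learned MoE gate can be sketched as an explicit lookup (all adapter and task names below are hypothetical): instead of a router network deciding softly, the "expert" is chosen by a declared task label, so each adapter's specialty is exactly as clearly defined as its key.

```python
# Toy sketch of explicit task routing (names are made up for illustration).
ADAPTER_FOR_TASK = {
    "write_code":  "coder_lora",
    "fix_bug":     "debugger_lora",
    "write_tests": "tester_lora",
}

def route(task: str) -> str:
    # Fall back to the plain base model when no specialist adapter exists.
    return ADAPTER_FOR_TASK.get(task, "base_model")

assert route("fix_bug") == "debugger_lora"
assert route("summarize") == "base_model"
```

Unlike a gating network, this table is trivially auditable: you can read off exactly which "expert" handles which task.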