Study: 2x+ coding performance of 7B model without touching the coding agent
Posted by 9gxa05s8fa8sh@reddit | LocalLLaMA | View on Reddit | 13 comments
9gxa05s8fa8sh@reddit (OP)
https://arxiv.org/abs/2509.17489
akd_io@reddit
Fascinating paper. I'm surprised, though, that they saw these stats and concluded 32/32/32 was the best config, rather than concluding they needed to do more testing.
Or maybe I'm in way over my head and this makes sense somehow? Can somebody explain?
9gxa05s8fa8sh@reddit (OP)
I THINK the reason people don't just LoRA everything is because it's touchy voodoo, and those ups and downs show it. All the details from the different test sets become a magic stew that then gets poured over different parts of the model. It's entirely unpredictable... I THINK, so someone can correct me.
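To make the "touchy voodoo" concrete, here's a minimal NumPy sketch of what a LoRA adapter actually does to a weight matrix (not the paper's code, and the dimensions are made up): the adapted weight is W' = W + (alpha/r) * B @ A, so merging several adapters just sums their low-rank deltas into the same matrix, which is exactly why their interactions are hard to predict.

```python
import numpy as np

# Hedged sketch of a LoRA update (illustrative dimensions, not from the paper).
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d, k))          # frozen base weight
A = rng.normal(size=(r, k)) * 0.01   # adapter "down" projection
B = np.zeros((d, r))                 # adapter "up" projection, zero-init

delta = (alpha / r) * (B @ A)        # rank-<=r update
W_adapted = W + delta

# With B zero-initialized, the adapter starts as a no-op on the base model.
assert np.allclose(W_adapted, W)
print(delta.shape)  # (8, 8)
```

Training moves B and A away from this no-op; every merged adapter then contributes its own delta to the same W, with no guarantee the sum behaves like either adapter alone.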
TomLucidor@reddit
Seconding this: debugging is harder than coding, and reflection is harder than structured work.
joexner@reddit
Does the debugging agent have access to more information, like whether the code compiles and passes the tests? Can it do more stuff?
SnooPaintings8639@reddit
So it begins. The return of the LoRA.
DinoAmino@reddit
Return? It was gone?
TomLucidor@reddit
'Cause people mostly do finetunes for RP, not skills. There was also a storm of people trying to top the open leaderboard with evolutionary merging.
DinoAmino@reddit
Ah, yep. Forgot the /s
TomLucidor@reddit
The absolutely unironic sad state of affairs. We need leaderboards again, but with live benchmarks to mess with the benchmaxxers.
kmouratidis@reddit
This is somewhat similar to what I wanted to do with Mistral/Devstral/Magistral about a year ago, but after stitching, unmerging was a pain and I gave up. Nice to see someone with a functioning brain try this in a more formal way.
Pro-Row-335@reddit
Merging LoRAs on the fly to adapt to a task... just like what people have been doing with image models.
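The image-model practice being referenced can be sketched in a few lines (adapter names and mixing weights below are hypothetical, purely for illustration): each adapter's low-rank delta is scaled by a task-dependent weight and summed into the base matrix at load time.

```python
import numpy as np

# Hedged sketch of on-the-fly weighted LoRA merging (all names/weights invented).
rng = np.random.default_rng(1)
d, k, r = 8, 8, 2

W = rng.normal(size=(d, k))  # frozen base weight
adapters = {
    "python": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
    "debug":  (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
}
mix = {"python": 0.7, "debug": 0.3}  # hypothetical per-task blend

# Fold the weighted sum of low-rank deltas into the base weight.
W_task = W + sum(w * (B @ A)
                 for name, w in mix.items()
                 for B, A in [adapters[name]])
print(W_task.shape)  # (8, 8)
```

Changing the `mix` dict re-blends the same adapters for a different task without retraining anything, which is the appeal for both image and code models.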
HornyGooner4402@reddit
This is what I've been thinking: like MoE, but the area of expertise of each "expert" is clearly defined, e.g. tied to specific tools or tasks.
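The contrast with a learned MoE gate can be sketched as an explicit lookup (all adapter and task names below are hypothetical): instead of a router network deciding softly, the "expert" is chosen by a declared task label, so each adapter's specialty is exactly as clearly defined as its key.

```python
# Toy sketch of explicit task routing (names are made up for illustration).
ADAPTER_FOR_TASK = {
    "write_code":  "coder_lora",
    "fix_bug":     "debugger_lora",
    "write_tests": "tester_lora",
}

def route(task: str) -> str:
    # Fall back to the plain base model when no specialist adapter exists.
    return ADAPTER_FOR_TASK.get(task, "base_model")

assert route("fix_bug") == "debugger_lora"
assert route("summarize") == "base_model"
```

Unlike a gating network, this table is trivially auditable: you can read off exactly which "expert" handles which task.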