Can the application layer improve local model output quality?

Posted by ayechat@reddit | LocalLLaMA

Hi -

I am building a terminal-native tool for code generation, and one of the recent updates packages a local model (Qwen 2.5 Coder 7B, downloaded on first use). The initial response from users to this addition was favorable, but I have my doubts: the model is fairly basic and does not compare in quality to online offerings.

So I am planning to improve the RAG capabilities for building a message with relevant source file chunks, add a planning call, add a validation loop, and maybe add multi-sampling with re-ranking, etc.: all the common techniques that, when implemented properly, could improve output quality.
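To make the validation loop and multi-sample re-ranking ideas concrete, here is a minimal sketch, not the actual aye-chat implementation. It assumes the generated code is Python and that the model is reachable through a hypothetical `generate(prompt) -> str` callable; `syntax_error`, `score`, and `best_of_n` are names made up for illustration, and the scoring rule is a deliberately crude placeholder.

```python
import ast
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: takes a prompt, returns generated source text.
Generate = Callable[[str], str]


def syntax_error(code: str) -> Optional[str]:
    """Return a short error message if code is not valid Python, else None."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def score(code: str) -> Tuple[bool, int]:
    """Toy re-ranking key: valid code first, then prefer shorter output."""
    return (syntax_error(code) is None, -len(code))


def best_of_n(generate: Generate, prompt: str, n: int = 4, max_repairs: int = 2) -> str:
    """Sample n candidates, give each a few repair rounds, return the best one."""
    candidates: List[str] = []
    for _ in range(n):
        code = generate(prompt)
        # Validation loop: feed the parser error back to the model.
        for _ in range(max_repairs):
            err = syntax_error(code)
            if err is None:
                break
            code = generate(
                f"{prompt}\n\nPrevious attempt failed with: {err}\nFix it:\n{code}"
            )
        candidates.append(code)
    # Re-ranking: pick the candidate with the highest score.
    return max(candidates, key=score)
```

In a real tool the syntax check would likely be replaced by something stronger (running tests, a linter, or a type checker), and the score could come from a second model call. The structure stays the same: sample, validate, repair, rank.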

So, the question: I believe (hope?) that with all of those things implemented, a 7B model can be bumped to roughly the quality of a 20B one. Do you agree that's possible, or do you think it would be wasted effort and that kind of improvement would not happen?

The source is here - give it a star if you like what you see: https://github.com/acrotron/aye-chat