Looking for an open source 10B model that is comparable to GPT4o-mini

Posted by bohemianLife1@reddit | LocalLLaMA | 36 comments

Hi All, big fan of this community.

I am looking for a 10B model that is comparable to GPT4o-mini.
The application is simple: it has to be coherent in sentence formation (conversational), i.e. able to follow a long system prompt well (15k tokens).
Good streaming performance (TTFT of 600 ms).
Solid reliability on function calling with up to 15 tools.
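
For anyone who wants to compare candidates against the TTFT requirement, here is a minimal sketch of how time to first token could be measured. It works on any iterator that yields text chunks, so it is not tied to a particular server or client library. The names `measure_ttft` and `fake_stream` are made up for this example, and `fake_stream` only simulates a model with sleeps; swap in the chunk iterator of whatever client you actually use.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, n_chunks) for a stream of text chunks.

    The clock starts when this function is called, so the stream should be
    lazy (the request is sent when iteration begins), otherwise the measured
    TTFT will be too optimistic.
    """
    start = time.perf_counter()
    ttft = None
    n_chunks = 0
    for chunk in stream:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if ttft is None:
            ttft = time.perf_counter() - start  # first non-empty chunk
        n_chunks += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, n_chunks


def fake_stream(first_delay: float = 0.05, gap: float = 0.01, n: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a model: waits, then yields n chunks."""
    time.sleep(first_delay)  # simulated prompt processing (prefill)
    for i in range(n):
        yield f"tok{i} "
        time.sleep(gap)  # simulated per-token decode time


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream())
    budget = 0.6  # the 600 ms target from the post
    print(f"TTFT {ttft * 1000:.0f} ms, {n} chunks in {total * 1000:.0f} ms, "
          f"within budget: {ttft <= budget}")
```

With a 15k-token system prompt, most of the TTFT is likely to be prompt processing, so it is worth measuring with the real prompt (and with prefix caching on, if the server supports it) rather than with a short test message.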

Some background:

In my daily testing (I am a voice agent developer), I have found only one model to date that is useful in voice applications: GPT4o-mini. Since then, no model, open or closed, has come close to it. I was very excited about the LFM model, with its amazing state space efficiency, but I failed to get good system prompt adherence with it.

All the new models, again closed or open, are focusing on intelligence (through reasoning) and not on reliability with speed.

If anyone has a proper suggestion, it would help a lot.

I am trying to fit the whole voice agent on a single GPU:
ASR with https://huggingface.co/nvidia/parakeet_realtime_eou_120m-v1 (it's amazing, takes 1GB of VRAM)
LLM <=== Need help!
TTS with https://github.com/ysharma3501/FastMaya (Maya 1 from Maya Research)

Hardware: 16GB 5060Ti
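
To make the memory constraint concrete, here is a rough back-of-the-envelope sketch of the VRAM budget. Only the 16GB total, the 1GB for ASR, the 10B parameter count and the 15k-token prompt come from the post. Everything else is an assumption for illustration: the 4-bit quantization, the layer/head numbers (a hypothetical architecture, not any specific model), and especially the TTS footprint, which is a placeholder and not a measured value. Runtime overhead (activations, CUDA context, fragmentation) is ignored.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB for one sequence.

    The factor 2 accounts for one key and one value vector per layer per token.
    bytes_per_value=2 corresponds to fp16/bf16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens / 1e9


TOTAL_GB = 16.0  # from the post: 16GB 5060Ti
ASR_GB = 1.0     # from the post: ASR takes about 1GB
TTS_GB = 3.0     # ASSUMPTION: placeholder, measure the real TTS footprint

# ASSUMPTION: hypothetical 10B model, 4-bit weights, 32 layers,
# 8 KV heads (grouped-query attention), head dimension 128.
llm_weights = weights_gb(10e9, 4)
llm_kv = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=15_000)

remaining = TOTAL_GB - ASR_GB - TTS_GB - llm_weights - llm_kv

if __name__ == "__main__":
    print(f"LLM weights:  {llm_weights:.2f} GB")
    print(f"KV cache:     {llm_kv:.2f} GB")
    print(f"Headroom:     {remaining:.2f} GB")
```

Under these assumptions the weights come to about 5 GB and the KV cache for a 15k-token context to about 2 GB, so the headroom depends mostly on how much the TTS model really needs and on the quantization chosen for the LLM.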