Forgive my ignorance but how is a 27B model better than 397B?

Posted by No_Conversation9561 | LocalLLaMA | 230 comments

Is Qwen just incredibly good at doing dense and not so good at doing MoE?

I get that a dense model is generally better than an MoE of the same total size, but a 27B model being better than a 397B one just doesn’t sit right with me.

What are those additional experts even doing then?