What are your guys' favorite models on Hugging Face, and what do you use them for?

Posted by EducationalText9221@reddit | LocalLLaMA | 13 comments

I am trying to start a coding project using AI and would like to know which models everyone likes. Preferably nothing too huge, like not 350B lol.

[-]

ttkciar@reddit

Your best bet for agentic codegen (like with OpenCode) is probably Qwen3.5-122B-A10B.

If you're just using it like a chatbot, and tool-calling isn't important, I strongly recommend GLM-4.5-Air.

If your hardware isn't up for hosting either of those, your best bets are Qwen3.7-27B or Gemma-4-31B-it.

The quantization "sweet spot" is Q4_K_M. Quantizing any smaller than that will lobotomize the model.
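For anyone wondering what "is your hardware up for it" means in numbers, here is a rough back-of-the-envelope sketch. It assumes Q4_K_M averages about 4.85 bits per weight (an approximation that varies by model) and it only counts the weights, not the KV cache or runtime overhead. The function name and the parameter counts (taken from the model names above) are just for illustration.

```python
def weight_gib(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB.

    Ignores KV cache, activations, and runtime overhead, so treat the
    result as a lower bound on the memory you need.
    """
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: rough average bits per weight for Q4_K_M; varies by model.
Q4_K_M_BPW = 4.85

# Parameter counts read off the model names mentioned above.
# For an MoE model, memory is driven by TOTAL parameters (122B),
# not by the active parameters per token (10B).
for name, params_b in [("122B MoE", 122), ("31B dense", 31), ("27B dense", 27)]:
    print(f"{name}: roughly {weight_gib(params_b, Q4_K_M_BPW):.0f} GiB of weights at Q4_K_M")
```

Whatever number comes out, leave extra room on top for context, since long agentic sessions make the KV cache grow.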

[-]

YourNightmar31@reddit

Pretty sure Qwen3.6 27B beats Qwen3.5 122B-A3B no?

[-]

ttkciar@reddit

I am not aware of a Qwen3.5-122B-A3B, only of Qwen3.5-122B-A10B.

Qwen3.6-27B is pretty great, but for codegen Qwen3.5-122B-A10B still beats it.

[-]

colin_colout@reddit

Any tips for 122b? I tried it and it's not that great compared to MiniMax M2.7 Q4 NL on my Strix Halo.

[-]

suicidaleggroll@reddit

MiniMax-M2.7 is newer, twice the size, and is designed for coding. It will beat Qwen3.5-122B every time. It’s also much slower.

[-]

ttkciar@reddit

No, that's to be expected. If you have the hardware to infer with MiniMax-M2.7, then do that. It's the better model.

My recommendation of Qwen3.5-122B-A10B was for lesser hardware.

[-]

El_Danger_Badger@reddit

Gemma 4 2B, locally hosted, fantastic. Will do whatever you need.

In the end, asking which model is "the best" is like asking which Ferrari is the best Ferrari. Clearly the red one is the best Ferrari.

Put one on your machine and try it out. If you dislike the results, put in another one.

Keep at the trial and error until you get what you deem to be the best results.

There is no right answer to "which model is the best".

[-]

What hardware do you have and what size can you accommodate? My current daily driver is Qwen3.6-35B-A3B: unsloth's UD-Q4_K_M on my main PC, and I have been messing around with their UD-Q2_K_XL version on my Pi 5 for portable offline testing of another side project I am working on (it runs at 3 t/s on the Pi, so it's no good for main work). But the Q4 has been brilliant so far. I did some initial stress testing with increasingly complex questions and it didn't s**t the bed once. So now I am using it for a vast data cleaning exercise, and its performance has been remarkable compared to all previous offline models I have tried (that fit in my case).

But the UD-Q2_K_XL is also surprisingly capable for its footprint. It only really struggles with accuracy once you get into niche stuff, and with the right RAG pipeline it too can get round most problems you throw at it.

[-]

You can add your hardware to Hugging Face and it will suggest compatibility as well.

Probably one of the newer Qwen 3.6 or 3.7 models, around 9B dense or 36B MoE. Don't go smaller than 4-bit; 6-bit or 8-bit can sometimes perform better, depending on the model.
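If it helps, the "4-bit minimum, go higher if you can" rule can be sketched as a tiny picker. This is only an illustration: the bits-per-weight figures are rough assumptions for common GGUF quant types, `headroom_gib` is a made-up allowance for KV cache and overhead, and `pick_quant` is a hypothetical helper, not part of any library.

```python
from typing import Optional

# Assumption: rough average bits per weight for common GGUF quant types.
QUANT_BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.85}


def pick_quant(params_billions: float, budget_gib: float,
               headroom_gib: float = 2.0) -> Optional[str]:
    """Return the highest-precision quant whose weights fit in the budget.

    headroom_gib is a guessed allowance for KV cache and runtime overhead.
    Returns None if even the 4-bit option does not fit, in which case a
    smaller model is the better move than a smaller quant.
    """
    # Try the most precise option first, then step down.
    for name, bpw in sorted(QUANT_BPW.items(), key=lambda kv: kv[1], reverse=True):
        size_gib = params_billions * 1e9 * bpw / 8 / 2**30
        if size_gib + headroom_gib <= budget_gib:
            return name
    return None


# Example: a ~9B dense model with 16 GiB vs 8 GiB of usable memory.
print(pick_quant(9, 16))
print(pick_quant(9, 8))
```

Note that for a ~36B MoE the same check uses the full 36B, since all experts have to sit in memory even though only a few are active per token.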