What are you guys' favorite models on Hugging Face, and what do you use them for?
Posted by EducationalText9221@reddit | LocalLLaMA
I am trying to start a coding project and use AI, and would like to know which models everyone likes. Preferably nothing too huge, like not 350B lol
LocalLLaMA-ModTeam@reddit
Rule 3
ttkciar@reddit
Your best bet for agentic codegen (like with OpenCode) is probably Qwen3.5-122B-A10B.
If you're just using it like a chatbot, and tool-calling isn't important, I strongly recommend GLM-4.5-Air.
If your hardware isn't up for hosting either of those, your best bets are Qwen3.7-27B or Gemma-4-31B-it.
The quantization "sweet spot" is Q4_K_M. Quantizing any smaller than that will lobotomize the model.
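To put numbers on the size side of that trade-off, here is a rough sketch that estimates how much memory the weights alone take at a given quantization. The bits-per-weight figures are approximate averages I am assuming for llama.cpp-style GGUF quants; real files vary by model and quant mix, and this ignores KV cache and runtime overhead entirely.

```python
# Approximate average bits per weight (assumed values; actual GGUF files differ).
ASSUMED_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
}


def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GiB (no KV cache, no overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Example: a 27B dense model at each assumed quant level.
    for quant, bpw in ASSUMED_BPW.items():
        print(f"{quant}: ~{estimate_weights_gib(27, bpw):.1f} GiB")
```

Treat the result as a lower bound: you still need room for context, and the file size listed on the model page is the better number when it is available.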
YourNightmar31@reddit
Pretty sure Qwen3.6 27B beats Qwen3.5 122B-A3B, no?
abnormal_human@reddit
Not in my benches but it’s close.
ttkciar@reddit
I am not aware of a Qwen3.5-122B-A3B, only of Qwen3.5-122B-A10B.
Qwen3.6-27B is pretty great, but for codegen Qwen3.5-122B-A10B still beats it.
colin_colout@reddit
Any tips for 122b? I tried it and it's not that great compared to MiniMax M2.7 Q4 NL on my Strix Halo.
suicidaleggroll@reddit
MiniMax-M2.7 is newer, twice the size, and is designed for coding. It will beat Qwen3.5-122B every time. It’s also much slower.
ttkciar@reddit
No, that's to be expected. If you have the hardware to infer with MiniMax-M2.7, then do that. It's the better model.
My recommendation of Qwen3.5-122B-A10B was for lesser hardware.
El_Danger_Badger@reddit
Gemma 4 2B, locally hosted, fantastic. Will do whatever you need.
In the end, asking which is "the best model" is like asking which Ferrari is the best Ferrari. Clearly the red one is the best Ferrari.
Put one on your machine and try it out. If you dislike the results, put in another one.
Trial and error until you get what you deem to be the best results.
There is no right answer to "which model is the best".
Ok_Selection_7577@reddit
What hardware do you have, and what size can you accommodate? My current daily driver is Qwen3.6-35B-A3B: unsloth's UD-Q4_K_M on my main PC, and I have been messing around with their UD-Q2_K_XL version on my Pi5 for portable offline testing of another side project I am working on (it runs at 3 t/s on the Pi, so no good for main work). But the Q4 has been brilliant so far. I did some initial stress testing with increasingly complex questions and it didn't s**t the bed once. So now I am using it for a vast data cleaning exercise, and its performance has been remarkable compared to all previous offline models I have tried (that fit in my case).
But the UD-Q2_K_XL is also surprisingly capable for its footprint. It only really struggles with accuracy once you get into niche stuff, and with the right RAG pipeline it too can get round most problems you throw at it.
Ok_Commission_8260@reddit
For coding, I keep coming back to Hugging Face models like Meta's Llama 3.1 8B, Qwen2.5-Coder 7B, and Mistral AI's Codestral. The 7B–8B models are a great balance of speed and quality, and they run comfortably on consumer hardware.
Fit-Produce420@reddit
Was this written by AI? Those models are old as dirt.
Fit-Produce420@reddit
You can literally search and sort on Hugging Face to answer that.
Usually the answer is: the largest and newest model that fits on the hardware you're using and runs at a speed fast enough for you to use it.
You can add your hardware to Hugging Face and it will suggest compatibility as well.
Probably some newer Qwen 3.6 or 3.7 in ~9B dense or ~36B MoE. Don't go smaller than 4-bit, but 6-bit or 8-bit can perform better sometimes; it depends on the model.
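The "largest model that fits and runs fast enough" rule of thumb can be sketched as a small filter. Everything here is hypothetical scaffolding: the `Candidate` fields, the `pick_model` helper, and the `headroom_gib` allowance for context and runtime overhead are my own assumptions, and the speeds are whatever you measure on your own hardware. It also does not model "newest"; that part is left to you.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str              # any label you like
    params_billion: float  # total parameter count, in billions
    file_gib: float        # size of the quantized file
    tokens_per_s: float    # speed you measured on your own hardware


def pick_model(
    candidates: Sequence[Candidate],
    memory_gib: float,
    min_tokens_per_s: float,
    headroom_gib: float = 2.0,  # assumed allowance for context and overhead
) -> Optional[Candidate]:
    """Return the largest candidate that fits in memory and is fast enough."""
    usable = [
        c
        for c in candidates
        if c.file_gib + headroom_gib <= memory_gib
        and c.tokens_per_s >= min_tokens_per_s
    ]
    # Largest by parameter count; None when nothing qualifies.
    return max(usable, key=lambda c: c.params_billion, default=None)
```

Fill the list with the models you have actually downloaded and benchmarked, then call `pick_model(candidates, memory_gib=..., min_tokens_per_s=...)`. A result of `None` means nothing on the list both fits and meets your speed floor.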