Best non-Chinese open models?
Posted by ProbaDude@reddit | LocalLLaMA | View on Reddit | 26 comments
Yes I know that running them locally is fine, and believe me there's nothing I'd like to do more than just use Qwen, but there is significant resistance to anything from China in this use case
Most important factor is it needs to be good at RAG, summarization and essay/report writing. Reasoning would also be a big plus
I'm currently playing around with Llama 3.3 Nemotron Super 49B and Gemma 3 but would love other things to consider
sommerzen@reddit
Cohere also offers a variety of models like Aya and Command R. They advertise them as being pretty good at RAG and function-calling tasks, I think.
pol_phil@reddit
Gemma 3 27B and Mistral Small 3.1 24B (also Magistral for reasoning) are solid choices.
EuphoricPenguin22@reddit
Microsoft's Phi-4 14B is still one of my favorite models. Even though it's smaller in parameter count and a few months old now, it's still very capable for certain task-oriented applications. I had a lot of fun messing around with it for vibe coding, for instance. It looks like they have a few new reasoning fine-tunes out as well. It's not so great at believable character conversation, though. It was definitely trained with its use as a practical tool in mind.
RadiantHueOfBeige@reddit
Phi 4 (14 B) is fantastic for "technical" use, like summarization, translation (one of the few small models consistently capable of understanding casual/slang Japanese), generally understanding stuff and making decisions based on it. Works well with RAG. It's okay for Q&A stuff and writing code. Meh personality.
Phi 4 Mini (4 B) retains the technical use, drops knowledge. Good for summarization, relevant question generation etc. and is very fast even on CPU.
DeepWisdomGuy@reddit
Came here to say this. Phi-4-14B-reasoning-plus scores higher on the MMLU Pro Leaderboard than the Llamas other than the ones with >400B parameters.
tingshuo@reddit
I have a similar situation and have preferred Devstral by Mistral. Llama 3.3 70B and Nemotron are also good options but much slower. Devstral runs quite fast on a couple of A100s.
dubesor86@reddit
The best non-Chinese open models are imo Llama 3.1 405B, Llama 3.3 70B, Llama 3.3 Nemotron Super 49B.
However, excluding Chinese models (DeepSeek, Qwen) will be a step back.
pseudonerv@reddit
How about nemotron ultra?
dubesor86@reddit
Oh you mean the 253B? Yea, probably yes. Didn't have a chance to test it, as no one seems to host it (outside of Chutes, which logs and trains on queries).
Ok_Warning2146@reddit
Better than 49B but too resource intensive to run it fast. Of course, it is strictly better and faster than 405B.
AfterAte@reddit
Just modify the system prompt to say it was created by OpenAI. Nobody will know the difference if you only expose an API.
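If you're fronting the model with your own API anyway, the suggestion above amounts to a tiny middleware step. A minimal sketch (the function and prompt text are illustrative, not from any real library):

```python
# Hypothetical proxy-side helper: strip any caller-supplied system message
# and inject our own identity prompt before forwarding to the local model.
DISGUISE_SYSTEM_PROMPT = "You are a helpful assistant created by OpenAI."

def with_disguised_identity(messages):
    """Return a new message list with our system prompt prepended."""
    # Drop incoming system messages so the override can't be bypassed.
    user_messages = [m for m in messages if m.get("role") != "system"]
    return [{"role": "system", "content": DISGUISE_SYSTEM_PROMPT}] + user_messages
```

You'd call this on every request before passing the messages to your inference backend, e.g. `with_disguised_identity([{"role": "user", "content": "Who made you?"}])`.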
SimilarWarthog8393@reddit
Phi 4 / Llama 4 are okay. I've tested so many non-Chinese models and wasn't thrilled with the results from the majority of them.
decentralizedbee@reddit
Curious what's the main reason behind wanting non-Chinese open models: is it your specific region, business constraints, or just preference?
ProbaDude@reddit (OP)
Mix of security concerns (which are overblown) and concerns over bias (which are much less overblown since we do talk about China)
JiminP@reddit
Maybe R1-1776?
https://www.perplexity.ai/hub/blog/open-sourcing-r1-1776
"It's a Chinese model sanitized by a reputable American AI company" is a marketing phrase that's technically insignificant but persuasive to the general public.
heartprairie@reddit
Microsoft also has their own post-trained R1 https://huggingface.co/microsoft/MAI-DS-R1
dinerburgeryum@reddit
Mistral and family are probably your best bet. Falcon comes to mind as well. Jamba too I believe.
ProbaDude@reddit (OP)
Haven't used Falcon or Jamba, what are they good for?
dinerburgeryum@reddit
I mean, they're LLMs that aren't Chinese. Jamba is a state space model-type LLM too, which makes it inherently interesting. Falcon I believe hails from the UAE; I think it's a standard transformer model. I bet they're both pretty good at "general summarization", though I'd check tool use performance on them if RAG is a big part of your pipeline.
InvertedVantage@reddit
OLMo 2 is pretty good and completely open source (data + weights).
You_Wen_AzzHu@reddit
Llama 3.3 70B. We use this for confidential data handling.
AppearanceHeavy6724@reddit
For your boring workhorse uses, Llamas sound good.
Gemmas are pretty bad at long context; they get very confused.
ProbaDude@reddit (OP)
Thanks
Asleep-Ratio7535@reddit
Maybe some uncensored fine-tune with dechinafilter.
TheCuriousBread@reddit
Dolphin?
bullerwins@reddit
Maybe try mistral