Best current dense, nonthinking models in the 8b-14b range?
Posted by Priceless_Pennies@reddit | LocalLLaMA | 23 comments
It seems like a lot of the state of the art open models that are being released are either MoE models or Thinking models.
I understand that these are useful ways to improve performance, but with my setup I'm looking for models that don't have these characteristics. I was wondering what recommendations you guys have?
Thanks!
our_sole@reddit
I've had good text summarization results with command-r7b from Cohere along with Ollama.
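For context, a minimal sketch of what a summarization call against a locally running Ollama server looks like (assumes Ollama is serving on its default port 11434; the `build_summary_request` helper name and the prompt wording are illustrative, not part of Ollama's API):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_summary_request(text: str, model: str = "command-r7b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": f"Summarize the following text in three sentences:\n\n{text}",
        "stream": False,  # return one complete response instead of a token stream
    }


def summarize(text: str) -> str:
    """Send the request and return the model's summary (needs Ollama running)."""
    body = json.dumps(build_summary_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the model pulled via `ollama pull command-r7b`, calling `summarize(long_text)` returns the completion string.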
smirkishere@reddit
What do you need? Let me train you one.
noctrex@reddit
Some of the newer ones:
Granite-4.0-H-Tiny is 7B: https://huggingface.co/unsloth/granite-4.0-h-tiny-GGUF
Apertus-8B-Instruct-2509: https://huggingface.co/unsloth/Apertus-8B-Instruct-2509-GGUF
LFM2-8B-A1B: https://huggingface.co/unsloth/LFM2-8B-A1B-GGUF
Falcon-H1-7B-Instruct: https://huggingface.co/unsloth/Falcon-H1-7B-Instruct-GGUF
gemma-3n-E4B: https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF
TheManicProgrammer@reddit
Heads up, but I think LFM2 is an MoE?
noctrex@reddit
Yes, as the name tells you: 8B total parameters, and A1B means 1B active parameters.
The same goes for Granite-4.0-H-Tiny; it's also MoE, with 1B active out of 7B.
Is that a problem?
Those MoE models run pretty fast even on CPU alone or on mobile devices.
TheManicProgrammer@reddit
Sure, but I believe the request was for a dense, non-thinking model, wasn't it?
noctrex@reddit
Oh... oops, missed that somehow.
-Ellary-@reddit
gemma-3-12b-it-q4_0_s - creative tasks and general tasks
gpt-oss-20b-Q8_0 - runs fast even on CPU; working tasks
Qwen3-30B-A3B-Instruct-2507-Q6_K - runs fast even on CPU; working tasks and general tasks
Llama-3.1-SuperNova-Lite-Q6_K - fast, smart for light tasks
MN-12B-Mag-Mell-R1.Q6_K - RP tasks and general tasks
NemoMix-Unleashed-12B-Q6_K - RP tasks and general tasks
phi-4-Q5_K_M - working tasks / JSON tasks
Qwen3-4B-Instruct-2507-Q6_K - insanely fast, smart for light tasks
Qwen3-14B - working tasks and general tasks
This list covers most possible use cases.
RobotRobotWhatDoUSee@reddit
What is your use case?
As noted by others, these two can be quite good for their size:
Both are dense and non-reasoning.
SkyFeistyLlama8@reddit
Gemma 3 12B, pretty much. Granite 4.0 7B is also supposed to be good but I haven't tried it yet. I've been running the 4B version on an NPU for summarizing and classification tasks and it's been great so far.
AppearanceHeavy6724@reddit
There is also an antislop version of Gemma, recently made by /u/_sqrkl.
GreenHell@reddit
You've mentioned this in two comments but haven't named the actual model, and I can't find it on the user's page. Care to share the model name or a link?
AppearanceHeavy6724@reddit
https://old.reddit.com/r/LocalLLaMA/comments/1oepfug/antislop_a_comprehensive_framework_for/nl946h3/
rorowhat@reddit
I love dense models and cannot lie
dubesor86@reddit
I test a lot of models and my table allows decent filtering, so it might help you find non-thinking models under a specific size (use the buttons on the right): https://dubesor.de/benchtable#openmodels15b
ttkciar@reddit
"Best" depends on what you want to use it for.
Some good ones:
Phi-4 (14B)
Gemma3-12B (or its less sycophantic fine tune, Tiger-Gemma-12B-v3)
Qwen3-14B (with /no_think or by manually adding <think></think> to the prompt)
AppearanceHeavy6724@reddit
There is also an antislop version of Gemma, recently made by /u/_sqrkl. It writes in a very natural style.
Double_Cause4609@reddit
For what use case?
dash_bro@reddit
Qwen3 has a 14B. Use it with the /no_think argument.
usernameplshere@reddit
Phi 4
Feztopia@reddit
Yuma42/Llama3.1-DeepDilemma-V1-8B is what I use at 8B; I don't know about bigger ones. Just use the standard Llama chat template and it won't use thinking.
ForsookComparison@reddit
Qwen3 14B with "/no_think" in the system prompt.
Adventurous-Gold6413@reddit
Older, but they work (with /no_think in the sys prompt):
Qwen 3 8B
Qwen 3 14B
Gemma 12B (?)
I'm unaware of any other ones; I'd like to know as well.