Best current dense, nonthinking models in the 8b-14b range?
Posted by Priceless_Pennies@reddit | LocalLLaMA | 23 comments
It seems like a lot of the state of the art open models that are being released are either MoE models or Thinking models.
I understand that these are useful ways to improve performance, but with my setup I'm looking for models that don't have these characteristics. I was wondering what recommendations you guys have?
Thanks!
our_sole@reddit
I've had good text summarization results with command-r7b from Cohere along with Ollama.
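For context, a minimal sketch of what a summarization call against a locally running Ollama server looks like (assumes Ollama is serving on its default port 11434; the `build_summary_request` helper name and the prompt wording are illustrative, not part of Ollama's API):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_summary_request(text: str, model: str = "command-r7b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": f"Summarize the following text in three sentences:\n\n{text}",
        "stream": False,  # return one complete response instead of a token stream
    }


def summarize(text: str) -> str:
    """Send the request and return the model's summary (needs Ollama running)."""
    body = json.dumps(build_summary_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the model pulled via `ollama pull command-r7b`, calling `summarize(long_text)` returns the completion string.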
smirkishere@reddit
What do you need? Let me train you one.
noctrex@reddit
Some of the newer ones:
Granite-4.0-H-Tiny is 7B: https://huggingface.co/unsloth/granite-4.0-h-tiny-GGUF
Apertus-8B-Instruct-2509: https://huggingface.co/unsloth/Apertus-8B-Instruct-2509-GGUF
LFM2-8B-A1B: https://huggingface.co/unsloth/LFM2-8B-A1B-GGUF
Falcon-H1-7B-Instruct: https://huggingface.co/unsloth/Falcon-H1-7B-Instruct-GGUF
gemma-3n-E4B: https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF
TheManicProgrammer@reddit
Heads up, but I think LFM2 is an MoE?
noctrex@reddit
Yes, as the name tells you: 8B total parameters, and A1B means 1B active parameters.
The same goes for Granite-4.0-H-Tiny; it's also MoE, with 1B active out of 7B.
Is that a problem?
Those MoE models run pretty fast even on CPU alone or on mobile devices.
TheManicProgrammer@reddit
Sure, but I believe the request was for a dense, non-thinking model, wasn't it?
noctrex@reddit
Oh... oops, missed that somehow.
-Ellary-@reddit
gemma-3-12b-it-q4_0_s - creative tasks and general tasks
gpt-oss-20b-Q8_0 - runs fast even on CPU; working tasks
Qwen3-30B-A3B-Instruct-2507-Q6_K - runs fast even on CPU; working tasks and general tasks
Llama-3.1-SuperNova-Lite-Q6_K - fast, smart for light tasks
MN-12B-Mag-Mell-R1.Q6_K - RP tasks and general tasks
NemoMix-Unleashed-12B-Q6_K - RP tasks and general tasks
phi-4-Q5_K_M - working tasks / JSON tasks
Qwen3-4B-Instruct-2507-Q6_K - insanely fast, smart for light tasks
Qwen3-14B - working tasks and general tasks
This list covers most possible use cases.
RobotRobotWhatDoUSee@reddit
What is your use case?
As noted by others, these two can be quite good for their size:
Both are dense and non-reasoning.
SkyFeistyLlama8@reddit
Gemma 3 12B, pretty much. Granite 4.0 7B is also supposed to be good but I haven't tried it yet. I've been running the 4B version on an NPU for summarizing and classification tasks and it's been great so far.
AppearanceHeavy6724@reddit
There is also an antislop version of Gemma, recently made by /u/_sqrkl.
GreenHell@reddit
You've mentioned this in two comments but haven't named the actual model, and I can't find it on the user's page. Care to share the model name or a link?
AppearanceHeavy6724@reddit
https://old.reddit.com/r/LocalLLaMA/comments/1oepfug/antislop_a_comprehensive_framework_for/nl946h3/
rorowhat@reddit
I love dense models and cannot lie
dubesor86@reddit
I test a lot of models and my table allows decent filtering, so it might help you find non-thinking models under a specific size (use the buttons on the right): https://dubesor.de/benchtable#openmodels15b
ttkciar@reddit
"Best" depends on what you want to use it for.
Some good ones:
Phi-4 (14B)
Gemma3-12B (or its less sycophantic fine tune, Tiger-Gemma-12B-v3)
Qwen3-14B (with /no_think or by manually adding <think></think> to the prompt)
AppearanceHeavy6724@reddit
There is also an antislop version of Gemma, recently made by /u/_sqrkl. It writes in a very natural style.
Double_Cause4609@reddit
For what use case?
dash_bro@reddit
Qwen3 has a 14B. Use it with the /no_think argument.
usernameplshere@reddit
Phi 4
Feztopia@reddit
Yuma42/Llama3.1-DeepDilemma-V1-8B is what I use at 8B; I don't know about bigger ones. Just use the standard Llama chat template and it won't use thinking.
ForsookComparison@reddit
Qwen3 14B with "/no_think" in the system prompt.
Adventurous-Gold6413@reddit
Older, but they work (with /no_think in the sys prompt):
Qwen 3 8B
Qwen 3 14B
Gemma 12B (?)
I'm unaware of any other ones; I'd like to know as well.