Would a fully open SmolLM4-750M with 16K context make sense?

Posted by Ok-Type-7663@reddit | LocalLLaMA

I’ve been thinking about a possible gap in the current small local model space: a modern, fully open ~750M model.

Hugging Face already has SmolLM2 at 135M, 360M, and 1.7B, and SmolLM3 pushes the family to 3B with long context, multilingual support, and reasoning. The Smol Models repo also states the goal pretty clearly: fully open, compact models that run effectively on-device while still delivering strong performance.
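
For context on how the existing size points behave, here's a minimal sketch of running one of the current checkpoints with transformers. The 360M ID below is the published one; a "SmolLM4-750M" ID is purely hypothetical at this point.

```python
# Minimal sketch: loading an existing SmolLM2 checkpoint with transformers.
# The model ID is the published SmolLM2-360M-Instruct; nothing here depends
# on the proposed SmolLM4-750M existing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Why are small local models useful?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```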

So my idea is:

SmolLM4-750M

High-level target:

- ~750M parameters
- 16K context window
- fully open in the SmolLM sense: weights, data mixture, and training recipe

I’m intentionally not suggesting exact architecture internals like layer count, FFN size, attention heads, RoPE settings, etc. Hugging Face would know better how to design that. I’m more interested in whether the size class itself makes sense.
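
That said, just to make the size class concrete, here's some back-of-envelope parameter arithmetic for one hypothetical Llama-style configuration. Every number below is my assumption for illustration, not a design proposal; it only shows that ~750M is reachable with unremarkable settings.

```python
# Back-of-envelope parameter count for a *hypothetical* ~750M dense
# Llama-style decoder. All numbers are illustrative assumptions.
vocab = 49_152    # SmolLM2 tokenizer vocab size, reused as an assumption
d_model = 1536    # hidden size (assumed)
n_layers = 20     # decoder blocks (assumed)
n_heads = 16      # query heads (assumed)
n_kv_heads = 4    # grouped-query KV heads (assumed)
d_ffn = 6144      # SwiGLU intermediate size (assumed)

head_dim = d_model // n_heads  # 96
# Attention: Q and output projections are d_model x d_model;
# K and V shrink to n_kv_heads * head_dim under GQA.
attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
# SwiGLU MLP has three projections: gate, up, down.
mlp = 3 * d_model * d_ffn
# Tied input/output embeddings assumed, so counted once.
embeddings = vocab * d_model

total = n_layers * (attn + mlp) + embeddings
print(f"~{total / 1e6:.0f}M parameters")  # ~760M
```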

Why 750M?

To me, it feels like a missing middle point:

- noticeably stronger than the 135M/360M ultra-tiny models
- meaningfully cheaper to run than 1.7B/3B on CPUs, phones, and low-end GPUs

Possible dataset direction:

  1. HuggingFaceTB/smollm-corpus
  2. HuggingFaceFW/fineweb-edu
  3. HuggingFaceTB/finemath
  4. HuggingFaceTB/stack-edu
  5. HuggingFaceTB/smoltalk2
  6. HuggingFaceTB/cosmopedia
  7. HuggingFaceFW/fineweb-2 (Spanish subset, spa_Latn)
  8. open-thoughts/OpenThoughts-114k
  9. HuggingFaceTB/smol-smoltalk

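As a hedged sketch of how a mixture experiment over this list might start, here's streaming two of these with the datasets library. The config names ("sample-10BT", "spa_Latn") are my assumptions about the current hub layouts; check each dataset card before relying on them.

```python
# Hedged sketch: stream two of the listed datasets for mixture experiments.
# Subset/config names are assumptions; verify them on each dataset card.
from datasets import load_dataset

# fineweb-edu publishes pre-sampled subsets; "sample-10BT" is assumed here.
edu = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                   split="train", streaming=True)

# fineweb-2 is organized by language; "spa_Latn" matches item 7 above.
spa = load_dataset("HuggingFaceFW/fineweb-2", name="spa_Latn",
                   split="train", streaming=True)

for example in edu.take(3):  # IterableDataset.take -> first few records
    print(example["text"][:200])
```
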
The goal would not be to beat 3B models. The goal would be a clean, fully open, practical sub-1B model that is stronger than the ultra-tiny options and easier to run than 1.7B/3B.

Questions for r/LocalLLaMA:

Would ~750M be a useful size class, or is it too awkward between 360M and 1.7B?

Would 16K context be realistic/useful at this size? (There's a back-of-envelope KV-cache estimate after the questions.)

Would you prefer this kind of model to focus on general chat, reasoning, code, or multilingual use?

And what benchmarks would actually matter for a model this small?
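
On the 16K question specifically: at this scale the marginal cost is mostly KV cache, and under the same hypothetical config sketched above (all assumed numbers) it stays modest:

```python
# Rough KV-cache size at 16K context for the hypothetical config sketched
# earlier (20 layers, 4 GQA KV heads, head_dim 96). All numbers assumed.
n_layers, n_kv_heads, head_dim = 20, 4, 96
seq_len = 16_384
bytes_per_elem = 2  # fp16/bf16

# Factor of 2 covers both the K and the V cache, per sequence.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
print(f"~{kv_bytes / 2**20:.0f} MiB per 16K-token sequence")  # ~480 MiB
```

Under half a GiB in bf16 for a full 16K sequence (less with a quantized cache), so the memory side looks feasible on-device if the model uses GQA; whether a sub-1B model can actually use that much context well is the more open question.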

(Note: this text was generated by GPT-5.5 Thinking, but I am a human. Don't say "AI slop"; just respond to the questions.)