Why no talk about Medium (size) Language Models? 70-200B
Posted by pmttyji@reddit | LocalLLaMA | 11 comments
People here bring up the SLM topic from time to time (e.g. "Is SLM the future?"), but I've never seen anyone bring up Medium (size) Language Models.
The definitions of both SLM (Small Language Model) & MLM (Medium Language Model) change over time. Right now some are already calling 20-35B models SLMs. By this definition, I guess 70-150B (max 200B) falls under Medium Language Models, 201-500B is Big, & 501B-1T+ is Large.
List of Medium (size) Language Models (popular & recent ones from HF):
- LongCat-Flash-Lite
- Llama-3.3-70B-Instruct
- LongCat-Next
- Qwen3-Next-80B-A3B-Instruct
- Qwen3-Next-80B-A3B-Thinking
- Qwen3-Coder-Next
- Solar-Open-100B
- Ling-flash-2.0
- Ring-flash-2.0
- LLaDA2.1-flash
- sarvam-105b
- Llama-4-Scout-17B-16E-Instruct
- GLM-4.5-Air
- Leanstral-2603
- Mistral-Small-4-119B-2603
- gpt-oss-120b
- Qwen3.5-122B-A10B
- NVIDIA-Nemotron-3-Super-120B-A12B
- Mistral-Large-Instruct-2411
- Devstral-2-123B-Instruct-2512
- Mixtral-8x22B-Instruct-v0.1
- dots.llm1.inst
- Step-3.5-Flash
Only Llama-3.2-90B is there in the 80-100B range.
Only Mixtral-8x22B is there in the 126-150B range.
Only Step-3.5-Flash is there in the 150-200B range. 150B is a good size: @ Q4 it comes to around 75GB, which is good for 64/72/80GB VRAM.
Model creators could consider the above ranges for their upcoming medium-size models.
I think many would prefer to see more new Medium (size) Language Models (70-200B) than Large 1T models. For example, people with 96GB VRAM (4x 3090s or 3x 4090s) could run 200B models @ Q4 with offloading (system RAM), -ncmoe, etc. (see the size sketch below).
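To show where figures like "150B @ Q4 ≈ 75GB" come from, here's a minimal back-of-the-envelope sketch in Python. It assumes a flat 4 bits per weight, which is my simplification: real Q4_K_M GGUFs average closer to ~4.8 bits/weight, and KV cache for long contexts adds more on top, so treat these numbers as lower bounds.

```python
# Rough size estimator for Q4-quantized models.
# Assumption: a flat 4 bits/weight; actual GGUF quants (Q4_K_M etc.)
# average a bit more, and KV cache memory is not included.

def q4_size_gb(params_b: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight size in GB for a model with params_b billion params."""
    return params_b * bits_per_weight / 8  # 1B params at 0.5 bytes each = 0.5 GB

for params_b in (70, 120, 150, 200):
    size = q4_size_gb(params_b)
    verdict = "fits in 96GB VRAM" if size <= 96 else "needs RAM offload (-ncmoe etc.)"
    print(f"{params_b}B @ Q4 ~= {size:.0f}GB -> {verdict}")
```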
(BTW I didn't forget models like MiniMax-M2.5, Qwen3-235B-A22B & Qwen3.5-397B .... Those fall under the Big category; maybe a separate thread is better for that. Or do MiniMax-M2.5 & Qwen3-235B-A22B belong in the above list, since they sit near the 200B range?)
(Previously I wished for more tiny/small models, as my current laptop has only 8GB VRAM. But soon I'm getting a new rig with 72-96GB VRAM, so now I'm expecting more medium-size models.)
So what are your expectations from model creators for upcoming models?
uti24@reddit
It's like... so-so?
Most MoE models with a big overall parameter count act like small dense models, so a model like this acts more like a 15B-parameter model.
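The usual rule of thumb here (a rough community heuristic, not an exact law) is that an MoE behaves roughly like a dense model at the geometric mean of its total and active parameter counts. A quick sketch, using the 80B-A3B models from the list above as the illustrative case:

```python
from math import sqrt

def dense_equivalent_b(total_b: float, active_b: float) -> float:
    """Geometric-mean heuristic for an MoE's 'dense-equivalent' capacity."""
    return sqrt(total_b * active_b)

# e.g. an 80B-total / 3B-active MoE like Qwen3-Next-80B-A3B
print(f"~{dense_equivalent_b(80, 3):.1f}B dense-equivalent")  # ~15.5B
```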
Let's talk real deal!
pmttyji@reddit (OP)
I think I didn't get Falcon due to the license filter. I guess it's the same with the Cohere Command models.
uti24@reddit
I mean, Falcon 180B is old anyway. But it was the first model ever that worked locally somewhat like GPT-3.5.
pmttyji@reddit (OP)
I only came to the LLM thing at the start of last year, so I have no idea of the 2024-and-earlier LLM timeline. I remember I downloaded DeepSeek R1's distill models (7B, 14B) first. Even then I was slow.
llama-impersonator@reddit
this is a more popular category now, with GLM 4.5-Air and gpt-oss-120b probably inspiring qwen 3.5-122b and the nemotron super to have the size they have. but there's always been a long tail in the distribution, where the number of people who can run a 30b model greatly exceeds the number who can run a 120b model. cloud users mostly talk about the top models since they don't have hw limitations, so you get the U-shaped engagement curve. but i think us local users are way more excited about future 150-400b models than a 1T deepseek i can't run.
pmttyji@reddit (OP)
Agreed on the ratio of users of 30B models vs 120B models. I currently belong to the 30B club.
Yep, same. I'm getting a new rig with 72-96GB VRAM + 128GB DDR5 RAM for now. Later I'll add another 128GB RAM & an additional GPU (24-32GB) so I can load 400B models @ Q4 in the future.
lemondrops9@reddit
ummm Llama 3.3 is quite old and Llama 4 was not popular. Mixtral 8x22 is also old compared to the MoE and dense models coming out now.
pmttyji@reddit (OP)
I replied to another comment about this. Even with old models included, we have only around 25 models in the 70-200B range. It would be great to have 25 new models in this range this year.
SeymourBits@reddit
More of these should arrive soon, as they can fit on a dual Spark setup! Personally, I consider any model over 10B to be large, and any model over 100B giant (GLM).
RedParaglider@reddit
I mean, I talk about them because I can run them, but the fact is most people are unable to run a 27B, so it's pretty obvious. When Qwen did that poll asking what size of models they should make, I was really impressed that the 120s came in at 20 percent.
With the RAMpocalypse probably lasting until mid-2027, I doubt that will change. I hope all my gamer and LLM bros can get the hardware they need, but I would probably tell people not to buy right now unless they really have to.
pmttyji@reddit (OP)
Till now I'm in the same club, as my current laptop has only 8GB VRAM. I get what you're saying. But my point is that people talk more about small models like you mentioned & large models like Kimi-K2.5, DeepSeek-R1/V3, GLM-5.1, while the middle ground doesn't get much attention. Individually, models like GLM-4.5-Air, GPT-OSS-120B, Qwen3-Coder-Next, Qwen3.5-120B, MiniMax-M2.5, Qwen3-235B-A22B & Qwen3.5-397B got good feedback.
Maybe threads like this could help model creators to create models in the mentioned range.