Why no talk about Medium (size) Language Models? 70-200B
Posted by pmttyji@reddit | LocalLLaMA | 11 comments
People here bring up the SLM topic from time to time (e.g. "Is SLM the future?"), but I've never seen anyone bring up Medium (size) Language Models.
The definitions of both SLM (Small Language Model) & MLM (Medium Language Model) change over time. Right now some are already calling 20-35B models SLMs. By this definition, I guess 70-150B (max 200B) falls under Medium Language Models, 201-500B is Big, & 501B-1T+ is Large.
List of Medium (size) Language Models (popular & recent ones from HF):
- LongCat-Flash-Lite
- Llama-3.3-70B-Instruct
- LongCat-Next
- Qwen3-Next-80B-A3B-Instruct
- Qwen3-Next-80B-A3B-Thinking
- Qwen3-Coder-Next
- Solar-Open-100B
- Ling-flash-2.0
- Ring-flash-2.0
- LLaDA2.1-flash
- sarvam-105b
- Llama-4-Scout-17B-16E-Instruct
- GLM-4.5-Air
- Leanstral-2603
- Mistral-Small-4-119B-2603
- gpt-oss-120b
- Qwen3.5-122B-A10B
- NVIDIA-Nemotron-3-Super-120B-A12B
- Mistral-Large-Instruct-2411
- Devstral-2-123B-Instruct-2512
- Mixtral-8x22B-Instruct-v0.1
- dots.llm1.inst
- Step-3.5-Flash
Only Llama-3.2-90B is there in the 80-100B range.
Only Mixtral-8x22B is there in the 126-150B range.
Only Step-3.5-Flash is there in the 150-200B range. 150B is a good size: @ Q4 it comes to around 75GB, which is good for 64/72/80GB VRAM.
Model creators could consider the above ranges for their upcoming medium-size models.
I think many would prefer to see more new Medium (size) Language Models (70-200B) than Large 1T models. For example, people with 96GB VRAM (4x 3090s or 3x 4090s) could run 200B models @ Q4 with offloading (system RAM), -ncmoe, etc. (see the size sketch below).
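To show where figures like "150B @ Q4 ≈ 75GB" come from, here's a minimal back-of-the-envelope sketch in Python. It assumes a flat 4 bits per weight, which is my simplification: real Q4_K_M GGUFs average closer to ~4.8 bits/weight, and KV cache for long contexts adds more on top, so treat these numbers as lower bounds.

```python
# Rough size estimator for Q4-quantized models.
# Assumption: a flat 4 bits/weight; actual GGUF quants (Q4_K_M etc.)
# average a bit more, and KV cache memory is not included.

def q4_size_gb(params_b: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight size in GB for a model with params_b billion params."""
    return params_b * bits_per_weight / 8  # 1B params at 0.5 bytes each = 0.5 GB

for params_b in (70, 120, 150, 200):
    size = q4_size_gb(params_b)
    verdict = "fits in 96GB VRAM" if size <= 96 else "needs RAM offload (-ncmoe etc.)"
    print(f"{params_b}B @ Q4 ~= {size:.0f}GB -> {verdict}")
```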
(BTW I didn't forget models like MiniMax-M2.5, Qwen3-235B-A22B & Qwen3.5-397B .... Those fall under the Big category; maybe a separate thread is better for that. Or do MiniMax-M2.5 & Qwen3-235B-A22B belong in the above list, since they sit near the 200B range?)
(Previously I wished for more tiny/small models, as my current laptop has only 8GB VRAM. But soon I'm getting a new rig with 72-96GB VRAM, so now I'm expecting more medium-size models.)
So what are your expectations from model creators for upcoming models?
uti24@reddit
It's like... so-so?
Most MoE models with a big overall parameter count act like small dense models, so a model like this acts more like a 15B-parameter model.
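The usual rule of thumb here (a rough community heuristic, not an exact law) is that an MoE behaves roughly like a dense model at the geometric mean of its total and active parameter counts. A quick sketch, using the 80B-A3B models from the list above as the illustrative case:

```python
from math import sqrt

def dense_equivalent_b(total_b: float, active_b: float) -> float:
    """Geometric-mean heuristic for an MoE's 'dense-equivalent' capacity."""
    return sqrt(total_b * active_b)

# e.g. an 80B-total / 3B-active MoE like Qwen3-Next-80B-A3B
print(f"~{dense_equivalent_b(80, 3):.1f}B dense-equivalent")  # ~15.5B
```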
Let's talk real deal!
pmttyji@reddit (OP)
I think I didn't get Falcon due to the license filter. I guess it's the same with the Cohere Command models.
uti24@reddit
I mean, Falcon 180B is old anyway. But it was the first model ever that worked locally somewhat like GPT-3.5.
pmttyji@reddit (OP)
I only came to the LLM thing at the start of last year, so I have no idea of the 2024-and-earlier LLM timeline. I remember I downloaded DeepSeek R1's distill models (7B, 14B) first. Even then I was slow.
llama-impersonator@reddit
this is a more popular category now, with GLM 4.5-Air and gpt-oss-120b probably inspiring qwen 3.5-122b and the nemotron super to have the size they have. but there's always been a long tail in the distribution, where the number of people who can run a 30b model greatly exceeds the number who can run a 120b model. cloud users mostly talk about the top models since they don't have hw limitations, so you get the U-shaped engagement curve. but i think us local users are way more excited about future 150-400b models than a 1T deepseek i can't run.
pmttyji@reddit (OP)
Agreed on the ratio of users of 30B models vs 120B models. I currently belong to the 30B club.
Yep, same. I'm getting a new rig with 72-96GB VRAM + 128GB DDR5 RAM for now. Later I'll add another 128GB RAM & an additional GPU (24-32GB) so I can load 400B models @ Q4 in the future.
lemondrops9@reddit
ummm Llama 3.3 is quite old and Llama 4 was not popular. Mixtral 8x22 is also old compared to the MoE and dense models coming out now.
pmttyji@reddit (OP)
I replied to another comment about this. Even with old models included, we have only around 25 models in the 70-200B range. It would be great to have 25 new models in this range this year.
SeymourBits@reddit
More of these should arrive soon, as they can fit on a dual Spark setup! Personally, I consider any model over 10B to be large, and any model over 100B giant (GLM).
RedParaglider@reddit
I mean, I talk about them because I can run them, but the fact is most people are unable to run a 27B, so it's pretty obvious. When Qwen did that poll asking what size of models they should make, I was really impressed that the 120s came in at 20 percent.
With the RAMpocalypse probably lasting until mid-2027, I doubt that will change. I hope all my gamer and LLM bros can get the hardware they need, but I would probably tell people not to buy right now unless they really have to.
pmttyji@reddit (OP)
Till now I'm in the same club, as my current laptop has only 8GB VRAM. I get what you're saying. But my point is that people talk more about small models like you mentioned & large models like Kimi-K2.5, DeepSeek-R1/V3, GLM-5.1, while the middle ground doesn't get much attention. Individually, models like GLM-4.5-Air, GPT-OSS-120B, Qwen3-Coder-Next, Qwen3.5-120B, MiniMax-M2.5, Qwen3-235B-A22B & Qwen3.5-397B got good feedback.
Maybe threads like this could help model creators to create models in the mentioned range.