ComfyUser48@reddit
Qwen 3.7 27b plz
Dry_Natural_3617@reddit
if they can improve yet again on current model, this would be a beast.
ComfyUser48@reddit
To be honest I'm so satisfied with 3.6 27b that I'm not that much in a hurry. It's doing everything I ask it for. Just superb.
Borkato@reddit
Same. But it would be so cool haha. Qwen 3.7 35B would be amazing
Virtamancer@reddit
MLX with MTP, the holy grail that for some reason is the last to be implemented by anyone in the entire development and support pipeline
cuberhino@reddit
What are you running it on? I have not gotten it to work well enough for me locally. I have a 3090.
manwithgun1234@reddit
Eh, same 3090, but I can get it to 40 t/s with 230k context at quant 4. Just stock llama.cpp with the Unsloth quant and -fit on.
DistanceSolar1449@reddit
Quantizing context to 4 bit makes it a lot dumber.
rainbyte@reddit
That would be really cool. Here I migrated from Qwen3-Coder to Qwen3.5-27B, then to Qwen3.6-27B, and it was a fantastic experience. Now I pair it with 35B-A3B, to delegate some smaller tasks as it is faster.
Storge2@reddit
Hopefully Qwen 3.6/3.7 122B will be open sourced. That would be one hell of a model for DGX Spark, Ryzen 395+ and Apple 128GB devices.
the-username-is-here@reddit
Hopefully, but i won't hold my breath. Would make hosted models waaaay less attractive.
psyclik@reddit
Not many people in the general public can run a 120B model at home. The ones I know personally do, but they also have at least one max plan on every major cloud provider.
Free-Combination-773@reddit
All you need is a 24GB GPU and enough RAM for the expert weights that do not fit.
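To put rough numbers on that claim, here is a minimal sketch that splits a model's weights between GPU and system RAM. The 4.5 bits per weight (a Q4-style quant) and the 4 GiB of VRAM held back for KV cache and buffers are assumptions for illustration, not measured values.

```python
GIB = 2**30


def offload_split_gib(total_params_b: float, bits_per_weight: float,
                      vram_gib: float, vram_reserved_gib: float) -> tuple[float, float]:
    """Return (GiB of weights kept on the GPU, GiB spilled to system RAM)."""
    weights_gib = total_params_b * 1e9 * bits_per_weight / 8 / GIB
    # Whatever VRAM is left after the reserve can hold weights; the rest spills.
    gpu_budget_gib = max(0.0, vram_gib - vram_reserved_gib)
    on_gpu = min(weights_gib, gpu_budget_gib)
    return on_gpu, weights_gib - on_gpu


if __name__ == "__main__":
    # Assumed: 122B total parameters at ~4.5 bits/weight on a 24 GiB card.
    on_gpu, in_ram = offload_split_gib(122, 4.5, vram_gib=24, vram_reserved_gib=4)
    print(f"on GPU: {on_gpu:.1f} GiB, in system RAM: {in_ram:.1f} GiB")
```

Under these assumptions most of the weights end up in system RAM, so RAM capacity and bandwidth matter at least as much as the GPU.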
StardockEngineer@reddit
Very few people would still be able to run it. Even if they had the hardware, even fewer would know what to do.
tat_tvam_asshole@reddit
Lmstudio and others abstract that away pretty easily. I doubt many people are buying 24GB GPUs while also unaware of AI ecosystem basics.
doomadah@reddit
Not for 122 unless you mean at very low quant
cafedude@reddit
growing your own tomatoes and making your own ketchup could make Heinz sad.
Borkato@reddit
That’s why tomato seeds don’t exist!
falcongsr@reddit
Someone needs to design and sell an active cooler that attaches to 14" Macbook Pros so they can run this model without sounding like a 747 cleared for takeoff.
idkanythingabout@reddit
My 16" 128gb m5 is pretty quiet, less fan noise than my desktop with a 3090 in it.
milkipedia@reddit
OWC might be on this
Eyelbee@reddit
Just run 27B bf16 on your machine. 122B variants are underwhelming
Storge2@reddit
No way, it's way too slow on a DGX Spark (256GB/s bandwidth). Without MTP it's 5 tok/s; with MTP/DFLASH it's like 8-12 tok/s depending on task and context. Even in INT4 or NVFP4 I can only reach 30 tok/s on a DGX Spark.
Qwen 3.5 122B Int4 Autoround can run at 50-60 tok/s on Spark with DFLASH, no problem. With concurrency that can double or triple for total throughput, depending on the number of concurrent sessions.
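A rough sanity check on these numbers, as a minimal sketch: single-stream decoding is usually limited by how fast the weights can be streamed from memory, so an upper bound is bandwidth divided by the bytes read per token. The 256 GB/s figure is the one quoted above. The bytes-per-parameter values and the ~10B active parameters for the 122B MoE (read off the 122B-A10B name) are assumptions, and the estimate ignores KV cache reads, compute limits, and speculative decoding such as MTP or DFlash.

```python
def decode_tok_per_s(active_params_b: float, bytes_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    spark_bw = 256.0  # GB/s, the figure quoted in the comment above
    print(f"27B dense, bf16:      {decode_tok_per_s(27, 2.0, spark_bw):.1f} tok/s")
    print(f"27B dense, 4-bit:     {decode_tok_per_s(27, 0.5, spark_bw):.1f} tok/s")
    print(f"122B-A10B MoE, 4-bit: {decode_tok_per_s(10, 0.5, spark_bw):.1f} tok/s")
```

Under these assumptions the bf16 dense model lands close to the ~5 tok/s reported without MTP, and the 4-bit MoE comes out in the same range as the 50-60 tok/s figure, which is why the sparse model feels so much faster on the same box.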
g_rich@reddit
On my DGX Spark I’m getting well over 20tps with Qwen 3.6 27b at fp8 and DFlash; with Qwen 3.5 122b at NVFP4 I’m getting high teens.
the-username-is-here@reddit
Funny man.
HistoricalStrength21@reddit
Is this real?
thread-e-printing@reddit
No, it's Iowa
No_easy_money@reddit
If you build it, they will come
OsmanthusBloom@reddit
Is this just fantasy?
the-username-is-here@reddit
Caught in a model loop.
OsmanthusBloom@reddit
No escape from cloud models
the-username-is-here@reddit
Open your wallet, order Claude Max and see
OsmanthusBloom@reddit
I'm just a GPU poor guy, I need more VRAM
the-username-is-here@reddit
Because token come, token go, usage up, money low
OsmanthusBloom@reddit
Any way my GPU fan blows doesn't really matter to me, to me
the-username-is-here@reddit
Mama, just called a tool
JSON parser broke the flow, empty quotes, i don't know
OsmanthusBloom@reddit
Mama, agent had just started
But now it's gone and rm -rf'd it all away
the-username-is-here@reddit
Mama, oooh, didn't mean to nuke repo
If no back up again this time tomorrow
Project dead, project dead as if nothing really matters
mxforest@reddit
Yes.. that's the max i can tell you.
LuckyLuckierLuckest@reddit
Qwen 3.7 thinks so.
Sutanreyu@reddit
I just want 3.6 9b
UnbeliebteMeinung@reddit
I hope qwen 3.7 1b is the same quality as opus 4.8
2Norn@reddit
its too dangerous to release
Particular-Way7271@reddit
Mynthos
RainierPC@reddit
The Freshmaker
zzzthelastuser@reddit
It's what Vlara would have wanted.
thrownawaymane@reddit
The official LLM of the Teen Titans
mivog49274@reddit
No one is ready for Mynthos-9B
Firepal64@reddit
mynthos
Borkato@reddit
Azarath metrion mynthos
Warsel77@reddit
Menthos.
Zestyclose839@reddit
i can't wait to unleash a swarm of these things and watch them devour some poor github repo
Warsel77@reddit
cue "Mwahahahah"
Southern-Expert22@reddit
It found vulnerabilities in my car, that's how dangerous it is
_TheWolfOfWalmart_@reddit
Same, I've been looking for a good model to run on my Pentium 3.
Warsel77@reddit
Uh. I still have the 486 but 66MHz. Fingers crossed.
Ascending_Valley@reddit
Digging out an old z80...
nullbyte420@reddit
Is there a CUDA alternative to run on 3dfx Voodoo cards?
Low88M@reddit
Otherwise there is 5.25" floppy disk offloading as extended memory swap
Warsel77@reddit
Well I heard you can run it using the DSP chip of a Soundblaster 16
_TheWolfOfWalmart_@reddit
They're coming out with qwen 3.7 15M for us 486 guys! Can't wait.
Boricua-vet@reddit
Dang, I miss my old dual Slot 1 Intel SL4BS PIII 1.0GHz server.
philmarcracken@reddit
pentium 3 gad damm. I miss the days when upgrading speed felt like you were strapped to a rocket. then intel comes out with dual core and says 'oh you've got two eyes now so you can read a book twice as fast'
Ashraf_mahdy@reddit
At least be realistic man 4B is more like it
UnbeliebteMeinung@reddit
Ok wer q4 quants.
TheTerrasque@reddit
q2, I need to run it on my abacus
JumpyAbies@reddit
q0.5 to run on my Samsung smartwatch..
Plasmx@reddit
9B will be Mythos level.
soteko@reddit
Yeah and to Autoround quant 1bit
Luigi_Boy_96@reddit
Don't tempt me.
bakawolf123@reddit
I could live with 2B even
ForsookComparison@reddit
LinkedIn genuinely believes this is not only true, but also happens every single week.
the-username-is-here@reddit
35B is going to outrank Scamthropic Mythos, fo sho.
the-username-is-here@reddit
Nah, they'll do it in 200M.
fbms2@reddit
😂
RenewAi@reddit
same
acetaminophenpt@reddit
27b gguf please!
jonas-reddit@reddit
And then some unsloth 27B MTP please.
jd52wtf@reddit
This is all gravy.
Getting 85% of the big boy models with a Q6 with 3.6 @27B is amazing as it is. Now they are teasing more.
Yes please.
Thinking of grabbing another R9700.
The first one has been worth every penny so far.
Athabasco@reddit
I’m only getting 20 something tokens per second on my R9700 with 3.6 27b, not a very fun experience. But 70-80 t/s on Gemma 4 26b a4b and 3.6 35b a3b is incredible. I just think that the R9700 doesn’t have enough memory bandwidth for dense models.
pmttyji@reddit
Along with the usual 3.5/3.6 series (0.8B, 2B, 4B, 9B, 27B, 35B-A3B, 122B-A10B, 397B-A17B), I would like to see the additional models below, if possible.
fantasticsid@reddit
50-60B dense would be so damn good. Although it'd have a somewhat limited audience compared to the huge, mega sparse MoEs that people run on macs and strixes I guess.
I'd rather enjoy it, anyway. It'd be a meaningful step up from the RYS 27b I've been using.
OMG_IM_A_GIRL@reddit
80B-A5B would be perfect. Please LLaMa gods.
maschayana@reddit
I second this. 80b A5b and i squirt right here right now
OMG_IM_A_GIRL@reddit
I would do damnable things for this model.
DinoAmino@reddit
Comment
Foxiya@reddit (OP)
Reply
LetsGoBrandon4256@reddit
Maleficent-Ad5999@reddit
LetsGoBrandon4256@reddit
Wait,
hesperaux@reddit
Actually,
LetsGoBrandon4256@reddit
the user
swagonflyyyy@reddit
Upvote.
Blutusz@reddit
Downvote (sorry)
h4ck3r_n4m3@reddit
award
swagonflyyyy@reddit
Reverse downvote. You're welcome.
theUmo@reddit
Double reverse upvote with draw 4
Xyrus2000@reddit
Skip a vote.
castrator21@reddit
Contrarian reply (as is the reddit way)
thread-e-printing@reddit
Gguf wen post
PANIC_EXCEPTION@reddit
pmttyji@reddit
With MTP
Xyrus2000@reddit
Low effort troll.
thread-e-printing@reddit
Passionate defense that misses the mark by a mile
Euphoric_Emotion5397@reddit
waiting for qwen 3.7 MOE !! :D It works flawlessly for me!
LuckyLuckierLuckest@reddit
Thinking...
UnbeliebteMeinung@reddit
The_LSD_Soundsystem@reddit
Actually, wait
droans@reddit
It's exactly what the default chat template does when you disable thinking.
SkyFeistyLlama8@reddit
How TF they doing all this? Are these models all being trained from scratch for every single release or are they working off of older checkpoints?
tarruda@reddit
The base model is likely the same across all 3.x releases.
hackiv@reddit
Just give us a Qwen 3.7 9B that performs similarly to the 3.5 model one weight class above
prince_pringle@reddit
I just made a kernel called hydra that might let you do this. It's not done, but it works and is open source, and it's meant to push Qwen one model step up. It's a split-attention, multi-head, KV-resident kernel, and it could work for what you're hoping for.
No_Swimming6548@reddit
Own_Suspect5343@reddit
Qwen 3.6 122b pls
CockBrother@reddit
Since their max model is probably 397b it looks like 122b and 397b models skipped right over 3.6 and will [hopefully] be released as open weights 3.7.
FullOf_Bad_Ideas@reddit
Plus was 397b, max was ~1T last time around. I doubt it's any different now.
CockBrother@reddit
Ah - thank you. I can settle for "Plus at home"!
spaceman_@reddit
Qwen 3.7 122B perhaps?
asquareportal@reddit
I doubt it will fit in that bowl.
Dry_Natural_3617@reddit
it will, it’s MOE, just need more bowls 🤣🤣
SanduloSandadi@reddit
It's MOB mixture of bowls.
asquareportal@reddit
Nailed it!
asquareportal@reddit
We will all come with bowls, should add up hehe
Own_Suspect5343@reddit
In previous posts I saw discussion where someone said that Qwen releases the new open-source model of the previous generation after releasing a new generation of closed models. So I think a 3.6 version has a better chance.
MDSExpro@reddit
Yes please
Storge2@reddit
Yes
The_LSD_Soundsystem@reddit
Please fix the overthinking problems
FullOf_Bad_Ideas@reddit
I'd like to see open release of Qwen 3.7 397B A17B.
Valuable_Touch5670@reddit
GGUF wen?
Significant-Yam85@reddit
14-18B dense is the missing sweet spot for 16-24gb VRAM. 27B is a struggle to fit in 24GB VRAM without sacrificing the quant, KV cache or context size.
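The squeeze described here can be sketched with a back-of-the-envelope budget: quantized weights plus the KV cache have to fit in VRAM together. This is only an illustration. The layer count, KV head count, and head dimension below are hypothetical placeholders (not the real Qwen config), the 4.5 bits per weight stands in for a Q4-style quant, and the formula assumes a plain attention KV cache stored for every layer.

```python
GIB = 2**30


def weights_gib(params_b: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx_tokens: int, bytes_per_elem: float) -> float:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / GIB


if __name__ == "__main__":
    # Hypothetical architecture numbers, chosen only to show the trade-off.
    arch = dict(layers=64, kv_heads=8, head_dim=128)
    w = weights_gib(27, 4.5)
    for ctx in (32_768, 131_072):
        for label, nbytes in (("f16 cache", 2.0), ("q8 cache", 1.0)):
            kv = kv_cache_gib(ctx_tokens=ctx, bytes_per_elem=nbytes, **arch)
            print(f"ctx={ctx:>7} {label}: weights {w:.1f} + KV {kv:.1f} = {w + kv:.1f} GiB")
```

With placeholder numbers like these, the only ways to get under 24 GiB at long context are a smaller quant, a quantized cache, or less context, which is the trade-off the comment is pointing at.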
Valuable_Touch5670@reddit
I cannot agree more!
Opteron67@reddit
qwen 3.5 omni at least
jacek2023@reddit
these are big closed source cloud models guys, no 27B/35B/9B/122B visible yet
phenotype001@reddit
won't be a problem, I'll be expecting Qwen3.8 in the next month or so.
the-username-is-here@reddit
Thursday, I think.
alexanderi96@reddit
You mean Chewsday
XStone1974@reddit
Qwensday
superdariom@reddit
Everyday!!
Borkato@reddit
Oh that’s cute
MaxKruse96@reddit
if we gonna get qwen3.7 before 3.6 2b 4b 9b 122b we gon be sad
alphapussycat@reddit
We won't be getting new models. Qwen is taking a new direction. No focus on open weight other than bare minimum to train potential new AI developers. The new direction is monetizing cloud services.
Borkato@reddit
Then why did they say they’d continue releasing open weight models? People have been saying what you just said since qwen 2 lmao
alphapussycat@reddit
They changed leadership in April and said they'd focus on monetization.
mslindqu@reddit
Yeah, I think this is probably correct.
UnbeliebteMeinung@reddit
We must be more friendly and thankful for the CCP. Let's praise them more. We can only hope they will hear us and help us.
Bubbly-Staff-9452@reddit
Not if they actually give us 3.7 2b 4B and 9b this time though. Then I’m fine with them skipping 3.6 since it hasn’t been that long since 3.5. I actually have a use for 3.5 4B so I really hope we get an update to that and 27B
dryadofelysium@reddit
3.6/3.7 are mostly post-trained 3.5 for better agentic usage and the <27B models are not exactly targeting this anyway so don't get your hopes up
dryadofelysium@reddit
it's official now: https://x.com/Alibaba_Qwen/status/2056403591464984753
PhysicalIncrease3@reddit
Says they can't wait to release the models also, which is good news.
alphapussycat@reddit
Release, as in making them publicly available on their hosting, not for local use.
BallsInSufficientSad@reddit
What size are these?
ForsookComparison@reddit
If they're not going to be open weight then I'm still rooting for Qwen Max. I'll take any additional price pressure on the ridiculous 5.5 and Opus 4.7 rates we can get
meca23@reddit
Have you tried DeepSeek V4? API prices are reasonable and it runs pretty well with the pi harness.
Maddolyn@reddit
Tell me about this pi harness?
NNN_Throwaway2@reddit
Sad that they stopped open sourcing Plus.
GreedyWorking1499@reddit
Please give us some smaller models 😭 I need ~8B
serige@reddit
gguf wen?!?!!
Oswolrf@reddit
Wasn't 3.6 the last open source Qwen model for Alibaba?
darkbit1001@reddit
Is no one running on their cloud compute node?
r4in311@reddit
Look at this wild result for a voxel pagoda-world I generated using 3.7: https://jsfiddle.net/38zvp6om/
(In short: 4 prompts total, potential GPT5 quality at home, very solid, not SOTA but close)
Caffdy@reddit
there's a big difference between the first GPT5 versions and the latest ones
r4in311@reddit
Yes, and that one is comparable to the first ones in this test, which is amazing.
In their own post, they claim nothing else. See: https://x.com/Alibaba_Qwen/status/2056403591464984753
Below 5.2 Chat, so in the area of 5.0 release.
AppealSame4367@reddit
Do you know that feeling when Christmas comes early and you didn't even expect it until 3 months later? No? Me neither, until a minute ago.
Charming-Author4877@reddit
damn it, I hope we'll get qwen 27B and smaller on 3.7
If that is a similar improvement as 3.6 was to 3.5 we are in for a wild ride
t_krett@reddit
aaaand it's gone
Foreign_Yard_8483@reddit
A good thing we enacted the embargo; I have yet to finish building my alignment bunker against their relentless march toward AGI
BodybuilderLost814@reddit
Qwen3.7 35B A3B Would that be asking too much?
Then-Topic8766@reddit
trialbuterror@reddit
Could some good soul suggest Qwen 3.6 models for heavy coding and reasoning use?
Radeon 9060 XT 16GB VRAM, 48GB DDR4 RAM, 1TB SSD, Ubuntu 24.04
pmttyji@reddit
Qwen3.6-27B & Qwen3.6-35B-A3B. Q4 quant. Download MTP GGUFs of those models.
mivog49274@reddit
RIP Qwen3.6-Max-Preview, didn't even have the time to pass to non-preview phase. Qwen are the crazed unstoppable superseederz.
WhyLifeIs4@reddit
Pelican on Max: took about 5 minutes
CockBrother@reddit
That's too good. They must be pelicanmaxxing.
mivog49274@reddit
Definitely pelmaxxing.
More-Curious816@reddit
always has been. all public benchmarks are in the data.
Eyelbee@reddit
They are really cooking this time
Septerium@reddit
I do not ask for much... just Qwen 3.7 Coder 122B A10B natively trained on NVFP4. Nothing more. Thanks
sparty212@reddit
Very interesting chat.
More-Curious816@reddit
try asking the American models about sensitive topics and come back here with results. don't be a hypocrite.
thread-e-printing@reddit
50 cents has been deposited in your OpenAI-Palantir shill account
RickyRickC137@reddit
We want Qwen 3.7 122B vs Gemma 4 120B - battle of legendary MoEs.
awitod@reddit
"Operates exclusively in thinking mode" - for me tool calling is not a negotiable feature, and this is a dealbreaker.
Also, the qwen implementation of thinking mode is the worst thing about the model series.
sjoti@reddit
Thinking does not exclude tool calling?
awitod@reddit
It literally says it can't do search or use code interpreter, and tool calling with 3.5 and 3.6 with thinking on is a big issue today.
We have it disabled in almost every use case.
Healthy-Nebula-3603@reddit
The big issue is that you're using a heavily compressed model, probably with a compressed cache as well.
That's why it has a problem with it.
sjoti@reddit
But again, just because some features aren't enabled doesn't mean the model can't do tool calls. It seems extremely unlikely that they don't train a model to use tools in 2026
awitod@reddit
Look, Qwen has big issues in the 3.5 and 3.6 model families with tool calling and thinking mode: it adds tool call blocks inside thinking blocks, which either breaks the chat or requires the runner to hack around it.
I did not make my comment out of the blue.
Folks can downvote me all they want, but it is the biggest issue they have and the way the text reads makes me think they haven't fixed it.
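For anyone who has not hit this, here is a minimal sketch of the kind of workaround a runner ends up doing: pull tool calls out of the raw text wherever they appear, including inside the thinking block, before splitting thinking from the visible answer. It assumes the `<think>...</think>` and `<tool_call>{json}</tool_call>` tag format, `split_response` is a hypothetical helper rather than any runner's actual code, and it does not handle an unclosed `<think>` tag.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def split_response(raw: str) -> tuple[str, str, list[dict]]:
    """Return (thinking text, visible answer, parsed tool calls)."""
    calls: list[dict] = []

    def collect(match: re.Match) -> str:
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            return match.group(0)  # leave malformed blocks in place
        return ""  # drop the parsed block from the text

    # Hoist tool calls first, so ones nested inside <think> are not lost.
    stripped = TOOL_RE.sub(collect, raw)
    thinking = "\n".join(t.strip() for t in THINK_RE.findall(stripped))
    answer = THINK_RE.sub("", stripped).strip()
    return thinking, answer, calls


if __name__ == "__main__":
    raw = ('<think>Need the weather first.<tool_call>{"name": "get_weather", '
           '"arguments": {"city": "Oslo"}}</tool_call></think>')
    print(split_response(raw))
```

Whether a call made mid-thought should actually be executed is a policy question for the runner; the sketch only shows why the parsing has to be special-cased.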
Septerium@reddit
Prepare your AI prompts for generating touching requests, so we can beg Alibaba on Twitter for open-weight releases
__some__guy@reddit
Maybe this time the endless repetition while thinking will be fixed!
(I don't believe)
No_Mango7658@reddit
Need benchmarks now!!!
No_Mango7658@reddit
I am genuinely vibrating with excitement
Sabin_Stargem@reddit
I want 122b, with MTP. Qwen3.5 thinks too much and tends to stall, so having a perfected version would be much appreciated.
norsurfit@reddit
It droped!
bakawolf123@reddit
any tweets with size polls yet? =)
314kabinet@reddit
If they release an even better 27B that’ll be great.
AnticitizenPrime@reddit
Whelp, Qwen 3.7 Max passed my cipher test that only a few open source models have passed without tool use. Qwen 3.5 397b was one of the previous ones that had passed, but this one did it much faster (maybe 6-8 minutes).
Kimi 2.5 and 2.6 are the only others to have cracked it without tools.
ea_man@reddit
Let's have an 18-22B dense model that we can run on 16GB at Q4 and 12GB at Q3 with MTP and some ~100k context.
MerePotato@reddit
Hopefully they fix the benchmark contamination 3.5 and 3.6 suffered from so we can more accurately appraise this one
pigeon57434@reddit
i know we literally just got 3.6-27B but im sorry im a 27B lover i want a 3.7 version too
yeah-ok@reddit
Dropped so fast it skipped a p
marscarsrars@reddit
Good
VoiceApprehensive893@reddit
qwen 3.7 9b
True_Requirement_891@reddit
It's real.
dataexception@reddit
Well, I'm going to just say thank you.
kei-ayanami@reddit
wen goofs?
Nick-Sanchez@reddit
wen gguf wen 9b wen 122b
Blues520@reddit
Oh yesss
DrBearJ3w@reddit
Qwen 3.7 9b? 🥺
dryadofelysium@reddit
didn't see it at first, but sure enough it's at expand