Qwen is cooking hard
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 211 comments
I am waiting for 122B and new 27B
Divniy@reddit
Am I the only one annoyed by such posts? Nothing happened yet. Release is the news, twitter vagueposting isn't.
harpysichordist@reddit
These posts - especially ones for Qwen - are obviously botted up. Hundreds of upvotes for literally a post about nothing. "Something might be happening soon!" - hundreds of upvotes.
jacek2023@reddit (OP)
I am annoyed by posts about cloud prices on this sub, but many people upvote them. You can downvote mine and ignore the discussion if you are not interested.
a_beautiful_rhind@reddit
Hey you're the one who says people aren't running local models here. Maybe you're right. For the most part it's cheering about what sounds "good".
0-0x0@reddit
I'm in the minority that can't make use of the 27B model, and I'm hoping for a 9B, a 122B, and a better 35B (if that's possible)
GCoderDCoder@reddit
You want a 9b or a 122b? Those are very different lol. What hardware do you have for the 122b? If you have a unified memory device that didn't like 27b dense then try the 27b with mtp which doubles the speed. Unsloth has versions that run with unsloth studio now I think. That's probably the easiest to run and manage.
relmny@reddit
9b for the phone. We already have 3.6 for computers.
RedParaglider@reddit
I wouldn't mind either of those either. I have a strix halo, and a few machines with 8gb cards 😄
GCoderDCoder@reddit
https://youtu.be/MI0Pm1d6YF4
Follow this guy on strix halo stuff. Sounds like even AMD is working with him now directly
Economy-Register97@reddit
I can vouch for that. Currently in a long eval run. Preliminary results are netting around 80 t/s, up from 40-50, on Strix Halo.
big_ange_postecoglou@reddit
How has your experience been with the Strix Halo?
RedParaglider@reddit
It's cool, it's too expensive now but I got it for under 2000. It's a fun learning machine, but it's not very fast for inference.
DeepOrangeSky@reddit
If a 120b MoE only has 10b active parameters, then from the standpoint of your GPU, it can be easier to run a 120b a10b efficiently on a small GPU than a 27b dense, if the 27b dense doesn't fit on the GPU. If a dense model only halfway fits and half spills over from the GPU, that's really bad, it'll run super slow. With a 12:1 sparsity ratio MoE, on the other hand, if only 8% to 10% or so fits, that can still be quite good by comparison, since the GPU can run the active params. I think it still depends on how many channels of DRAM you have, as far as how well you can do the active-params offloading thing with the MoE, but even with just 2 channels it can still be decent for a 120b MoE, I think. I'm a noob so I might have some of that wrong, but I think that's the rough idea anyway.
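The ratio in this comment can be checked with a few lines of arithmetic. This is only a sketch: the 120B total and 10B active figures are the ones quoted in the comment, and `active_fraction` is a hypothetical helper, not part of any library.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a MoE model's weights that are used for any single token."""
    return active_params_b / total_params_b

# Figures taken from the comment (in billions of parameters).
total_b, active_b = 120.0, 10.0

frac = active_fraction(total_b, active_b)
sparsity = total_b / active_b

print(f"sparsity {sparsity:.0f}:1, active share {frac:.1%}")
```

It says nothing about speed on its own; it only shows why roughly 8% of the weights is the share that matters per token.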
0-0x0@reddit
I got 16gb vram and 128gb ram, I don't have reasons to justify getting an extra GPU or a dedicated device with unified memory. I'm just hoping for a decent small sized dense model or a larger MoE, though the 35B was good enough overall
VoiceApprehensive893@reddit
the silent majority has 16gb cards and cannot use 27b well
jacek2023@reddit (OP)
The silent majority uses ChatGPT and Claude Code
Intelligent-Form6624@reddit
The silent majority doesn’t use any sort of AI
jacek2023@reddit (OP)
now make another picture with all animals
RedParaglider@reddit
I'm not the guy you replied to, but I never use image generation and wanted to see what the hell GPT would come up with.
Sofakingwetoddead@reddit
I like turtles, but no turtles included.
tvall_@reddit
why are there so many extra limbs? i thought we moved past the image models not knowing how many limbs things have. but i also dont generate images often.
Mickenfox@reddit
This is really funny.
jacek2023@reddit (OP)
the idea was that his image shows humans who don't use AI, so you can add all animals to this group, then all plants, then all fungi, etc.
FeiX7@reddit
can you source?
Intelligent-Form6624@reddit
https://petar.com/blog-posts/ai-bubble-chart-broke-the-internet
johndeuff@reddit
Written by ai
AffectionatePlastic0@reddit
Okay, what part of the silent majority has access to the internet?
Intelligent-Form6624@reddit
Yes
Orolol@reddit
The majority of people in this picture are less than 13, more than 80, or don't have internet at all
snorkelvretervreter@reddit
That but with a 7900xtx as those can still be had for cheap and have 24gb.
TamSchnow@reddit
The silent majority may not even know that running a LLM locally is possible.
BringMeTheBoreWorms@reddit
You guys are running LLMs locally!?
sloth_cowboy@reddit
cries in 64gb vram
Blizado@reddit
Some longer than ChatGPT exists.
grumd@reddit
No, the silent majority all have an RTX 6000 Pro and just don't tell anyone
Long_comment_san@reddit
That's deep lol
loversama@reddit
That’s why they’re silent because they’re not here and they’re too busy generating images on ChatGPT..
Cool-Chemical-5629@reddit
If we are talking about the silent minority in local llama here, then at least they know that the last useable generation of ChatGPT was GPT 3.5 era when the chat had no usage limits.
jacek2023@reddit (OP)
I have no idea what you are talking about
Cool-Chemical-5629@reddit
When ChatGPT became popular, they had only GPT 3+. Later they extended their offering with GPT 4+, o3, o4, GPT 5+... In GPT 3+ era, they used to have only one model for both free and paid users, with some extra perks for paid users, but no limits overall, so you had pretty much the same experience with both free and paid plan.
Ever since they introduced new models, they also introduced usage limits and now when you reach your limit, they automatically switch you to a dumber model. Over time the limits became only more tight and the service is now practically useless for free users. So what used to be a popular free alternative to local use back in the days of a single model is no longer usable as such. From what I've read, the limits actually hit paid users as well which resulted in a wave of massive cancellations of subscriptions.
Few_Water_1457@reddit
was open ... for real
a_beautiful_rhind@reddit
Yea, we're out of the phase where all AI is free and they just burn money forever.
Hypilein@reddit
This is true for everything on the internet. The only question is if it goes to shit, becomes expensive or both.
Few_Water_1457@reddit
accurate
bigh-aus@reddit
I would love them to build ones specific for consumer gpu sizes. I wish they’d do say a 99b a20.
Due-Project-7507@reddit
I comment it nearly every day: the 27B model runs perfectly (e.g. with OpenCode) with a good IQ4_XS quant with 110k context fully in 16 GB VRAM. Use the buun-llama-cpp fork with turbo3_tcq KV cache and this model: https://www.reddit.com/r/LocalLLaMA/comments/1sy0qj5/qwen3627b_iq4_xs_full_vram_with_110k_context/
gh0stwriter1234@reddit
The barrier to entry is fairly low... get a pair of 16GB MI50s or a $1300 R9700
Factemius@reddit
35b and --cpu-moe would unlock agentic use for a lot of people, including those with 8-12gb cards
Silver-Champion-4846@reddit
I have 8gb ddr4
FusionX@reddit
It's possible - https://old.reddit.com/r/LocalLLaMA/comments/1sy0qj5/qwen3627b_iq4_xs_full_vram_with_110k_context/
dreamer_2142@reddit
May I ask what use case you use for 9B?
chiwawa_42@reddit
In an agentic workflow, 9Bs or less are what you delegate research, parsing and sometimes vision (also audio if that's your thing) to.
It frees up token bandwidth on GPUs running larger models for actual complex work. It saves a lot of time and power to delegate simple tasks to smaller models rather than having it all run by mid-tier ones when you're running mostly local ones.
What I'm currently testing is using 9Bs (also trying Gemma4 E2B / E4B) to interface the agentic workflow with external models while aliasing sensitive information and filtering out potentially harmful commands and code.
Still a WiP, but a promising one.
dreamer_2142@reddit
Cool! Thanks for the detailed answer.
MaxKruse96@reddit
im pretty sure the silent majority that doesnt have 24gb VRAM uses the 35b all day everyday, or the lesser informed people use the 4b and 9b still (because "it must fit in vram")
Costed14@reddit
Can you elaborate further on this? I have 24GB of VRAM (+32 GB DDR5) and always go for stuff that fully fits in VRAM since the generation speed is so much greater. That means I can run at most a 27B model with nothing else running, but usually 9B (or gpt oss 20B) if I need to use my PC.
Am I doing it wrong?
MaxKruse96@reddit
if your target is, idk, 50t/s, a MoE model offloaded halfway to gpu and cpu will still likely reach that.
winnen@reddit
2x 3090 and DDR4 system here (Threadripper 1950X):
Offloading to RAM isn't a great option for me. Massive bottleneck getting data from RAM to CPU to PCIe on my platform. Haven't tested it much though, so I could be wrong.
While I could buy a new system, I can't afford one with the same number of PCIe lanes and RAM quality/quantity I have now. I'm well-off, but not 'DDR5 RDIMM' well-off.
blakeman8192@reddit
With enough VRAM (whole MoE fits), PCIe bandwidth becomes totally unnecessary as you're just passing a few kilobytes of state between layers.
Source: am running DeepSeek V4 Flash, MiniMax M2.7, etc reasonably (~200+ prefill, 20-30 gen) on a busted cheap-ass Octominer + 12x CMP100-210 rig at PCIe 3.0 1x speeds
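A rough sketch of why a slow link can be tolerable when the model is split by layers: only one activation vector per token crosses each GPU boundary. The hidden size and fp16 activations below are assumptions for illustration, not figures from the thread, and link latency is ignored; the per-lane rate is the theoretical PCIe 3.0 figure.

```python
HIDDEN_SIZE = 4096             # assumed hidden dimension (hypothetical model)
BYTES_PER_VALUE = 2            # assumed fp16 activations
PCIE3_X1_BYTES_PER_S = 985e6   # ~985 MB/s theoretical for one PCIe 3.0 lane

def transfer_time_us(n_boundaries: int) -> float:
    """Microseconds spent moving one token's activations across all GPU boundaries."""
    bytes_per_hop = HIDDEN_SIZE * BYTES_PER_VALUE
    return n_boundaries * bytes_per_hop / PCIE3_X1_BYTES_PER_S * 1e6

bytes_per_hop = HIDDEN_SIZE * BYTES_PER_VALUE
per_token_us = transfer_time_us(11)   # 12 GPUs in a chain -> 11 boundaries

print(f"{bytes_per_hop} bytes per hop, ~{per_token_us:.0f} microseconds per token")
```

Under these assumptions the transfer cost is well under a millisecond per token, small next to the tens of milliseconds a token takes at 20-30 tokens per second.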
chiwawa_42@reddit
So what you're saying is that for my 4+ gfx1030 rig, I should stop splitting hairs stabilising PCIe switches and just split lanes with plain ol' bifurcation ?
blakeman8192@reddit
Yeah, you just have to accept the fact that your system will act like one gfx1030 with a ton of VRAM as opposed to a bunch of them working together - it will not be 4x faster. Make sure you offload everything to vram properly, have a CLI agent help you get llama.cpp set up.
chiwawa_42@reddit
Well, this actually suits my need: moar VRAM while still not sucking too much power per card, so I can still cook pizzas off the microwave oven. Alright then, I'm going to dig my old ETH/XMR mining rig out of the basement, so I can add a fourth GPU for cheap. Tensor-level splitting is so unstable right now, and I'm too lazy to fix recent builds, so that'll probably do the job anyway.
blakeman8192@reddit
Yep exactly, most of my GPUs sit around 50W during inference with one at a time spiking to 100-150W. The whole box uses 300-400W at idle with a model loaded and maybe 800-900W when chooching at full power. Overall I'm pretty happy with the whole setup costing less than two 3090s.
a_beautiful_rhind@reddit
If running 27b is a "minority", may as well pack it up. That mid range is where it starts getting competitive on generalist models.
chiwawa_42@reddit
Spot on. I've been playing early on with toy-sized 7-9Bs, then decided to breach the gap with Qwen3.6-27B and it made my day(s). I'm still trying out bigger models running in unsafe public platforms, such as owl-alpha, and I get as much speed and as many wrong answers as with local Qwen3.5-9B. So I've switched all my inference to local 2*gfx1030 then bought two more. It's slower but far more accurate than whatever is available on free tiers, and I don't have the extra step of dedicating another card to run a 9B to filter communications with external models.
HelloSummer99@reddit
You're not in the minority, in real world very few people rock 128GB Mac setups. 16-32GB is what most devs have still
my_name_isnt_clever@reddit
It's Macs, DGX Spark, and Strix Halo systems. I agree regular GPUs are more popular, but there are plenty of us here with unified memory systems.
yeah-ok@reddit
Snoo_27681@reddit
9B! 9B! 9B!
HistoricalStrength21@reddit
I would love to see a new Qwen model too, but what's the use case of a 9B model? It's slower than the 35B A3B and dumber than the 27B. What am I missing?
Snoo_27681@reddit
Qwen3.5-9B is a sweet spot of model for lower end tasks that I can count on the model to do ok work and call tools. I've got a version of 9b solving a ton of firmware problems for me and doing web search and stuff.
I'd say one thing I run into even with a 64GB laptop is that I have a ton of other apps. So 35B is fast but takes like 20-30GB before KV cache. Then I have 4-6 parallel Claude Code sessions doing who knows what. And mixed in there is a lot of image building. So even with 64GB of space I'm not comfortably running the model and working uninhibited.
9B I can work uninhibited and offload real tasks for it, especially if you make good tools for your usual tasks.
DeepOrangeSky@reddit
Not everyone has 32GB of DRAM. Lots of people have macs that have 16GB of unified memory (the base amount for the mac minis, mac laptops, etc that tons of people use). So they can run the 9b, but not the 35b. Even though its active parameters would be fine on the mac, the inactive (total parameters) is too big to fit in memory at all, and would go into memory swap, or need to be set up to stream from NVMe, which would be brutal compared to the 9b that fits in memory.
Plus I guess there are also tons of people with just normie laptops and normie desktops that only have 16GB DRAM and just some crappy igpu, although not sure how relevant that scenario is since presumably those suck at running practically any AI (not sure, never tried on one of those). But maybe it matters for those too in terms of terrible vs ultra terrible or something.
chiwawa_42@reddit
Streaming to NVMe could look like a nice option, but it's deadly. You'll ruin your gear in just a few months if left unchecked. Also oMLX is shitty at best as of yesterday's releases. It's so unstable it crashed my agentic setup while working on a live engine migration on other nodes and failed so hard I had to rebuild everything from scratch over last weekend. So big nope, oMLX isn't usable yet, and definitely don't use your SSD as swap for an MoE that doesn't fit into UMA, you'll ruin it sooner than you might think.
sylverCode@reddit
9B is faster than 35B on 8GB/16GB VRAM since it can fit entirely onto the GPU. Prefill speed also suffers a lot on the 35B since you have to offload it to RAM
HistoricalStrength21@reddit
Okay, nice. Do you feel any difference in the quality of the answers? Can a 9B model be good for coding? Is it good for generally answering questions? Thanks in advance.
Long_comment_san@reddit
The benefit of 9b-12b range can be seen on HF. Qwen 9b has literally thousands of finetunes where 35ba3b doesnt have even hundreds to my memory. Personally I think 9b is a relatively stupid choice because 12b is just so much better. If you can't do 9b locally, you'd go to 4b anyway. If you can do 9b, you can 100% do 12b with a bit smaller quant. It's as if Qwen explicitly targets 12gb vram where they should target 16gb vram
a_beautiful_rhind@reddit
MoE is notoriously hard for regular people to finetune.
sylverCode@reddit
I've been using one of the variants of Qwopus 9B coder for coding at 262K context and it's been quite decent. Qwopus is fine tuned for agentic coding. It's been running alongside 35B for reviews since I have some leftover VRAM to fit both, with 35B experts offloaded to RAM with --cpu-moe flag
while-1-fork@reddit
3 times the factorial of 9 billion may be kinda large.
OMG_IM_A_GIRL@reddit
This would be (9B!)^3. That’s something like 10^257,000,000,000.
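One way to sanity-check the size of that exponent is the log-gamma function, since ln(n!) = lgamma(n + 1). A minimal sketch, with `log10_factorial` as a hypothetical helper name:

```python
import math

def log10_factorial(n: float) -> float:
    """log10(n!) computed via the identity ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)

n = 9e9                              # "9B"
exponent = 3 * log10_factorial(n)    # log10 of (n!)**3

print(f"(9B!)^3 is roughly 10^{exponent:,.0f}")
```

Working in logarithms avoids ever forming the number itself, which has hundreds of billions of digits.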
Cool-Chemical-5629@reddit
3 x 9B = 27B. I guess they can do that for you...
Nahxiee@reddit
I wonder how the new Qwen models will be for writing 🤔
NoobMLDude@reddit
As usual Qwen is cooking hard.
I wonder how big is the Qwen team. Their release frequency is insane.
Confident-Aerie-6222@reddit
Looking forward for improved 4b and 9b models for my potato laptop
Wildnimal@reddit
If you can run 9B then 35A3B should run with ease.
cibernox@reddit
That was not true when i had an eGPU. Either the model fit entirely in memory or performance was atrocious
ithilelda@reddit
finger crossed🤞
Separate-Antelope188@reddit
Potatoes crossed.
Valuable_Relation634@reddit
Same. I've been holding off upgrading my homelab GPU setup hoping the efficiency gains from these new architectures would let me skip the power bill jump. My current 72B runs at 6t/s which is... usable for offline tasks but painful for anything interactive. Are you expecting the 122B to be a MoE? I've seen conflicting rumors.
HavenTerminal_com@reddit
these guys haven't taken a lunch break since 2023
j_lyf@reddit
996 Culture.
remeh@reddit
I would really love to see a Qwen 3.7 122B released, but the same person ran a poll for 3.6 where 122B was mentioned, and we never saw it come out, so I'm a little worried that it might never happen...
the-username-is-here@reddit
Sad, but true. It doesn't feel like they will be releasing 122+ models any more (hope i'm wrong). 35B is genuinely good, but 122b is still smarter and can be run on low-end hardware.
Would eat huge chunk out of their model hosting business, with all them openrouter providers.
GiGiGus@reddit
Well, I wouldn't call 64GB RAM + 16GB VRAM "low-end hardware". I mean, if we compare it to local millionaires with RTX 6000s, then yes, it is indeed low end.
derekp7@reddit
But the 122b-a10b is perfect for something like strix halo (faster than cpu-only compute, slower than gpu, so the MoE makes up for that). And you can't compare the cost of a strix halo MB with the cost of a dedicated GPU, as you get a whole workstation class computer out of it too, so it is multi purpose (when not doing LLM inference tasks, I can spin up a farm of VMs or other ram-hungry tasks).
the-username-is-here@reddit
I run 122b on Spark, next to several containers with Postgres and embedding models (had to memory-manage the shit out of it though). 50 TPS all the way, baby!
It's waaaay slower than a 6K would be of course, but still half the price of a 6K alone and probably one third of a workstation.
Swimming-Chip9582@reddit
what quant and setup do you have on your spark? ive got a couple at work to play around with but a bit unsure whats the best for agentic stuff atm - just got qwen3.6 fp8 a3b wired up which is pretty great
the-username-is-here@reddit
These days Spark or something comparable is "low-end". Sadly.
Swimming-Chip9582@reddit
it is indeed low-end - i get reminded every time i see recommended specs on huggingface say models need "a few b200s" 😭
RayHell666@reddit
Yeah same happened with Wan 2.5 and now Qwen-Image 2.0 in the image/video sphere. Alibaba is abandoning Open Source slowly.
ManySugar5156@reddit
lol same, 122B and that new 27B feel like theyre gonna be the real deal. hope it don’t take forever to drop weights.
korino11@reddit
Some kind of 3.7 at 48B A6B would be nice
SnooPeripherals5499@reddit
Yesss please. Everyone stuck at A3B which is sad, A8b for 48 or 70B would be a dream
OMG_IM_A_GIRL@reddit
70B A8b or A10b would be so goddamned amazing on MacBook Pro M5Max. Even the 64GB model would support Q4.
redditscraperbot2@reddit
I wonder if it will be cooking my GPU soon.
Clean_Hyena7172@reddit
I doubt it will be open-source
Borkato@reddit
Remindme! One month
Clean_Hyena7172@reddit
lol they still haven't open-sourced qwen3.6 max or plus so one month for 3.7 to be open-sourced seems optimistic to say the least.
Borkato@reddit
You specifically meant max/plus?
Clean_Hyena7172@reddit
well yeah, that's what the screenshot in the post is talking about? Were you guys talking about the smaller models?
RemindMeBot@reddit
I will be messaging you in 1 month on 2026-06-19 17:11:51 UTC to remind you of this link
Dany0@reddit
give it between 3-6 weeks. this time no one is recovering from chinese new year, but also it took them a month last time
OriginalPlayerHater@reddit
I honestly don't care anymore.
steny007@reddit
Yet, you care to tell us.
OriginalPlayerHater@reddit
you had to know, otherwise i'd feel incomplete
jacek2023@reddit (OP)
what do you use?
OriginalPlayerHater@reddit
I'm just gonna leave the sub I think. I use Claude through the free web interface and a paid copilot subscription. Sometimes I'll use google api's for my projects.
Local LLMs are great, they are important they are fun. I just don't give a shit about the OH MY GOD hype train of the models themselves. I did and now I don't.
I'm not going to spend 10k on a local setup, I'm not going to burn out my current GPU with inference tasks, and as neat as it is to have these models do their thing, it makes no worldly difference to me.
Anyways have fun with it
tictactoehunter@reddit
Hard cooking is not always everyone's dish...
Ifihadanameofme@reddit
Not because I want it to run on my 6gigs of ~~gpu poor~~ slum poor vram card, But because I think they can probably break the internet by releasing a MOE smaller than the 35B-a3b but better than qwen3.6 MOE .
That model is already so freaking good and "fast enough" (and even faster with MTP now) .
szansky@reddit
What is the difference between 3.7 and 3.6?
madhan4u@reddit
According to Qwen3.7-Max
danihend@reddit
According to my neighbor's cat qwen 3.5 doesn't exist either. Fuck him though.
switchbanned@reddit
wow what a pussy
danihend@reddit
Agreed 👍
Paradigmind@reddit
About 0.1
Altruistic-Dust-2565@reddit
Not "about", exactly 0.1 unless you encounter floating point errors
Nyghtbynger@reddit
oh yeah.
if (x - y).abs() < 0.000001 { return true; }
Good old days
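For anyone who hasn't hit this before, a small sketch of the joke in Python: the version numbers 3.7 and 3.6 are used as plain floats, and the tolerance `1e-6` mirrors the hand-rolled check (with the absolute value applied so the sign of the difference doesn't matter).

```python
import math

diff = 3.7 - 3.6

exact = diff == 0.1                  # exact comparison of two doubles
tolerant = math.isclose(diff, 0.1)   # standard-library tolerance check
epsilon = abs(diff - 0.1) < 1e-6     # the old hand-rolled epsilon check

print(exact, tolerant, epsilon)      # False True True
```

`math.isclose` uses a relative tolerance by default, which tends to behave better than a fixed epsilon when the values being compared are very large or very small.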
szansky@reddit
Nice, but for programmers, do we know of any changes / differences?
initalSlide@reddit
Best answer
Brief-Effect9065@reddit
now it can make a working 3D Rubik's cube
TurnOffAutoCorrect@reddit
They're waiting for Google IO to happen later today and then upstage them, aren't they?
BrewHog@reddit
Considering it took like a year to go from Gemma 3 to Gemma 4, I doubt Google will release another version of Gemma
DinoAmino@reddit
Exactly. Maybe another Gemma 4 model though. Given it took a year, v4 was all done from scratch, yeah? Qwen 3 came out a year ago. Are all Qwen point releases using the same v3 base models, but with new post-training? Same training cutoff across the board?
BrewHog@reddit
That would be most welcome
nacholunchable@reddit
Unless what they're "cooking" is 122b or 80b.. Im starting to get the feeling the team swap really did change things despite their public reassurance.
ego100trique@reddit
If they manage to greatly improve 9b or get an in-between 9 and 27 for the masses that performs marginally worse than 27b, that would really be huge for them tbh
switchbanned@reddit
RIP 14b models
cafedude@reddit
Hopefully people on X are asking Chujie about the 122B. Multiple times. Every day.
QuackerEnte@reddit
Am I the only one who wants 80B-A3B MoE size?
SnooPeripherals5499@reddit
No more A3B please. Bring back 70B but with A8B
QuackerEnte@reddit
I would like A5B too, because I'm VRAM limited (at home) and my normal RAM is slow, 3200MT/s only
hesperaux@reddit
You're not the only one. That would be a perfect size for me. But that could only be qwen 4 since the param sizes they've trained are fixed. You won't see a qwen 3.x at 80b unless they've been severely training one all along.
jacek2023@reddit (OP)
No, I would love this size too
vogelvogelvogelvogel@reddit
tbf Qwen and DeepSeek resulted in an improvement of my views about China, not that they were like bad but still. I am very thankful they are open weighting their models
Dramatic_Entry_3830@reddit
Qwen especially. I also think the Chinese models think differently because they trained excessively on Mandarin content. And that alters their thinking for the better.
vogelvogelvogelvogel@reddit
why the downvotes, it is no new finding that language influences thinking.
Silver-Champion-4846@reddit
It does. Slightly off-topic, but the renaissance that happened to my language had two sides, one sought consolidating and drawing from heritage while the other wanted to drink from the west's well. The second side won and now the majority of writers think in English/French even if they don't know the words and grammar because it's just embedded in the dominant institutional lingo. Also that makes western models better capable of learning that westernized Arabic, and if you discuss it with them a certain way they'll lean hard on how English works even when you tell them to be 'classical', or when they're trying to explain the difference between 'classical' and 'modern' Arabic. Point is, it's just my experience regarding that statement
Legitimate-Pumpkin@reddit
You would be surprised how different china is from what we are told in the news…
And how different we are too, btw.
vogelvogelvogelvogel@reddit
Besides the good impression I got regarding open-weights models, I took that as a starting point to dig deeper in forums, YouTube and so on, expats living in China, etc.
also i worked with colleagues from china, always a good experience.
Hour_Bit_5183@reddit
How do you cook hard? Oh yeah they are making breaking bad stuff. Probably KET
while-1-fork@reddit
You get a hardon , then start cooking while keeping it hard.
Miserable-Dare5090@reddit
radeon hardon lfg
switchbanned@reddit
I hope the crockpot is set to high and not low.
WebOsmotic_official@reddit
qwen is cooking so hard the local crowd is already negotiating with their GPU temps.
everyone wants the 122B until the download finishes and the fans start speaking in tongues.
iaNCURdehunedoara@reddit
Qwen is cooking but i wish they hadn't removed the free 2000 requests on their qwen-coder. It was extremely fun to play around with.
jacek2023@reddit (OP)
local LLMs are free
iaNCURdehunedoara@reddit
They're not free if you don't have the hardware. I can't run a decent LLM on 4060TI unfortunately.
MerePotato@reddit
Yes you can, Gemma 26BA4B Q8 with MoE layer offloading
jacek2023@reddit (OP)
but this is r/LocalLLaMA
a_beautiful_rhind@reddit
My electric says otherwise:
alphapussycat@reddit
But these, 3.7 are probably not open weight.
jacek2023@reddit (OP)
We are waiting for open releases
alphapussycat@reddit
But they've said a month ago that they won't be doing open weights anymore, and are going to focus on monetization.
I hope I'm wrong, but it's like expecting another gpt oss because gpt went from 5.4 to 5.5.
jacek2023@reddit (OP)
Could you show the exact quote?
alphapussycat@reddit
You can find it by just googling. The new leader is starting some token section thing, don't remember what it's called, and the new direction is monetization.
They do say they'll keep the spirit of open weights to attract/keep people interested in qwen. But realistically, releasing open weights works against themselves.
Kahvana@reddit
If that cooking results in local runnable models for all previous released ranges, then im interested.
AmoebaDue6638@reddit
The new 27B is what I'm most excited about. If it matches the jump from Qwen2 to Qwen2.5, it'll be the best model you can run on a single 3090.
Due_Ebb_3245@reddit
I want 9b please
Journeyj012@reddit
0.6B for the pre-release hype. just would be funny.
IndianITCell@reddit
Claude is crying in the corner xD
Long_comment_san@reddit
I hope they make 12b instead of 9b eventually. 12b is much more smart than 9b.
Long_comment_san@reddit
This avatar fits amazingly well
Divyansh3021@reddit
I have been using Qwen 3.6 for past few days and I am genuinely impressed by its performance.
PromptInjection_@reddit
Hm let's see if we get open models.
datbackup@reddit
Remember top qwen bro left and ppl were in hysterics, then we got 3.6 two of the best local models ever?
Maybe i’ll eat these words but qwen team looking good still
Better-Struggle9958@reddit
qwen marketing is hard, fully bots
jacek2023@reddit (OP)
I am fighting bots and "Chinese hype" but this is important info, Qwen 3.7 for local setups will be a big milestone
Better-Struggle9958@reddit
3.6 was just fix for 3.5
jacek2023@reddit (OP)
What models do you use locally each day?
Better-Struggle9958@reddit
gemma4
jacek2023@reddit (OP)
I use both gemma4 and qwen3.6, I don't think 3.6 was just a "fix", I would like to see 4.1 from Google
Better-Struggle9958@reddit
I use my own benchmark for C++ projects. If a model is better, it is better: 3.6 was slightly better than 3.5, and gemma4 is better anyway in my benchmarks. BUT I don't post 100 posts every day with OMG QWEN/GEMMA is amazing
RuthlessCriticismAll@reddit
I see tons of people shilling Gemma models which convinced me to try them again and I was disappointed once more like every time in the past. Meh, they always underperform.
Better-Struggle9958@reddit
Everything is relative of course
jacek2023@reddit (OP)
How do you run your benchmarks? I am working on Python project with pi and local models (at the same time I use Claude Code for C++ project).
Better-Struggle9958@reddit
I am the creator of https://github.com/Palm1r/QodeAssist and https://github.com/Palm1r/llmqore and have an internal MVP where agents generate code by task. For example, creating a Qt application with a certain feature, or C++ code with some algorithm, or the Qt framework. Then I compare the results at the end.
jacek2023@reddit (OP)
are you somehow related to Qt?
Better-Struggle9958@reddit
no, I just develop in my free time
jacek2023@reddit (OP)
I remember Qt was adding some AI stuff, looks like it is also called Assistant https://doc.qt.io/qtcreator/creator-qtaiassistant.html
Better-Struggle9958@reddit
this is only for commercial license
Legitimate-Pumpkin@reddit
Well, if an AI company makes bots that work… that’s part of the marketing too, no? 🤭
Better-Struggle9958@reddit
marketing of what ? creating useless things?
Limp_Classroom_2645@reddit
Cook my GPU with some open weights instead please
Steus_au@reddit
skipping 3.6-122b and 397 shows like they are limiting/segregating releases now: toys for 'babies' (35/27) are free, the heavy lifting to APIs only.
sathi006@reddit
4B, 9B, and 27B only please, everything else does not give bang for the buck
False-Shirt-1700@reddit
I wonder how much of a boost it'll be
AdDizzy8160@reddit
... and we're waiting for dinner
DarkArtsMastery@reddit
Hope for something delish
Aggressive_Aspect436@reddit
I only recently got myself a second-hand 3090 for a pretty decent price. Here's hoping I'll actually be able to run it. 🤞
Major-System6752@reddit
3.6 9b? or 3.5 last stable?
crantob@reddit
Don't sleep on ByteShape quants
9.6G Qwen3-Coder-30B-A3B-Instruct-Q3_K_S-2.69bpw.gguf 11G Qwen3.5-35B-A3B-Q3_K_S-2.69bpw.gguf
Kicks butt on an office laptop
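The file sizes in that listing line up with simple bits-per-weight arithmetic. A minimal sketch, assuming a nominal 35e9 parameter count for the 35B model (the real count may differ slightly) and ignoring GGUF metadata; `gguf_size_gib` is a hypothetical helper, not a llama.cpp function.

```python
def gguf_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GiB for a given average bits per weight."""
    return params * bits_per_weight / 8 / 2**30

# 2.69 bpw is taken from the filename in the listing; 35e9 is a nominal assumption.
size = gguf_size_gib(35e9, 2.69)

print(f"~{size:.1f} GiB")
```

The same function gives a quick first guess at whether a given quant will fit in a given amount of VRAM, before accounting for KV cache and runtime overhead.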
Intelligent_Ice_113@reddit
all I want is qwen3.7-35b-a3b-UD-mlx-4bit, am I asking too much?
Sabin_Stargem@reddit
A Qwen in every pot!
khronyk@reddit
I wish they would finally actually release qwen image 2.0 7B ... Or z-image edit for that matter
ComplexType568@reddit
I notice the new qwen team is now focusing on incremental updates more than big drops. Interesting change.
jacek2023@reddit (OP)
well at least no software updates are needed, I would like to see more qwen 3.x finetunes
Mean-Ad1493@reddit
When can we expect them to release open weights?
cantgetthistowork@reddit
Just give me a bigger dense model
ProfessionalSpend589@reddit
Fingers crossed for a 3.7 397B model
L0ren_B@reddit
Since I don't think we will get anything like a 122B model, I hope we get 35B moe and 27B dense☺️
recitegod@reddit
My neural port is open. I am ready.