Qwen is cooking hard

[-]

Divniy@reddit

Am I the only one annoyed by such posts? Nothing has happened yet. A release is news; Twitter vagueposting isn't.

[-]

harpysichordist@reddit

These posts - especially ones for Qwen - are obviously botted up. Hundreds of upvotes for literally a post about nothing. "Something might be happening soon!" - hundreds of upvotes.

[-]

jacek2023@reddit (OP)

I am annoyed by posts about cloud prices on this sub, but many people upvote them. You can downvote mine and ignore the discussion if you are not interested.

[-]

a_beautiful_rhind@reddit

Hey you're the one who says people aren't running local models here. Maybe you're right. For the most part it's cheering about what sounds "good".

[-]

0-0x0@reddit

I'm in the minority that can't make use of the 27B model, and I'm hoping for a 9B, a 122B, and a better 35B (if that's possible)

[-]

You want a 9b or a 122b? Those are very different lol. What hardware do you have for the 122b? If you have a unified memory device that didn't like the 27b dense, try the 27b with MTP, which doubles the speed. Unsloth has versions that run with Unsloth Studio now, I think. That's probably the easiest to run and manage.

[-]

relmny@reddit

9b for the phone. We already have 3.6 for computers.

[-]

RedParaglider@reddit

I wouldn't mind either of those either. I have a strix halo, and a few machines with 8gb cards 😄

[-]

GCoderDCoder@reddit

https://youtu.be/MI0Pm1d6YF4

Follow this guy on strix halo stuff. Sounds like even AMD is working with him now directly

[-]

Economy-Register97@reddit

I can vouch for that. Currently in a long eval run. Preliminary results are netting around 80 t/s, up from 40-50, on Strix Halo.

[-]

big_ange_postecoglou@reddit

How has your experience been with the Strix Halo?

[-]

RedParaglider@reddit

It's cool, it's too expensive now but I got it for under 2000. It's a fun learning machine, but it's not very fast for inference.

[-]

DeepOrangeSky@reddit

If a 120b MoE only has 10b active parameters, then from the standpoint of your GPU, a 120b-a10b can run more efficiently on a small GPU than a 27b dense model that doesn't fit on it. If a dense model only half fits and the other half spills over from the GPU, that's really bad; it'll run super slow. With a 12:1 sparsity MoE, on the other hand, even if only 8-10% of it fits on the GPU, it can still be quite good by comparison, since each token only touches the active params. I think how well the active-params offloading works with an MoE also depends on how many channels of DRAM you have, but even with just 2 channels it can still be decent for a 120b MoE. I'm a noob so I might have some of that wrong, but I think that's the rough idea anyway.
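Here is a back-of-the-envelope Python sketch of that rough idea. It assumes decode speed is limited purely by memory bandwidth, so every active weight is read once per token. It also assumes the active weights are split between VRAM and system RAM in the same proportion as the total weights. The 8 GB GPU, the ~300 GB/s and ~80 GB/s bandwidths and the 4.5 bits per weight are all hypothetical numbers, not measurements.

```python
def decode_tokens_per_s(active_params: float, bits_per_weight: float,
                        gpu_fraction: float, gpu_bw_gbs: float,
                        cpu_bw_gbs: float) -> float:
    """Optimistic, bandwidth-bound tokens/s for a partially offloaded model.

    Assumes every active weight is read once per token and that active
    weights sit on GPU/CPU in the same ratio as the whole model. Compute,
    PCIe transfers and KV-cache reads are ignored.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    gpu_seconds = bytes_per_token * gpu_fraction / (gpu_bw_gbs * 1e9)
    cpu_seconds = bytes_per_token * (1 - gpu_fraction) / (cpu_bw_gbs * 1e9)
    return 1 / (gpu_seconds + cpu_seconds)


# Hypothetical hardware: 8 GB GPU at ~300 GB/s, dual-channel DDR5 at ~80 GB/s.
GPU_BW, CPU_BW, BITS = 300, 80, 4.5

# Dense 27B (~15 GB at 4.5 bpw): every parameter is active; about half fits in VRAM.
dense = decode_tokens_per_s(27e9, BITS, 0.5, GPU_BW, CPU_BW)
# 120B-A10B MoE (~68 GB at 4.5 bpw): 10B active per token; about 10% fits in VRAM.
moe = decode_tokens_per_s(10e9, BITS, 0.10, GPU_BW, CPU_BW)

print(f"dense 27B, half offloaded:    ~{dense:.1f} tok/s")
print(f"120B-A10B MoE, 90% offloaded: ~{moe:.1f} tok/s")
```

With these made-up numbers the MoE comes out nearly twice as fast, despite being over four times larger, because far fewer bytes cross the slow RAM bus per token. Real results also depend on where the runtime places the attention and shared weights. For example, llama.cpp's `--cpu-moe` keeps the non-expert weights on the GPU. Treat the output as a ceiling, not a prediction.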

[-]

0-0x0@reddit

I got 16gb vram and 128gb ram, I don't have reasons to justify getting an extra GPU or a dedicated device with unified memory. I'm just hoping for a decent small sized dense model or a larger MoE, though the 35B was good enough overall

[-]

VoiceApprehensive893@reddit

the silent majority has 16gb cards and cannot use 27b well

[-]

jacek2023@reddit (OP)

The silent majority uses ChatGPT and Claude Code

[-]

Intelligent-Form6624@reddit

The silent majority doesn’t use any sort of AI

[-]

jacek2023@reddit (OP)

now make another picture with all animals

[-]

RedParaglider@reddit

I'm not the guy you replied to, but I never use image generation and wanted to see what the hell GPT would come up with.

[-]

Sofakingwetoddead@reddit

I like turtles, but no turtles included.

[-]

tvall_@reddit

why are there so many extra limbs? i thought we had moved past image models not knowing how many limbs things have. but i also don't generate images often.

[-]

Mickenfox@reddit

This is really funny.

[-]

jacek2023@reddit (OP)

the idea was that his image shows the humans who don't use AI, so you can add all the animals to this group, then all plants, then all fungi, etc.

[-]

FeiX7@reddit

can you share the source?

[-]

Intelligent-Form6624@reddit

https://petar.com/blog-posts/ai-bubble-chart-broke-the-internet

[-]

johndeuff@reddit

Written by ai

[-]

AffectionatePlastic0@reddit

Okay, what part of silent majority have access to the internet?

[-]

Intelligent-Form6624@reddit

Yes

[-]

Orolol@reddit

The majority of people in this picture are less than 13, more than 80, or don't have internet at all

[-]

snorkelvretervreter@reddit

That but with a 7900xtx as those can still be had for cheap and have 24gb.

[-]

TamSchnow@reddit

The silent majority may not even know that running a LLM locally is possible.

[-]

BringMeTheBoreWorms@reddit

You guys are running LLMs locally!?

[-]

sloth_cowboy@reddit

cries in 64gb vram

[-]

Blizado@reddit

Some of us since before ChatGPT existed.

[-]

grumd@reddit

No, the silent majority all have an RTX 6000 Pro and just don't tell anyone

[-]

Long_comment_san@reddit

That's deep lol

[-]

loversama@reddit

That’s why they’re silent because they’re not here and they’re too busy generating images on ChatGPT..

[-]

Cool-Chemical-5629@reddit

If we are talking about the silent minority here in LocalLLaMA, then at least they know that the last usable generation of ChatGPT was the GPT 3.5 era, when the chat had no usage limits.

[-]

jacek2023@reddit (OP)

I have no idea what you are talking about

[-]

Cool-Chemical-5629@reddit

When ChatGPT became popular, they had only GPT 3+. Later they extended their offering with GPT 4+, o3, o4, GPT 5+... In GPT 3+ era, they used to have only one model for both free and paid users, with some extra perks for paid users, but no limits overall, so you had pretty much the same experience with both free and paid plan.

Ever since they introduced new models, they also introduced usage limits, and now when you reach your limit, they automatically switch you to a dumber model. Over time the limits only got tighter, and the service is now practically useless for free users. So what used to be a popular free alternative to local use, back in the days of a single model, is no longer usable as such. From what I've read, the limits actually hit paid users as well, which resulted in a massive wave of subscription cancellations.

[-]

Few_Water_1457@reddit

was open ... for real

[-]

a_beautiful_rhind@reddit

Yea, we're out of the phase where all AI is free and they just burn money forever.

[-]

Hypilein@reddit

This is true for everything on the internet. The only question is if it goes to shit, becomes expensive or both.

[-]

Few_Water_1457@reddit

accurate

[-]

bigh-aus@reddit

I would love them to build ones specific for consumer gpu sizes. I wish they’d do say a 99b a20.

[-]

Due-Project-7507@reddit

I comment this nearly every day: the 27B model runs perfectly (e.g. with OpenCode) with a good IQ4_XS quant and 110k context fully in 16 GB VRAM. Use the buun-llama-cpp fork with the turbo3_tcq KV cache and this model: https://www.reddit.com/r/LocalLLaMA/comments/1sy0qj5/qwen3627b_iq4_xs_full_vram_with_110k_context/
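Whether weights plus context fit in 16 GB comes down to simple arithmetic. The sketch below uses the standard KV-cache size formula: keys and values, per attention layer, per KV head, per token. IQ4_XS is roughly 4.25 bits per weight in llama.cpp. The layer and head counts and the 3.5 bits per KV element are made-up placeholders. They are not Qwen3.6-27B's real config, and they are not what turbo3_tcq actually uses, so read the real values from the model's metadata.

```python
GIB = 1024 ** 3


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bits_per_elem: float) -> float:
    # 2 = one key tensor and one value tensor per attention layer.
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bits_per_elem / 8


w = weights_bytes(27e9, 4.25)  # ~4.25 bpw for an IQ4_XS quant
# HYPOTHETICAL architecture numbers, chosen only to illustrate the formula.
kv = kv_cache_bytes(n_attn_layers=16, n_kv_heads=4, head_dim=256,
                    n_ctx=110_000, bits_per_elem=3.5)
total = w + kv
print(f"weights {w / GIB:.2f} GiB + KV {kv / GIB:.2f} GiB = {total / GIB:.2f} GiB")
print("fits in 16 GiB (before compute buffers):", total < 16 * GIB)
```

With the same placeholder layout, an unquantized FP16 cache (16 bits per element) would need about 6.7 GiB for the KV cache alone. That shows why the cache type matters as much as the weight quant at long context.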

[-]

gh0stwriter1234@reddit

The barrier to entry is fairly low... get a pair of 16GB MI50s or a $1300 R9700

[-]

Factemius@reddit

35b with --cpu-moe would unlock agentic use for a lot of people, including those with 8-12gb cards

[-]

Silver-Champion-4846@reddit

I have 8gb ddr4

[-]

FusionX@reddit

It's possible - https://old.reddit.com/r/LocalLLaMA/comments/1sy0qj5/qwen3627b_iq4_xs_full_vram_with_110k_context/

[-]

dreamer_2142@reddit

May I ask what you use the 9B for?

[-]

chiwawa_42@reddit

In an agentic workflow, 9Bs or less are what you delegate research, parsing and sometimes vision (also audio if that's your thing) to.

It frees up token bandwidth on GPUs running larger models for actual complex work. It saves a lot of time and power to delegate simple tasks to smaller models rather than having it all run by mid-tier ones when you're running mostly local ones.

What I'm currently testing is using 9Bs (also trying Gemma4 E2B / E4B) to interface the agentic workflow with external models while aliasing sensitive information and filtering out potentially harmful commands and code.

Still a WiP, but a promising one.
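The aliasing and filtering idea can be sketched in a few lines of Python. This is not chiwawa_42's implementation, since the comment doesn't show one. The regex patterns, the placeholder format and the command denylist are all hypothetical and far from complete. In the setup described, a small local model would presumably do this job more flexibly than regexes can.

```python
import re


class Aliaser:
    """Replace sensitive strings with stable placeholders, and restore them later."""

    # Hypothetical patterns; a real deployment would need many more.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    }

    def __init__(self) -> None:
        self.to_alias: dict[str, str] = {}     # original value -> placeholder
        self.to_original: dict[str, str] = {}  # placeholder -> original value

    def _alias(self, kind: str, value: str) -> str:
        if value not in self.to_alias:
            placeholder = f"<{kind}_{len(self.to_alias) + 1}>"
            self.to_alias[value] = placeholder
            self.to_original[placeholder] = value
        return self.to_alias[value]

    def mask(self, text: str) -> str:
        """Run before text leaves the machine for an external model."""
        for kind, pattern in self.PATTERNS.items():
            text = pattern.sub(lambda m, k=kind: self._alias(k, m.group(0)), text)
        return text

    def unmask(self, text: str) -> str:
        """Run on the external model's reply before acting on it locally."""
        for placeholder, original in self.to_original.items():
            text = text.replace(placeholder, original)
        return text


# Hypothetical denylist of shell fragments an agent should never run unreviewed.
DANGEROUS = [re.compile(p) for p in (r"\brm\s+-rf\s+/", r"\bmkfs\b", r"curl[^|]*\|\s*(?:ba)?sh")]


def is_safe_command(cmd: str) -> bool:
    return not any(p.search(cmd) for p in DANGEROUS)


aliaser = Aliaser()
prompt = "Why can't alice@corp.example reach 10.0.0.5?"
masked = aliaser.mask(prompt)
print(masked)  # the external model only sees placeholders
print(aliaser.unmask(masked) == prompt)
print(is_safe_command("ls -la"), is_safe_command("rm -rf /"))
```

Keeping one mapping per session means the same address always gets the same placeholder. The external model can still reason about "the same host" without ever seeing it.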

[-]

dreamer_2142@reddit

Cool! Thanks for the detailed answer.

[-]

MaxKruse96@reddit

im pretty sure the silent majority that doesn't have 24gb VRAM uses the 35b all day, every day, or the lesser informed people still use the 4b and 9b (because "it must fit in vram")

[-]

Costed14@reddit

lesser informed people use the 4b and 9b still (because "it must fit in vram")

Can you elaborate further on this? I have 24GB of VRAM (+32 GB DDR5) and always go for stuff that fully fits in VRAM since the generation speed is so much greater. That means I can run at most a 27B model with nothing else running, but usually 9B (or gpt oss 20B) if I need to use my PC.

Am I doing it wrong?

[-]

MaxKruse96@reddit

if your target is, idk, 50t/s, a MoE model offloaded halfway to gpu and cpu will still likely reach that.

[-]

winnen@reddit

2x 3090 and DDR4 system here (Threadripper 1950X):

Offloading to RAM isn't a great option for me. Massive bottleneck getting data from RAM to CPU to PCIe on my platform. Haven't tested it much though, so I could be wrong.

While I could buy a new system, I can't afford one with the same number of PCIe lanes and RAM quality/quantity I have now. I'm well-off, but not 'DDR5 RDIMM' well-off.

[-]

blakeman8192@reddit

With enough VRAM (whole MoE fits), PCIe bandwidth becomes totally unnecessary as you're just passing a few kilobytes of state between layers.

Source: am running DeepSeek V4 Flash, MiniMax M2.7, etc reasonably (~200+ prefill, 20-30 gen) on a busted cheap-ass Octominer + 12x CMP100-210 rig at PCIe 3.0 1x speeds
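Here is a quick sanity check on why x1 links can be enough. With a layer (pipeline) split, each generated token sends only one hidden-state vector across each GPU-to-GPU boundary. The hidden size of 7168, the 4-byte (float32) activations and the 30 tok/s rate are assumptions for illustration. The ~985 MB/s figure is the usual approximate usable bandwidth of one PCIe 3.0 lane.

```python
PCIE3_X1_BYTES_PER_S = 985e6  # approx. usable bandwidth of a single PCIe 3.0 lane


def link_traffic(hidden_size: int, bytes_per_value: int,
                 tokens_per_s: float) -> tuple[int, float]:
    """Bytes per token and bytes/s on ONE boundary link in a layer split."""
    per_token = hidden_size * bytes_per_value  # one hidden-state vector per token
    return per_token, per_token * tokens_per_s


# Assumed numbers: hidden size 7168, float32 activations, 30 tok/s decode.
per_token, per_second = link_traffic(7168, 4, 30)
print(f"{per_token / 1024:.0f} KiB per token per link, {per_second / 1e6:.2f} MB/s")
print(f"~{per_second / PCIE3_X1_BYTES_PER_S:.2%} of one PCIe 3.0 x1 link")
```

Even at prefill rates of a few hundred tokens/s, the same math stays in the single-digit MB/s range per link. This fits the experience that the narrow links are not the bottleneck once everything sits in VRAM.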

[-]

chiwawa_42@reddit

So what you're saying is that for my 4+ gfx1030 rig, I should stop splitting hairs stabilising PCIe switches and just split lanes with plain ol' bifurcation?

[-]

blakeman8192@reddit

Yeah, you just have to accept the fact that your system will act like one gfx1030 with a ton of VRAM as opposed to a bunch of them working together - it will not be 4x faster. Make sure you offload everything to vram properly, have a CLI agent help you get llama.cpp set up.

[-]

chiwawa_42@reddit

Well, this actually suits my needs: moar VRAM while not drawing too much power per card, so I can still cook pizzas in the microwave oven. Alright then, I'm going to dig my old ETH/XMR mining rig out of the basement so I can add a fourth GPU for cheap. Tensor-level splitting is so unstable right now, and I'm too lazy to fix recent builds, so that'll probably do the job anyway.

[-]

blakeman8192@reddit

Yep exactly, most of my GPUs sit around 50W during inference with one at a time spiking to 100-150W. The whole box uses 300-400W at idle with a model loaded and maybe 800-900W when chooching at full power. Overall I'm pretty happy with the whole setup costing less than two 3090s.

[-]

a_beautiful_rhind@reddit

If running 27b is a "minority", may as well pack it up. That mid range is where it starts getting competitive on generalist models.

[-]

chiwawa_42@reddit

Spot on. I've been playing early on with toy-sized 7-9Bs, then decided to bridge the gap with Qwen3.6-27B, and it made my day(s). I'm still trying out bigger models running on unsafe public platforms, such as owl-alpha, and I get as much speed and as many wrong answers as with a local Qwen3.5-9B. So I've switched all my inference to a local 2*gfx1030 setup, then bought two more. It's slower but far more accurate than whatever is available on free tiers, and I don't have the extra step of dedicating another card to run a 9B to filter communications with external models.

[-]

HelloSummer99@reddit

You're not in the minority, in real world very few people rock 128GB Mac setups. 16-32GB is what most devs have still

[-]

my_name_isnt_clever@reddit

It's Macs, DGX Spark, and Strix Halo systems. I agree regular GPUs are more popular, but there are plenty of us here with unified memory systems.

[-]

yeah-ok@reddit

a better 35B (if that's possible)
Qwen team: "Hold my Baijiu for just a moment."

[-]

Snoo_27681@reddit

9B! 9B! 9B!

[-]

HistoricalStrength21@reddit

I would love to see a new Qwen model too, but what's the use case of a 9B model? It's slower than the 35B A3B and dumber than the 27B. What am I missing?

[-]

Snoo_27681@reddit

Qwen3.5-9B is a sweet-spot model for lower-end tasks: I can count on it to do OK work and call tools. I've got a version of the 9b solving a ton of firmware problems for me and doing web search and stuff.

One thing I run into, even with a 64GB laptop, is that I have a ton of other apps open. The 35B is fast but takes like 20-30GB before KV cache. Then I have 4-6 parallel Claude Code sessions doing who knows what, and mixed in there is a lot of image building. So even with 64GB of space I'm not comfortably running the model and working uninhibited.

9B I can work uninhibited and offload real tasks for it, especially if you make good tools for your usual tasks.

[-]

DeepOrangeSky@reddit

whats the usecase of a 9B model? Its slower than the 35B A3B and dumber than the 27B. What am I missing?

Not everyone has 32GB of DRAM. Lots of people have macs that have 16GB of unified memory (the base amount for the mac minis, mac laptops, etc that tons of people use). So they can run the 9b, but not the 35b. Even though its active parameters would be fine on the mac, the inactive (total parameters) is too big to fit in memory at all, and would go into memory swap, or need to be set up to stream from NVMe, which would be brutal compared to the 9b that fits in memory.

Plus I guess there are also tons of people with just normie laptops and normie desktops that only have 16GB DRAM and just some crappy igpu, although not sure how relevant that scenario is since presumably those suck at running practically any AI (not sure, never tried on one of those). But maybe it matters for those too in terms of terrible vs ultra terrible or something.

[-]

chiwawa_42@reddit

Streaming to NVMe could look like a nice option, but it's deadly. You'll ruin your gear in just a few months if it's left unchecked. Also, oMLX is shitty at best as of yesterday's releases. It's so unstable that it crashed my agentic setup while it was working on a live engine migration on other nodes, and it failed so hard I had to rebuild everything from scratch over last weekend. So big nope: oMLX isn't usable yet, and definitely don't use your SSD as swap for an MoE that doesn't fit into UMA; you'll ruin it sooner than you might think.

[-]

sylverCode@reddit

9B is faster than 35B on 8GB/16GB VRAM since it fits entirely on the GPU. Prefill speed also suffers a lot with the 35B since you have to offload it to RAM

[-]

HistoricalStrength21@reddit

Okay, nice. Do you feel any difference in the quality of the answers? Can a 9B model be good for coding? Is it good for generally answering questions? Thanks in advance.

[-]

Long_comment_san@reddit

The benefit of the 9b-12b range can be seen on HF. Qwen 9b has literally thousands of finetunes, whereas 35B-A3B doesn't have even hundreds, to my memory. Personally I think 9b is a relatively stupid choice because 12b is just so much better. If you can't do 9b locally, you'd go to 4b anyway. If you can do 9b, you can 100% do 12b with a slightly smaller quant. It's as if Qwen explicitly targets 12gb vram when they should target 16gb vram

[-]

a_beautiful_rhind@reddit

MoE is notoriously hard for regular people to finetune.

[-]

sylverCode@reddit

I've been using one of the variants of Qwopus 9B coder for coding at 262K context and it's been quite decent. Qwopus is fine tuned for agentic coding. It's been running alongside 35B for reviews since I have some leftover VRAM to fit both, with 35B experts offloaded to RAM with --cpu-moe flag

[-]

while-1-fork@reddit

3 times the factorial of 9 billion may be kinda large.

[-]

OMG_IM_A_GIRL@reddit

This would be (9B!)^3. That's something like 10^257,000,000,000.
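You can check the order of magnitude without computing a nine-billion-term product. Python's `math.lgamma` gives ln(n!) as lgamma(n + 1), and dividing by ln(10) turns that into a base-10 exponent.

```python
import math

n = 9_000_000_000
# ln(n!) = lgamma(n + 1); dividing by ln(10) converts it to log10.
log10_fact = math.lgamma(n + 1) / math.log(10)
log10_cubed = 3 * log10_fact  # log10((n!)^3) = 3 * log10(n!)
print(f"9B!     ~ 10^{log10_fact:.4e}")
print(f"(9B!)^3 ~ 10^{log10_cubed:.4e}")
```

This puts 9B! at roughly 10^(8.57 × 10^10), so the cube is roughly 10^(2.57 × 10^11).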

[-]

Cool-Chemical-5629@reddit

3 x 9B = 27B. I guess they can do that for you...

[-]

Nahxiee@reddit

I wonder how the new Qwen models will be for writing 🤔

[-]

NoobMLDude@reddit

As usual Qwen is cooking hard.
I wonder how big is the Qwen team. Their release frequency is insane.

[-]

Confident-Aerie-6222@reddit

Looking forward to improved 4b and 9b models for my potato laptop

[-]

Wildnimal@reddit

If you can run 9B then 35B-A3B should run with ease.

[-]

cibernox@reddit

That was not true when i had an eGPU. Either the model fit entirely in memory or performance was atrocious

[-]

ithilelda@reddit

finger crossed🤞

[-]

Separate-Antelope188@reddit

Potatoes crossed.

[-]

Valuable_Relation634@reddit

Same. I've been holding off upgrading my homelab GPU setup hoping the efficiency gains from these new architectures would let me skip the power bill jump. My current 72B runs at 6t/s which is... usable for offline tasks but painful for anything interactive. Are you expecting the 122B to be a MoE? I've seen conflicting rumors.

[-]

HavenTerminal_com@reddit

these guys haven't taken a lunch break since 2023

[-]

j_lyf@reddit

996 Culture.

[-]

remeh@reddit

I would really love to see a Qwen 3.7 122B released, but the same person ran a poll for 3.6 where 122B was mentioned, and we never saw it come out, so I'm a little worried that it might never happen...

[-]

the-username-is-here@reddit

Sad, but true. It doesn't feel like they will be releasing 122+ models any more (hope i'm wrong). 35B is genuinely good, but 122b is still smarter and can be run on low-end hardware.

Would eat huge chunk out of their model hosting business, with all them openrouter providers.

[-]

GiGiGus@reddit

Well, I wouldn't call a 64GB RAM + 16GB VRAM a "low-end hardware", I mean, if we compare it to local millionaires with RTX 6000s, then yes, it is indeed low end.

[-]

derekp7@reddit

But the 122b-a10b is perfect for something like strix halo (faster than cpu-only compute, slower than gpu, so the MoE makes up for that). And you can't compare the cost of a strix halo MB with the cost of a dedicated GPU, as you get a whole workstation class computer out of it too, so it is multi purpose (when not doing LLM inference tasks, I can spin up a farm of VMs or other ram-hungry tasks).

[-]

the-username-is-here@reddit

I run 122b on Spark, next to several containers with Postgres and embedding models (had to memory-manage the shit out of it though). 50 TPS all the way, baby!

It's waaaay slower than a 6K would be, of course, but still half the price of a 6K alone and probably one third of a workstation.

[-]

Swimming-Chip9582@reddit

what quant and setup do you have on your spark? i've got a couple at work to play around with but i'm a bit unsure what's best for agentic stuff atm - just got qwen3.6 fp8 a3b wired up, which is pretty great

[-]

the-username-is-here@reddit

These days Spark or something comparable is "low-end". Sadly.

[-]

Swimming-Chip9582@reddit

it is indeed low-end - i get reminded every time i see recommended specs on huggingface say models need "a few b200s" 😭

[-]

RayHell666@reddit

Yeah same happened with Wan 2.5 and now Qwen-Image 2.0 in the image/video sphere. Alibaba is abandoning Open Source slowly.

[-]

ManySugar5156@reddit

lol same, 122B and that new 27B feel like they're gonna be the real deal. hope it doesn't take forever to drop weights.

[-]

korino11@reddit

A 3.7 at something like 48B-A6B would be nice

[-]

SnooPeripherals5499@reddit

Yesss please. Everyone stuck at A3B which is sad, A8b for 48 or 70B would be a dream

[-]

OMG_IM_A_GIRL@reddit

70B A8b or A10b would be so goddamned amazing on MacBook Pro M5Max. Even the 64GB model would support Q4.

[-]

redditscraperbot2@reddit

I wonder if it will be cooking my GPU soon.

[-]

Clean_Hyena7172@reddit

I doubt it will be open-source

[-]

Borkato@reddit

Remindme! One month

[-]

Clean_Hyena7172@reddit

lol they still haven't open-sourced qwen3.6 max or plus so one month for 3.7 to be open-sourced seems optimistic to say the least.

[-]

Borkato@reddit

You specifically meant max/plus?

[-]

Clean_Hyena7172@reddit

well yeah, that's what the screenshot in the post is talking about? Were you guys talking about the smaller models?

[-]

RemindMeBot@reddit

I will be messaging you in 1 month on 2026-06-19 17:11:51 UTC to remind you of this link

[-]

Dany0@reddit

give it between 3-6 weeks. this time no one is recovering from chinese new year, but also it took them a month last time

[-]

OriginalPlayerHater@reddit

I honestly don't care anymore.

[-]

steny007@reddit

Yet, you care to tell us.

[-]

OriginalPlayerHater@reddit

you had to know, otherwise i'd feel incomplete

[-]

jacek2023@reddit (OP)

what do you use?

[-]

OriginalPlayerHater@reddit

I'm just gonna leave the sub I think. I use Claude through the free web interface and a paid copilot subscription. Sometimes I'll use google api's for my projects.

Local LLMs are great, they are important they are fun. I just don't give a shit about the OH MY GOD hype train of the models themselves. I did and now I don't.

I'm not going to spend 10k on a local setup, I'm not going to burn out my current GPU with inference tasks, and as neat as it is to have these models do their thing, it makes no worldly difference to me.

Anyways have fun with it

[-]

tictactoehunter@reddit

Hard cooking is not always everyone's dish...

[-]

Ifihadanameofme@reddit

Not because I want it to run on my 6gigs of ~~gpu poor~~ slum poor vram card, But because I think they can probably break the internet by releasing a MOE smaller than the 35B-a3b but better than qwen3.6 MOE .

That model is already so freaking good and "fast enough" (and even faster with MTP now) .

[-]

szansky@reddit

What are the differences between 3.7 and 3.6?

[-]

madhan4u@reddit

According to Qwen3.7-Max

Neither Qwen3.6 nor Qwen3.7 exist. There is no "Qwen 3" generation yet, and therefore no 3.6 or 3.7 point releases.

[-]

danihend@reddit

According to my neighbor's cat qwen 3.5 doesn't exist either. Fuck him though.

[-]

switchbanned@reddit

wow what a pussy

[-]

danihend@reddit

Agreed 👍

[-]

Paradigmind@reddit

About 0.1

[-]

Altruistic-Dust-2565@reddit

Not "about", exactly 0.1 unless you encounter floating point errors

[-]

Nyghtbynger@reddit

oh yeah.
if (x - y).abs() < 0.000001 { return true; }
Good old days
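For the record, the float joke holds up. A one-sided check like `x - y < eps` also passes whenever x is much smaller than y, so a tolerance test needs an absolute value. Python's standard library has `math.isclose` for this, and it uses a relative tolerance by default. A minimal illustration:

```python
import math

diff = 3.7 - 3.6
print(diff == 0.1)                 # False: neither 3.7 nor 3.6 is exact in binary
print(math.isclose(diff, 0.1))     # True: relative-tolerance comparison from the stdlib

# The one-sided check passes for ANY x smaller than y, which is a bug:
x, y = 1.0, 5.0
print(x - y < 0.000001)            # True, even though 1.0 and 5.0 are far apart
print(abs(x - y) < 0.000001)       # False: the absolute value fixes it
```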

[-]

szansky@reddit

Nice, but for programmers, do we know of any changes or differences?

[-]

initalSlide@reddit

Best answer

[-]

Brief-Effect9065@reddit

Now it can make a working 3D Rubik's cube

[-]

TurnOffAutoCorrect@reddit

They're waiting for Google IO to happen later today and then upstage them, aren't they?

[-]

BrewHog@reddit

Considering it took like a year to go from Gemma 3 to Gemma 4, I doubt Google will release another version of Gemma

[-]

DinoAmino@reddit

Exactly. Maybe another Gemma 4 model though. Given that it took a year, v4 was built from scratch, yeah? Qwen 3 came out a year ago. Are all the Qwen point releases using the same v3 base models, but with new post-training? Same training cutoff across the board?

[-]

BrewHog@reddit

That would be most welcome

[-]

nacholunchable@reddit

Unless what they're "cooking" is 122b or 80b.. Im starting to get the feeling the team swap really did change things despite their public reassurance.

[-]

ego100trique@reddit

If they manage to greatly improve the 9b, or get something between 9 and 27 for the masses that performs only marginally worse than the 27b, that would really be huge for them tbh

[-]

switchbanned@reddit

RIP 14b models

[-]

cafedude@reddit

Hopefully people on X are asking Chujie about the 122B. Multiple times. Every day.

[-]

QuackerEnte@reddit

Am I the only one who wants 80B-A3B MoE size?

[-]

SnooPeripherals5499@reddit

No more A3B please. Bring back 70B but with A8B

[-]

QuackerEnte@reddit

I would like A5B too, because I'm VRAM limited (at home) and my normal RAM is slow, only 3200MT/s

[-]

hesperaux@reddit

You're not the only one. That would be a perfect size for me. But that could only be qwen 4 since the param sizes they've trained are fixed. You won't see a qwen 3.x at 80b unless they've been severely training one all along.

[-]

jacek2023@reddit (OP)

No, I would love this size too

[-]

vogelvogelvogelvogel@reddit

tbf Qwen and DeepSeek resulted in an improvement of my views about China, not that they were like bad but still. I am very thankful they are open weighting their models

[-]

Dramatic_Entry_3830@reddit

Qwen especially. I also think the Chinese models think differently because they were trained heavily on Mandarin content, and that alters their thinking for the better.

[-]

vogelvogelvogelvogel@reddit

why the downvotes, it is no new finding that language influences thinking.

[-]

Silver-Champion-4846@reddit

It does. Slightly off-topic, but the renaissance that happened to my language had two sides: one sought to consolidate and draw from heritage, while the other wanted to drink from the West's well. The second side won, and now the majority of writers think in English/French even if they don't know the words and grammar, because it's embedded in the dominant institutional lingo. That also makes western models better at learning that westernized Arabic, and if you discuss it with them a certain way, they'll lean hard on how English works even when you tell them to be 'classical', or when they're trying to explain the difference between 'classical' and 'modern' Arabic. Point is, that's just my experience regarding that statement

[-]

Legitimate-Pumpkin@reddit

You would be surprised how different china is from what we are told in the news…

And how different we are too, btw.

[-]

vogelvogelvogelvogel@reddit

besides the good impression regarding open weights models i got i took that as a starting point to dig deeper in forums and youtube and so on, expats living in china etc..

also i worked with colleagues from china, always a good experience.

[-]

Hour_Bit_5183@reddit

How do you cook hard? Oh yeah they are making breaking bad stuff. Probably KET

[-]

while-1-fork@reddit

You get a hardon , then start cooking while keeping it hard.

[-]

Miserable-Dare5090@reddit

radeon hardon lfg

[-]

switchbanned@reddit

I hope the crockpot is set to high and not low.

[-]

WebOsmotic_official@reddit

qwen is cooking so hard the local crowd is already negotiating with their GPU temps.

everyone wants the 122B until the download finishes and the fans start speaking in tongues.

[-]

iaNCURdehunedoara@reddit

Qwen is cooking but i wish they hadn't removed the free 2000 requests on their qwen-coder. It was extremely fun to play around with.

[-]

jacek2023@reddit (OP)

local LLMs are free

[-]

iaNCURdehunedoara@reddit

They're not free if you don't have the hardware. I can't run a decent LLM on 4060TI unfortunately.

[-]

MerePotato@reddit

Yes you can, Gemma 26BA4B Q8 with MoE layer offloading

[-]

jacek2023@reddit (OP)

but this is r/LocalLLaMA

[-]

a_beautiful_rhind@reddit

My electric says otherwise:

=============================================
POWER CONSUMPTION REPORT
=============================================
Period Covered:        1434 hours, 30 minutes
---------------------------------------------
Base System (IPMI):      247.21 kWh  ($49.44)
GPU Array (Nvidia):      134.84 kWh  ($26.97)
---------------------------------------------
TOTAL (Wall Draw):       382.05 kWh  ($76.41)
=============================================
GPUs account for 35.3% of total power bill.
Projected 24h Usage:   6.39 kWh  ($1.28)
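The report's figures are internally consistent with a flat rate of $0.20/kWh; every dollar amount matches that rate after rounding. Below is a minimal sketch that reproduces the summary from the two meter readings. The rate is inferred from the report itself, and the variable names are illustrative. This is not the script that produced the report.

```python
HOURS = 1434 + 30 / 60  # "1434 hours, 30 minutes"
RATE = 0.20             # $/kWh, inferred from the report's own figures

base_kwh = 247.21       # Base System (IPMI)
gpu_kwh = 134.84        # GPU Array (Nvidia)
total_kwh = base_kwh + gpu_kwh

gpu_share = gpu_kwh / total_kwh * 100
daily_kwh = total_kwh / HOURS * 24

print(f"TOTAL (Wall Draw): {total_kwh:.2f} kWh (${total_kwh * RATE:.2f})")
print(f"GPUs account for {gpu_share:.1f}% of total power bill.")
print(f"Projected 24h Usage: {daily_kwh:.2f} kWh (${daily_kwh * RATE:.2f})")
```

The sketch also shows that the 24h projection is just the average draw over the period (about 266 W), scaled to a day.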

[-]

alphapussycat@reddit

But these, 3.7 are probably not open weight.

[-]

jacek2023@reddit (OP)

We are waiting for open releases

[-]

alphapussycat@reddit

But they've said a month ago that they won't be doing open weights anymore, and are going to focus on monetization.

I hope I'm wrong, but it's like expecting another gpt oss because gpt went from 5.4 to 5.5.

[-]

jacek2023@reddit (OP)

Could you show the exact quote?

[-]

alphapussycat@reddit

You can find it by just googling. The new leader is starting some token section thing, don't remember what it's called, and the new direction is monetization.

They do say they'll keep the spirit of open weights to attract/keep people interested in qwen. But realistically, releasing open weights works against themselves.

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

Kahvana@reddit

If that cooking results in locally runnable models for all previously released size ranges, then I'm interested.

[-]

AmoebaDue6638@reddit

The new 27B is what I'm most excited about. If it matches the jump from Qwen2 to Qwen2.5, it'll be the best model you can run on a single 3090.

[-]

Due_Ebb_3245@reddit

I want 9b please

[-]

Journeyj012@reddit

0.6B for the pre-release hype. just would be funny.

[-]

IndianITCell@reddit

Claude is crying in the corner xD

[-]

Long_comment_san@reddit

I hope they eventually make a 12b instead of a 9b. 12b is much smarter than 9b.

[-]

Long_comment_san@reddit

This avatar fits amazingly well

[-]

Divyansh3021@reddit

I have been using Qwen 3.6 for the past few days and I am genuinely impressed by its performance.

[-]

PromptInjection_@reddit

Hm let's see if we get open models.

[-]

datbackup@reddit

Remember when the top Qwen bro left and people were in hysterics, then we got 3.6, two of the best local models ever?

Maybe i’ll eat these words but qwen team looking good still

[-]

Better-Struggle9958@reddit

qwen marketing is hard, fully bots

[-]

jacek2023@reddit (OP)

I am fighting bots and "Chinese hype" but this is an important info, Qwen 3.7 for local setups will be a big milestone

[-]

Better-Struggle9958@reddit

3.6 was just fix for 3.5

[-]

jacek2023@reddit (OP)

What models do you use locally each day?

[-]

Better-Struggle9958@reddit

gemma4

[-]

jacek2023@reddit (OP)

I use both gemma4 and qwen3.6, I don't think 3.6 was just a "fix", I would like to see 4.1 from Google

[-]

Better-Struggle9958@reddit

I use my own benchmark for C++ projects: if a model does better there, it's better. 3.6 was slightly better than 3.5, and gemma4 was better anyway in my benchmarks. BUT I don't post 100 posts every day saying OMG QWEN/GEMMA is amazing

[-]

RuthlessCriticismAll@reddit

I see tons of people shilling Gemma models which convinced me to try them again and I was disappointed once more like every time in the past. Meh, they always underperform.

[-]

Better-Struggle9958@reddit

Everything is relative of course

[-]

jacek2023@reddit (OP)

How do you run your benchmarks? I am working on Python project with pi and local models (at the same time I use Claude Code for C++ project).

[-]

Better-Struggle9958@reddit

I am the creator of https://github.com/Palm1r/QodeAssist and https://github.com/Palm1r/llmqore, and I have an internal MVP where agents generate code for a task, for example creating a Qt application with a certain feature, C++ code implementing some algorithm, or something with the Qt framework, and then I compare the results at the end.
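The "generate code per task, then compare" approach can be sketched generically in Python. This is not how QodeAssist or llmqore work: those are Qt/C++ projects, and the comment doesn't describe their internals. The tasks and the two stand-in "models" below are hypothetical stubs. Running `exec` on model output should only ever happen inside a sandbox.

```python
from typing import Callable

# A task = a prompt plus a checker that inspects what the generated code defined.
Task = tuple[str, Callable[[dict], bool]]


def pass_rate(generate: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose generated code runs and passes its checker."""
    passed = 0
    for prompt, check in tasks:
        namespace: dict = {}
        try:
            exec(generate(prompt), namespace)  # UNSAFE outside a sandbox
            passed += bool(check(namespace))
        except Exception:
            pass  # crashes and missing functions count as failures
    return passed / len(tasks)


tasks: list[Task] = [
    ("Write add(a, b).", lambda ns: ns["add"](2, 3) == 5),
    ("Write is_even(n).", lambda ns: ns["is_even"](4) and not ns["is_even"](3)),
]


# Stand-ins for real model calls (hypothetical; a real harness would query an LLM).
def model_a(prompt: str) -> str:
    return "def add(a, b): return a + b\ndef is_even(n): return n % 2 == 0"


def model_b(prompt: str) -> str:
    return "def add(a, b): return a - b"


for name, model in {"model_a": model_a, "model_b": model_b}.items():
    print(name, pass_rate(model, tasks))
```

A C++ or Qt version of the same idea would swap `exec` for compiling each generated project and running its tests. The scoring loop stays the same.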

[-]

jacek2023@reddit (OP)

are you somehow related to Qt?

[-]

Better-Struggle9958@reddit

no, I just develop in my free time

[-]

jacek2023@reddit (OP)

I remember Qt was adding some AI stuff, looks like it is also called Assistant https://doc.qt.io/qtcreator/creator-qtaiassistant.html

[-]

Kicks butt on an office laptop

[-]

jacek2023@reddit (OP)

well at least no software updates are needed, I would like to see more qwen 3.x finetunes

[-]