Oh my God, what a monster is this?

Unfortunately for Facebook, they seem to be torn apart by petty office politics and don't seem to be organized enough to do anything even if they get anyone competent working for them again.

adscott1982@reddit

Just a bad demo. Apparently the glasses are really pretty great.

0xFatWhiteMan@reddit

I mean it's possible, but with less money behind them, and lagging behind already ... It's unlikely

SilentLennie@reddit

They might not have the hardware, we'll see what happens.

nivvis@reddit

No one remembers r1? ai moving fast lol

-p-e-w-@reddit

The new Kimi K2 is also a monster. At most tasks, it’s at least the equal of any proprietary model except Opus, and in creative writing, it’s by far the best model currently available.

[-]

I'm sorry but Opus is subtly benchmaxxed and not actually a good model. It's actually unusable for a large class of problems. It looks great if your eval is vibe coding small projects in python/javascript/typescript, but it falls apart outside of that badly. GPT5 absolutely crushes it in the domain of hard code, even Grok-4-fast beats opus in my experience, mostly because its long context support means it doesn't get as confused and fuck shit up.

Majorzigzag@reddit

Oh my gosh I thought I was the only one who thought this way. GPT 5 performed way better than Opus.

Healthy-Nebula-3603@reddit

If we're talking about coding, I'd rank them: gpt-5 thinking > grok 4 > opus 4.1 > Gemini 2.5 pro

0quebec@reddit

I use LLMs to build ComfyUI workflows, and only GPT 5 thinking/pro or grok 4 are able to do it

SlapAndFinger@reddit

Gemini has best in class long context reasoning, which is part of the reason I actually put it slightly ahead of Grok even though Grok is smarter. GPT5 is basically a better Grok, while Gemini has a niche that nobody can top it at.

brucebay@reddit

What kind of stuff are you developing? GPT-5 is trash with Python AI work, and in Teams Copilot I use GPT-4 (or is it GPT-4.5? I just don't look at the minor version number) for text editing or light brainstorming, since GPT-5 adds so many words and usually does the job the wrong way anyway.

SlapAndFinger@reddit

Predominantly high performance rust systems and algorithm code, though I do a fair amount of Python for ML and node/TS/react for interfaces.

Significant-Pain5695@reddit

The short context of opus is a very serious problem, making it unable to assist in most application scenarios

brownman19@reddit

Opus isn’t benchmaxxed. It’s just a diabolical demon. The model is far smarter than it wants you to believe. I think Anthropic’s alignment went way wrong and made the model misanthropic 🤣

Healthy-Nebula-3603@reddit

Opus 4.1 is obsolete by today's standards, whatever you say. It's only better than Gemini 2.5 Pro.

hemphock@reddit

is the new kimi k2 also non-thinking? i really liked that about the previous version

-p-e-w-@reddit

Yes.

HyperWinX@reddit

I tried it because I'd heard about the 1T parameters. Asked it about C++. Saw "using namespace std" in the response. Closed. Never again lol

inevitabledeath3@reddit

Why don't you just ask it not to use that? Have you heard of a rules file or agents.md? As far as I am concerned, it's still perfectly valid C++. If you want it to follow your preferred practices and architecture, then you need to give it instructions for that.

CheatCodesOfLife@reddit

What's your go-to local model for C++, if I might ask? Oh, and I agree, different models are better at different things. K2 is the best I've found for pointing out flaws in my code.

theundertakeer@reddit

I used qwen for a while, mainly qwen3 coder. It was fine for small stuff, but on complex tasks it makes lots of mistakes. C++ is still best learnt and used with less AI, as it's a really sensitive language... one mistake can cost you a dead memory region or, worse, a memory leak.

AppearanceHeavy6724@reddit

Agree, Kimi K2 is way better analyst than a creator.

HyperWinX@reddit

I don't have hardware for big LLMs sadly, though CPU-only thinking Qwen3-30B works okay-ish, 15 t/s on a 5600G.

TheRealGentlefox@reddit

I love Kimi, but it does have its flaws. While it's excellent at creative writing, there's a reason it drops so much on longform writing on EQ Bench. I've had to switch over to 2.5 Pro for a message or two in a roleplay to get it to move on with a scene or progress the story. I believe others have noticed it hallucinating aspects of a conversation, but I haven't really seen that yet. Great personality though, I need the other top models to be that grounded and unsycophantic. Low slop levels, and impressive smarts for being a non-thinking model. When they do drop the thinking version though, I wouldn't be surprised if it was a total gamechanger.

AppearanceHeavy6724@reddit

> it’s by far the best model currently available.

I disagree. It has style that initially dazzles, but quickly gets old. I like deepseek more, or even Qwen-Max or GLM.

usernameplshere@reddit

Only thing K2 Kimi needs is vision, then it's perfect (for me).

power97992@reddit

I doubt it's better than gpt 5 thinking high?

typeryu@reddit

Saying which is better at this level of bench saturation is pretty meaningless. We call them frontier models because as far as we know, they are the best performing models we made so far. Being in the frontier club was almost exclusive to closed source US models which was generally the “moat” that gave them prestige. I still use GPT-5 because from my own use, it seems to have the best performance for me, but models like Qwen will definitely be bread and butter for others out there

power97992@reddit

From my limited experience, Qwen3 Max non-thinking felt close to GPT-5 non-thinking.

Significant-Pain5695@reddit

I don't think so, but that doesn't affect my ability to use it in other scenarios

hard-scaling@reddit

Isn't gpt 5 pro, which is in the chart, better?

Significant-Pain5695@reddit

I believe there is still a gap when it comes to solving very difficult problems in mathematics and computer science compared to those flagship models in the US, but for everyday tasks, it is indeed sufficient; moreover, there are many open-source models in China

typeryu@reddit

100% agree, but in my opinion the gap is small enough that we can say it's nearly caught up. US models do have one major advantage, which is compute. Not right now, but when the GW-tier data centers start rolling in next year, we will have some truly next-gen models. Honestly, GPT-4.5 was imo the most advanced model ever trained, but it was too heavy and expensive to go through a proper reinforcement learning post-training phase. With more data centers, we should start to see mega-caliber models with insane scientific research abilities.

NearbyBig3383@reddit (OP)

I bet a lot on Qwen. It's beautiful, I'm looking forward to R2 but apparently when it arrives we won't even need it hahaha

GenLabsAI@reddit

max isn't opensource (yet?)

Significant-Pain5695@reddit

Max is probably impossible to open source; the previous version of Max has never been open source, and Max has always been a proprietary commercial model of Qwen

TheRealGentlefox@reddit

I need to see more than AIME and GPQA to say they reached the frontier. Two boomer benchmarks that have never corresponded well with capabilities in my testing. I'll believe it when they top the private benchmarks I follow, and when their numbers start surpassing closed model numbers on Openrouter for code / problem solving.

FinBenton@reddit

If models score 100 then it's a useless benchmark

MalumaDev@reddit

Or they trained the model on the benchmark

Healthy-Nebula-3603@reddit

Or... it's just that good at math. Faking math is impossible and would easily be found out: you can change one parameter or number and check whether the result is still correct. I can't find any math problems that this model can't solve.
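The "change one number and re-check" test can be sketched in Python. This is a hypothetical toy harness, not anyone's real eval: `reference_solver`, `memorizing_model` and `honest_model` are made-up stand-ins, and the toy problem (sum of the first n cubes) is an assumption chosen only because its answer is easy to verify.

```python
import random


def reference_solver(n: int) -> int:
    """Ground truth for the toy problem: 1^3 + 2^3 + ... + n^3."""
    return (n * (n + 1) // 2) ** 2


# Hypothetical "model" that memorized the answer to the published instance (n = 10)
# and returns it no matter which n it is asked about.
MEMORIZED = {10: reference_solver(10)}


def memorizing_model(n: int) -> int:
    return MEMORIZED.get(n, MEMORIZED[10])


def honest_model(n: int) -> int:
    # Stand-in for a model that actually works the problem out.
    return sum(k ** 3 for k in range(1, n + 1))


def perturbation_accuracy(model, base_n: int, trials: int = 20, seed: int = 0) -> float:
    """Score a model on randomly perturbed versions of the base instance."""
    rng = random.Random(seed)
    variants = [base_n + rng.randint(1, 50) for _ in range(trials)]
    correct = sum(model(n) == reference_solver(n) for n in variants)
    return correct / trials


print(memorizing_model(10) == reference_solver(10))  # True: the published instance passes
print(perturbation_accuracy(memorizing_model, 10))   # 0.0
print(perturbation_accuracy(honest_model, 10))       # 1.0
```

This only exposes a model that memorized one published answer; as the next reply argues, it would not expose a model trained on many perturbed copies of the same problem.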

Croned@reddit

You can train on nearly identical problems where small details like specific digits or variables are changed. This makes it so you're technically not training on the benchmark test set, but you're sidestepping true intelligence. LLMs have much better semantic memory than humans. As an analogy, imagine I give you an exam with a very difficult integral to solve, but I also give you the full step-by-step solution of a nearly identical integral with just the digits of the coefficients changed. Now what was a very difficult problem becomes a basic exercise in arithmetic and algebra.

DuplexEspresso@reddit

Isn’t this literally how all kids learn to solve integrals? It all starts with a teacher explaining on the blackboard, not the kid magically figuring it out themselves.

Croned@reddit

Here's a simpler example: sudoku. In a sudoku puzzle you are given a nearly blank grid where a few cells are filled in with numbers, and your goal is to fill in the rest of the cells with numbers that satisfy a set of constraints. It turns out that in sudoku the identities of digits can be swapped (e.g. all 1s can be swapped with 9s), so if your exam is an unsolved sudoku puzzle I can make it a lot easier by giving you a solved version of that puzzle where the digits have been swapped with digit-specific colors. Now you just need to map each color to a number and you can trivially solve the puzzle, but if I give you a new random puzzle you will be unable to solve it unless you actually understand sudoku. The (simplified) way you do this when training an LLM is by taking a sudoku puzzle from the test set, creating a bunch of versions where the digit identities have been randomly swapped, and training the model to solve those. The simplest algorithm for the model to learn is to recognize the abstract pattern of the starting state of the puzzle (like replacing each digit identity with a unique color) and substitute the abstract pattern with digits from a puzzle instance. This will give it very high accuracy on the test set (and companies can claim they technically didn't train on test questions), but if the model then encounters a new random sudoku puzzle it won't be able to solve it, because it didn't learn the much more challenging process of solving sudoku puzzles in general.
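The digit-relabeling trick described above can be sketched in Python. The function names and the sample grid are illustrative assumptions, not any lab's actual training pipeline; the sketch only shows that every relabeled copy collapses to the same "color" pattern, which is the shortcut a model could memorize.

```python
import random

Grid = list[list[int]]  # 9x9, 0 marks an empty cell

PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def relabel_digits(grid: Grid, rng: random.Random) -> tuple[Grid, dict[int, int]]:
    """Copy the puzzle with digits 1-9 consistently permuted (0 stays empty).

    A consistent swap keeps every row/column/box constraint intact, so the
    relabeled puzzle is equally valid and its solution is the relabeled solution.
    """
    perm = list(range(1, 10))
    rng.shuffle(perm)
    mapping = {0: 0, **{d: perm[d - 1] for d in range(1, 10)}}
    return [[mapping[v] for v in row] for row in grid], mapping


def augment(grid: Grid, copies: int, seed: int = 0) -> list[Grid]:
    """Make `copies` relabeled variants of one (e.g. test-set) puzzle."""
    rng = random.Random(seed)
    return [relabel_digits(grid, rng)[0] for _ in range(copies)]


def canonical_pattern(grid: Grid) -> tuple[int, ...]:
    """Rename digits by order of first appearance (the 'colors' in the analogy)."""
    names: dict[int, int] = {0: 0}
    out = []
    for row in grid:
        for v in row:
            if v not in names:
                names[v] = len(names)
            out.append(names[v])
    return tuple(out)


variants = augment(PUZZLE, copies=100)
# Every variant looks different digit-wise but shares one abstract pattern.
print(all(canonical_pattern(v) == canonical_pattern(PUZZLE) for v in variants))  # True
```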

DuplexEspresso@reddit

I see your point

Croned@reddit

I see that example went way over your head.

Healthy-Nebula-3603@reddit

In that case current AI is as good at math as humans. We are also trained on skeletons or "blueprints" for solving math problems and then adapt them to the problem at hand. AI can even invent completely new solutions (creating new blueprints), as was proven with a Google Alpha model.

Pyros-SD-Models@reddit

This is literally how 90% of high school kids learn math.

GenLabsAI@reddit

Where do you try it?

Healthy-Nebula-3603@reddit

My own heavily modified rare math problems

GenLabsAI@reddit

No, but which site do you use it on?

FinBenton@reddit

Pretty sure most companies do that anyway.

partysnatcher@reddit

You mean: if *all* models score 100 then it's a useless benchmark. If it distinguishes between a very few models, with some reaching 100 and some not, then it is a useful benchmark.

keepthepace@reddit

Yes and now, it still means that these models complete a set of tasks perfectly. It is not a benchmark anymore but more of a "unit" test.

KattleLaughter@reddit

regression test

shadiakiki1986@reddit

it was already a regression test before it reached 100%

SilentLennie@reddit

or the benchmarks aren't that useful anymore, that's always been a thing and only getting worse.

Automatic-Newt7992@reddit

This is the way. Make it a 2-bit quant. Then it's all if-else conditions to arrive at the real reasoning for the solution /s

k_means_clusterfuck@reddit

If models score 100 does the benchmark say anything about their capabilities? Yes. It is not a useless benchmark, just no longer very descriptive for frontier models. These are still useful for smaller models

pneuny@reddit

Or to see how good models are without python assistance.

Significant-Pain5695@reddit

You can't say that, because there is still a significant gap between the flagship models of each company

Healthy-Nebula-3603@reddit

...even if 90% is useless

LrdMarkwad@reddit

I agree that it’s a useless benchmark *now*. Looks like we need new tests

Least-Character3079@reddit

Or the model is completely contaminated with data from this and other similar benchmarks included in its training data. I don't know the launch dates of each model and benchmark. It's just a suspicion.
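If you have (a sample of) the training text, a crude way to probe that suspicion is an n-gram overlap check. This is a minimal sketch under assumptions: whitespace tokenization, an arbitrary `n=8` window, and tiny made-up strings; real contamination audits run over huge corpora with proper tokenizers.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased whitespace-token n-grams of a string (empty if too short)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(questions: list[str], corpus: str, n: int = 8) -> float:
    """Fraction of benchmark questions sharing at least one n-gram with the corpus."""
    corpus_grams = ngrams(corpus, n)
    flagged = sum(1 for q in questions if ngrams(q, n) & corpus_grams)
    return flagged / len(questions)


# Made-up example data, not a real benchmark or training set.
corpus = "forum dump: find the number of ordered triples of positive integers a b c such that abc = 100"
questions = [
    "Find the number of ordered triples of positive integers a b c such that a+b+c = 10",
    "How many primes below one hundred are one more than a perfect square",
]
print(contamination_rate(questions, corpus))  # 0.5
```

A hit only means shared wording; paraphrased or digit-swapped variants would slip past this kind of check.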

Mani_and_5_others@reddit

Benchmarks are bullshit

Nandishaivalli@reddit

100 what? What metrics are you showing?

NigaTroubles@reddit

Wow we already reached 100

TSJasonH@reddit

Incredible job getting this at exactly 4:20. Too bad your battery wasn't 69%.

mpasila@reddit

In benchmarks it looks good, but in world knowledge it's so much worse than GPT-5. I just asked a bunch of questions about Finnish culture-related stuff (and popular shows), and Qwen3 Max would either not know about it or just hallucinate a lot. GPT-5 did a much better job: it was aware of 99% of the things I asked about and mostly correct as well. Qwen3 Max clearly had almost no data about that stuff. It's a Chinese model, sure, but they are marketing it towards the west, so it had better know some western stuff as well.

Bakoro@reddit

Finland is part of the West.

mpasila@reddit

My last sentence doesn't mean anything?

Ice94k@reddit

yep, qwen is incredible rn.

jacek2023@reddit

We moved from "discussion about not local Claude models" to "discussion about not local Qwen models" on this sub? Is it called "progress"?

robberviet@reddit

It's not local, but it's from a company that provides local models, good ones and frequently. Therefore hopefully we will get the open weight of this, maybe. Speaking of which, we still haven't seen open weights for Qwen 2.5 Max. Maybe we will see 2.5 Max when 3.5 Max is released.

aurelivm@reddit

Qwen 2.5 Max was just Qwen 2.5 72B

robberviet@reddit

At least it's MoE, not 72B. https://qwen.ai/blog?id=e2eebf44bd7d617d7e4da68fec1f995585409a5e&from=research.research-list

Smile_Clown@reddit

I sometimes forget that reddit can be visited by anyone with any opinion, any depth of knowledge and post.

>Therefore hopefully we will get the open weight of this, maybe.

1. That would not matter: you cannot run it, and no one is serving it to you free and unlimited. So you'll either pay just like you would with any commercial enterprise, or get lower quality and less access.
2. See 1.

A lot of people get all wide-eyed about "open source" (and sometimes get angry too?) and forget that their 3060 can't run even the most ridiculously quantized version without producing gibberish. They also seem to forget that performance and results sit on a linear slope with scale. For the foreseeable future you are not getting any open source frontier model, and technically speaking, you never will: what is frontier today is also-ran tier tomorrow.

Just for the record, to sum up:

>Therefore hopefully we will get the open weight of this, maybe.

Not the same thing.

stylist-trend@reddit

> I sometimes forget that reddit can be visited by anyone with any opinion, any depth of knowledge and post.

Wow, speak for yourself asshat. Someone is looking forward to open weights, and your response assumes 1) that they plan to run it, 2) that they plan to run it on their own hardware, 3) that they plan to run it today, 4) that they plan to run it today, quantized on a 3060, 5) they want to run it for free, and 6) therefore, that they're too dumb to understand LLMs. Just assumption after assumption after assumption. Open weights means the model can be driven by other fast providers like Cerebras or Groq, and in general means costs come down because many different companies and groups can perform inference. Maybe think a little before you speak. And if you don't, at least try to be humble instead of assuming the worst and acting like a dick about it. Geez.

pigeon57434@reddit

Not only is this not local, the thinking version of Qwen3 Max isn't even freaking out yet, not even as closed source.

chocolateUI@reddit

It’s not local, but now we know that future *local* Qwen models have the potential to match the capabilities of closed source models like GPT-5 mini or Gemini Flash, and I think that’s worth talking about!

Initial-Argument2523@reddit

Yes, since now at least we are talking about models we could run locally if we had a crap ton of money.

KnifeFed@reddit

Not Max though.

Beneficial-Good660@reddit

Qwen provides decent open weights that are usable. How can you compare them to Claude, which has no open-source models, or to OpenAI and others, which only provide emasculated models? A little attention to them wouldn't be a bad thing.

Kqyxzoj@reddit

> Oh my God, what a monster is this?

It's a horrible shitty bar chart. You're welcome.

MerePotato@reddit

This just means the benchmarks have been saturated

korino11@reddit

I hope it is thrue... i am stuck with stupid gpt5... it almost good..but.. its filters... my nercouse cells a iong with him... gpt5 always can say ..fuck off, idont wanna do this... so we need a not only good, but without bullshit filters! cloude stupid as a hell.. even at max..it is have not only high price..but he doesnt listeng to you. cloude always simple math... doesnt do it hard as needed. always trying avoid heavy solutions.. always trying to get something from him personal, not what i asked... so i hope qwen3 will gona change situation a lot!

RonJonBoviAkaRonJovi@reddit

I bet even LLMs get confused at how bad you type.

muffnerk@reddit

noob here. sorry, but what exactly am i looking at? a new llm that is fantastic at python??

Thick-Specialist-495@reddit

i dont trust their benchmarks

dalittle@reddit

I don't trust graphs pushing Qwen when the clear winner is GPT-5

fish312@reddit

When a benchmark becomes a target something something

TheCatDaddy69@reddit

I'm kinda dumb, but what's the scope here? What's Python got to do with anything? Is this when using its API in Python?

Nid_All@reddit

It’s using Python as a tool to execute the written code during the CoT like GPT-5 Thinking for example
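Roughly, the harness side of that loop looks like the sketch below: pull the code out of the model's reasoning, run it, and hand the output back as the next turn. This is a hypothetical minimal harness, not Qwen's or OpenAI's actual tool protocol; the `<python>...</python>` tag convention and the `run_python_tool` name are assumptions, and a real setup would sandbox the code instead of running it directly.

```python
import re
import subprocess
import sys

# Assumed convention: the model wraps code it wants executed in <python>...</python>.
TOOL_CALL = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_python_tool(model_message: str, timeout: float = 10.0) -> list[str]:
    """Execute every tool call in a model message and return their outputs.

    Each snippet runs in a fresh interpreter subprocess; stdout (or stderr on
    failure) is what would be appended to the conversation for the model to read.
    """
    outputs = []
    for code in TOOL_CALL.findall(model_message):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        outputs.append(result.stdout if result.returncode == 0 else result.stderr)
    return outputs


msg = "Let me check numerically. <python>print(sum(k**3 for k in range(1, 11)))</python>"
print(run_python_tool(msg))  # ['3025\n']
```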

TheCatDaddy69@reddit

Ah thanks.

-InformalBanana-@reddit

If it is 100% on those tests and worse on the last one, then it possibly cheated; it was possibly trained on test data.

cgs019283@reddit

I like qwen, but this is not local.

DeltaSqueezer@reddit

I like qwen too, but this is not Llama.

Smile_Clown@reddit

I like Llama too, but this is not a cheetah.

GenLabsAI@reddit

I like cheetahs but this isn't a whale

thegreatpotatogod@reddit

I like whales too, but this isn't deepseek

Ultima_RatioRegum@reddit

I like qwen too, but this is not om/r/ .

Based on my admittedly naive reading of the sub's home page url, it deals with 5 fundamental ideas:

1) local, meaning things that are within some neighborhood (I assume topologically, but it could also be referencing real analysis specifically, so we define local based simply on a predefined Epsilon)
2) Llama, or that thing that's from Peru and makes soft sweaters, or the llm ecosystem
3) https://www.red, or the world wide web of communist hipsters (https is short for hipster)
4) dit.c, or whether something is c or not, including the language, the "sea" and the insult (c**t)
5) om/r/, or hungry then piratey

So unless you're a communist hipster pirate looking to discuss whether or not a copy of Llama near you is written in C or not (or is in the ocean or is a c**t), then fuck off.

InterstellarReddit@reddit

It’s local to the Data Center it’s hosted on 😂

GreenTreeAndBlueSky@reddit

Does anybody know the real price comparison for normal code usage? I'd assume a 100:1 input/output ratio on tokens or something

Significant-Pain5695@reddit

I think it's a bit expensive

GenLabsAI@reddit

no, most people use 3:1
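For a rough comparison, the input:output ratio just sets a weighted-average price per token. A small Python sketch; the per-million-token prices below are placeholder numbers, not any provider's actual rates:

```python
def blended_price_per_million(input_price: float, output_price: float, ratio: float) -> float:
    """Average $ per 1M tokens when `ratio` input tokens are sent per output token.

    Prices are $ per 1M tokens; input is weighted ratio/(ratio+1), output 1/(ratio+1).
    """
    return (ratio * input_price + output_price) / (ratio + 1)


# Placeholder prices: $1.20 / 1M input, $6.00 / 1M output (hypothetical).
for ratio in (3, 100):
    print(ratio, round(blended_price_per_million(1.20, 6.00, ratio), 4))
# 3 2.4
# 100 1.2475
```

At 100:1 the bill is dominated by the input price; at 3:1 the output price matters much more, so which ratio you assume changes which model looks cheaper.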

kellencs@reddit

why 235b without python?

pneuny@reddit

Maybe because it also gets 100? They may have just wanted something lesser to compare it with.

DifficultyFit1895@reddit

Maybe they just ran out of room in the label? Otherwise 235b is the real beast here.

mintybadgerme@reddit

No tool calling makes it rather useless for me

PumpkinNarrow6339@reddit

100/100 benchmark. What's the next scale? Who decides this benchmark scale?

GenLabsAI@reddit

Wait for ARC-AGI-2 to release numbers

PumpkinNarrow6339@reddit

I am waiting for 👀

__lawless@reddit

Let’s see how they do on AIME 2026; non-blind benchmarks are not benchmarks

GenLabsAI@reddit

Or ARC

harikb@reddit

Why are you running it in "low-power" mode even at 72% ? ... I will see myself out ...

hoffeig@reddit

monster in the bench, lady in the terminal

Puzzled-Swimmer-4789@reddit

A maxed-out benchmark is not really a good comparison. For all we know, one could be at 120% when the other is at 300%.

lorddumpy@reddit

100%. it'd be nice to see average token count to completion or cost comparison once they reach 100.
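A minimal way to do that once several models sit at 100%: rank by accuracy first and mean completion tokens second. The `Run` class and every number below are made-up placeholders, not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass
class Run:
    model: str
    correct: int
    total: int
    completion_tokens: list[int]  # tokens spent on each problem

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def mean_tokens(self) -> float:
        return sum(self.completion_tokens) / len(self.completion_tokens)


def leaderboard(runs: list[Run]) -> list[str]:
    """Sort by accuracy (higher first), breaking ties by mean tokens (lower first)."""
    return [r.model for r in sorted(runs, key=lambda r: (-r.accuracy, r.mean_tokens))]


# Placeholder data for three hypothetical models on a 3-problem set.
runs = [
    Run("model-a", 3, 3, [9000, 12000, 15000]),
    Run("model-b", 3, 3, [4000, 5000, 6000]),
    Run("model-c", 2, 3, [3000, 3000, 3000]),
]
print(leaderboard(runs))  # ['model-b', 'model-a', 'model-c']
```

Multiplying mean tokens by a per-token price would turn the same tie-break into the cost comparison mentioned above.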

xrvz@reddit

That's not how that works...

Relevant-Yak-9657@reddit

USAMO and Putnam time.

Patrick_Atsushi@reddit

Looks like it’s time to have some new benchmarks.

Dutchbags@reddit

anything scoring a 100 is futile

RonJonBoviAkaRonJovi@reddit

You guys believe every chart they put out huh?

Lucky-Necessary-8382@reddit

It's a benchmaxxed monster. That's all.

AlgorithmicMuse@reddit

Only benchmark I give a rat's ass about is mine: how the model works for me. All the other benchmarks are useless for me.

FianHQ@reddit

You have to pay attention to who ran these tests, reporting bias, the benchmark design and the setup

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! [Come check it out!](https://discord.gg/PgFhZ8cnWW) You've also been given a special flair for your contribution. We appreciate your post! *I am a bot and this action was performed automatically.*

PreciselyWrong@reddit

Imagine not including the SOTA programming model in benchmark comparison graphs. Cowardly

zjuwyz@reddit

https://preview.redd.it/z6b6bf3tr2rf1.png?width=921&format=png&auto=webp&s=d9dd8bac990af1a89c18066834ab7acbebf915b7

AIME25 and AIME25 w/ python are totally different. For example, AIME25 Q15: count the ordered positive integer triplets (a, b, c) such that 1 <= a, b, c <= 3^6, where (a^3 + b^3 + c^3) % 3^7 == 0.

Without python? Painful number theory & case analysis. With python? 10 lines of code.
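Here is one way those "10 lines" could go in Python. It's a sketch, not the commenter's code: rather than brute-forcing all 729^3 (about 387 million) triples, it counts how often each cube residue mod 3^7 occurs and combines residue pairs, which keeps the loop to at most a few hundred thousand iterations.

```python
from collections import Counter


def count_triples(limit: int, modulus: int) -> int:
    """Count ordered (a, b, c), 1 <= a, b, c <= limit, with a^3 + b^3 + c^3 ≡ 0 (mod modulus)."""
    cubes = Counter(pow(x, 3, modulus) for x in range(1, limit + 1))
    total = 0
    for r1, n1 in cubes.items():
        for r2, n2 in cubes.items():
            # c^3 must supply exactly the residue that makes the sum vanish.
            total += n1 * n2 * cubes.get((-r1 - r2) % modulus, 0)
    return total


N = count_triples(3**6, 3**7)
print(N, N % 1000)  # the AIME version asks for N mod 1000
```

Because the answer only depends on how many a give each cube residue, the work scales with the number of distinct residues squared instead of limit cubed.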

Ladder-Bhe@reddit

Fake news. Never saw an official report like this; show your original sources.

Chance_Value_Not@reddit

Yawn. Is it good in actual use? I was disappointed by qwen-code (the tool and the qwen-code model), but haven't used Max yet.
