Wow anthropic and Google losing coding share bc of qwen 3 coder

[-]

Melodic_Reality_646@reddit

hmmm someone pointed out that people are more likely to consume closed model using official apis. And it makes sense that enthusiasts will go for open router to try qwen exclusively. So we’re really only seeing part of the picture here. Growth on official apis probably more than compensates this here, folds…

Reply

[-]

entsnack@reddit

Also ironic that /r/LocalLLaMa is essentially /r/RemoteLLaMa when it comes to useful models. https://preview.redd.it/9y26p47n8ljf1.jpeg?width=1105&format=pjpg&auto=webp&s=03f836e5e95826a6d589e552f1db73ffa96460d1

Reply

[-]

ortegaalfredo@reddit

I run GLM-4.5 Locally, on GPUs at Q4, fast. Yes, it gets hot in here.

Reply

[-]

entsnack@reddit

Lesgoo! Is there much of an overlap between /r/homelab and here? Seems like they're still working on downloading the internet.

Reply

[-]

DealingWithIt202s@reddit

Sounds like sweet sweet training data to me

Reply

[-]

GuildCalamitousNtent@reddit

I’m curious what’s the stack to do this.

Reply

[-]

No_Afternoon_4260@reddit

Vllm

Reply

[-]

GuildCalamitousNtent@reddit

🤦🏻‍♂️ he said that, I meant his full setup (hardware included).

Reply

[-]

No_Afternoon_4260@reddit

Sry was thinking software stack

Reply

[-]

ortegaalfredo@reddit

A stack of 12x3090

Reply

[-]

Commercial-Celery769@reddit

2x 3090's in a room makes it very toasty

Reply

[-]

Western_Objective209@reddit

If a professional camera cost $50k to own but you could rent a camera for less then a penny per photo I imagine not a lot of photographers would own cameras

Reply

[-]

entsnack@reddit

I'm talking about /r/photography not photographers. You can also apply this to /r/audiophile, another expensive hobby community. The ones who cant stomach it go to /r/budgetaudiophile instead of posting their budget builds on /r/audiophile.

Reply

[-]

ttkciar@reddit

You're kind of being an ass, and as far as I can tell it's entirely gratuitous.

Reply

[-]

entsnack@reddit

I think people should rent GPUs on Runpod like the folks at /r/stablediffusion do, not use sketchy Openrouter APIs and complain about being underserved. But somehow Openrouter has become the go-to here.

Reply

[-]

Western_Objective209@reddit

Yeah I'm just talking about the economics of renting vs buying. I jumped through the stupid signup hoops for the first llama release to run it locally, kept up with llama.cpp for a while, and it's just hard to justify when my 3k computer with 32GB of VRAM can hardly run anything yet I can get a million tokens for $1. Working on LLMs is not particularly expensive, but the price goes up a couple orders of magnitude if you want to own the equipment, and it's not immediately obvious that there's any benefit to doing it. Even if you just rent full VMs with nvidia data center cards, it's so cheap compared to buying

Reply

[-]

Lissanro@reddit

I consider R1 0528 and Kimi K2 useful model, and I run them locally daily (IQ4 quants with ik\_llama.cpp).

Reply

[-]

Any_Pressure4251@reddit

This!? You would be stupid to use Open Router for anything other than tests, but there are much cheaper options for Enterprise and Enthusiasts.

Reply

[-]

Specter_Origin@reddit

How do you use official api's considering they have very low usage limits, while open-router has unlimited...

Reply

[-]

Ansible32@reddit

The official APIs you can pay for dollars per million tokens. If openrouter is unlimited they're probably using the models that are not as good and cost pennies per million tokens.

Reply

[-]

Specter_Origin@reddit

Lol, that is not how that works, the official API's even after you pay per dollars have cap on how many request per day and they have tier limits (please read official api documentations, what I say it true for gemini, chatGpt and Claude) Also "using the models that are not as good and cost pennies per million tokens" is not true as you can chose anthropic or OpenAI as provider for their own models and you are being served by OpenAI and Anthropic...

Reply

[-]

Ansible32@reddit

Google Vertex quotes like 2 requests per second on the low end, some things are higher. That's... quite a lot and I really don't know what you're doing that 2 RPS is a problem. https://cloud.google.com/vertex-ai/generative-ai/docs/dynamic-shared-quota

Reply

[-]

Former-Ad-5757@reddit

2 reqs a sec is not a lot, it is practically nothing. 2 reqs a sec seems only a lot if you are doing it manually, use API and it is nothing. Practically it is not a real problem either if you have to set up you workflow first, just try the workflow and your dsq goes up and up and up. It is only a real problem if you want to switch providers and just change a single prompt.

Reply

[-]

Ansible32@reddit

yeah, sure, calling the API in a loop is trivial. That doesn't mean you're doing something that warrants that much usage, and again, it costs $$. If you are actually happy spending that much money they will accommodate you, but at 2RPS you could spend $200 in a minute, the idea that they should support the kind of traffic you want all-you-can-eat is absurd.

Reply

[-]

Former-Ad-5757@reddit

you could spend $200 in a minute? How? Just sending a 1M context won't get you best or even good results. I mainly see people have millions of q's which can be expressed in 2k or 4k. And with API you are not talking about all-you-can-eat at least for the api's I know.

Reply

[-]

Ansible32@reddit

Gemini 2.5 Pro is $10/200K output tokens, which includes thinking. A 10K token query can easily eat 20K output tokens, so that's like 2.4M output tokens if you're doing 2RPS. Which is $120/minute. But higher is certainly possible. And you're not talking about asking questions, you're talking about a collection of automated models that are sending a bunch of data scattershot

Reply

[-]

Former-Ad-5757@reddit

I don't know who you are paying, but for the rest of the world it is $ 10 or $ 15 / 1 M tokens. So basically 5 times less, so basically not $120/min but more like $24/minute. $24 is a far distance away from your claimed $200. But as you say : all your numbers are just numbers you throw out there, they have no base in any reality.

Reply

[-]

Specter_Origin@reddit

If you have ever done tool use via any of the coding tools, like cline, roo code, cascade etc they will consume this limits like a chump change.

Reply

[-]

Ansible32@reddit

If it's hitting the limits on Gemini 2.5 Pro I would be more worried about the bill.

Reply

[-]

agentzappo@reddit

I don’t understand your logic here. Why is it stupid to use OR if you’re using paid endpoints that don’t retain your data? Speaking from a convenience standpoint, I’ve found it’s much easier to issue OR tokens to my teams so I can monitor cost per person/project and allow them access to all of the commercially-available models

Reply

[-]

Ansible32@reddit

You're maximizing the likelihood that someone is retaining your data and not telling you. And most (all?) of the closed models straight-up say they review every thing you write for malicious content and will store and review everything at their discretion, so generally speaking you should assume anything you send over these things is not private.

Reply

[-]

No_Efficiency_1144@reddit

Official Azure, AWS and GCP endpoints are widely considered secure but nowhere else.

Reply

[-]

Ansible32@reddit

What is considered secure has only a passing relationship to what is actually secure. The question with security though is, secure against whom? With the AI models this is evolving so fast it's very hard to be sure that's what's true today will be true tomorrow.

Reply

[-]

ciaguyforeal@reddit

theyre secure in the sense that they are already-bitten bullets. theyve already entangled themselves with microsoft, so whats the difference, would be the thinking. not that its 'more secure' but that its inside your existing security relationships.

Reply

[-]

Ansible32@reddit

Sure, yes, using a single cloud in a business context makes sense. OP was talking about OpenRouter and using everyone and everyone who says "Just trust me bro."

Reply

[-]

ciaguyforeal@reddit

definitely agree you cant just default trust open router. they could be doing anything.

Reply

[-]

CommunityTough1@reddit

This. People misunderstand the providers on OpenRouter labeled as "**As far as we know**, this provider doesn't log data **for training purposes**". First of all, OpenRouter has a built in disclaimer there indicating that it's not a sure thing. Secondly, it also clearly says "for training purposes", which is NOT equivalent to "no logging at all". One such provider with this label, and I'm not picking on them, is Deep Infra. The endpoint is labeled on OR with the "...no logging..." tag, but go to their privacy policy and it clearly says the data may be retained for law enforcement or other legal purposes. Just not "for training" which is all that's required to get that yeah on OR.

Reply

[-]

Any_Pressure4251@reddit

Oh really so you can get a better private enterprise endpoint from Open router than the providers themselves?

Reply

[-]

purplepsych@reddit

But why did anthropic share went down then?

Reply

[-]

illkeepthatinmind@reddit

Yes, but that's separate from the changes within the models used by users of Open Router.

Reply

[-]

o5mfiHTNsH748KVq@reddit

What are yall using to code with open router? Do you use a reverse proxy and cursor or a different tool?

Reply

[-]

llmentry@reddit

I'm old-school, and I upload a JSON of the code repository, using CherryStudio as the interface. I like screening changes, and I don't like giving LLM-driven software access to my actual files. Colour me conservative :) But there plenty of agentic solutions that work with API keys, if that's your thing.

Reply

[-]

unrulywind@reddit

I have been using GitHub branches as checkpoints. Save to branch > play with llm > check > correct > send stable to branch > repeat.

Reply

[-]

llmentry@reddit

I of course use git for development, but I still worry that you're always just one `git branch -D main` away from disaster. I'm probably paranoid, as it clearly doesn't happen in the wild (people would be screaming if it did). But, also -- I *like* understanding and vetting every code change, otherwise it just doesn't feel like my code any more. Plus I can spot any stupid errors/bugs/assumptions the LLM has made before they happen this way. Nobody understands my codebase the way I do, not even an LLM. And it still massively increases my productivity. But, hey, I'm old-school, like I said :/

Reply

[-]

x86rip@reddit

i use RooCode

Reply

[-]

scragz@reddit

I was using cline

Reply

[-]

Ok_Librarian_7841@reddit

Correct but we're talking about the change herez not the absolute usage.

Reply

[-]

one-wandering-mind@reddit

Yeah. This doesn't seem like it tells much. I use openrouter to play with models. My API usage is mostly Gemini these days. For Google and OpenAI , I use through their APIs directly. But then for actual use of tokens, it is either Claude 4 sonnet via Claude code or GitHub copilot that top my usage or o3 via the chatgpt app. My openrouter usage typically has newer models and open weights models. Qwen, deepseek, gpt-oss, Gemma. Maybe 1 percent of my total usage of models is via openrouter. I'm sure there are those that use openrouter as their primary source, but I doubt that is the bulk.

Reply

[-]

claythearc@reddit

I think it’s also true for the inverse where people are way less likely to use an official Chinese api so inflates open router

Reply

[-]

nullmove@reddit

For my personal use it's the opposite. OpenRouter provides a layer of (pseudo)anonimity, which I am less likely to forego when it comes to big corps.

Reply

[-]

MoMoneyMoStudy@reddit

Would like to see comparison of volume of usage (tokens, etc) for the LLMs for all coding use, including CLIs, Code editing GUIs, etc. Cursor alone was at an annual Sonnet API spending rate at $1Bil annually based on usage, much of that from customers using their free limit budget allowed by Cursor's subscription plans.

Reply

[-]

Down_The_Rabbithole@reddit

This is true for me. I use claude at work through official API while I experiment with OpenRouter at home to test new models for a while.

Reply

[-]

usernameplshere@reddit

Love to see it

Reply

[-]

maikuthe1@reddit

I contributed to that lol. I've pretty much been using qwen exclusively lately. I tried it like a week or 2 ago just to see how it is and it started getting stuff done right away so I just stuck with it.

Reply

[-]

Far_Buyer_7281@reddit

what language? is it any good in c++?

Reply

[-]

maikuthe1@reddit

Mostly python but I run a 2d MMO that's written in c++ and I added fishing to it the other day. I wrote the basic fishing system myself and then had qwen fill in the other features of it and flesh it out and it one shotted everything and kept everything consistent with my style. Obviously not conclusive but it did very well.

Reply

[-]

ParthProLegend@reddit

How do you do it? Like making a whole ahh game?

Reply

[-]

maikuthe1@reddit

Umm I'm not sure what you're asking exactly. If you're asking how to make a whole game with AI: I made this game and have been working on it for years, long before ChatGPT came out, I didn't use AI to make it. I'm just now using AI to add features. If you're asking how to make a whole game in general: you just start working on it and don't stop working on it... Gotta chug through the burnout and feature creep.

Reply

[-]

ParthProLegend@reddit

Without AI. What did you learn, language framework and other skills in the process.

Reply

[-]

MoMoneyMoStudy@reddit

But but Replit, bro ! Bolt, bro !!!

Reply

[-]

llmentry@reddit

Well, **GPT-5 is still BYOK on Open Router,** so it's not really a fair comparison for that model. It's also not surprising that the over-priced Anthropic model would massively lose share, now that there are cheaper models that work so well. Would be interesting to see the *total* market share, though, not the relative change.

Reply

[-]

Original_Alps23@reddit

You see both in the chart. Limited to OR of course.

Reply

[-]

runner2012@reddit

People using anthropic use Claude Code anyway, not openrouter.

Reply

[-]

RentedTuxedo@reddit

I really don’t understand the point of the byok. The whole point of open router is that I pay for access to all the models I want. Byok defeats the purpose completely. Why does it even exist?

Reply

[-]

llmentry@reddit

It's OpenAI's decision, not Open Router's. OAI has effectively said they're struggling to serve the requests they're getting as it is, so I'm not entirely surprised they're applying this. They've done it before. Also, I'd guess they like knowing the identity of their users, and the provider lock-in it generates.

Reply

[-]

RentedTuxedo@reddit

I’m aware it’s OpenAIs decision. Im saying it goes against the spirit of openrouter as a service in my opinion. I’m worried that it’s a trend that will continue and then we’ll be back to needing multiple different accounts and keys for each model provider because they would rather have total vendor lock in.

Reply

[-]

llmentry@reddit

Hopefully not. I think o3 was byok before this, though, so they may just feel their flagship model is "special". It just hasn't been as much of an issue before, since 4o / 4.1 weren't regulated this way. I don't like it either :( OTOH, I've not been using OAI for inference since the requirement to permanently retain all prompts was placed on them. I'm very happy with my current mix of models on OR (Gemini 2.5 Pro, Gemini 2.5 Flash, GPT 4.1 and GLM 4.5), plus GPT-OSS-120B, Qwen3 30B A3B and Gemma3 locally.

Reply

[-]

ParthProLegend@reddit

Byok?

Reply

[-]

RentedTuxedo@reddit

Bring your own key

Reply

[-]

MoMoneyMoStudy@reddit

Pairs nicely w byob

Reply

[-]

Specter_Origin@reddit

I agree and hope this trend does not pick up cause basically now you are bound by usage limits etc

Reply

[-]

55501xx@reddit

The single payment is a convenience for sure, but I more like the ability to try a bunch of models by just changing a string. Once you load up enough money on the underlying provider, it becomes a non issue. Plus you might have some special arrangement with the underlying provider (credits, contracts) that OpenRouter wouldn’t be able to support.

Reply

[-]

MoMoneyMoStudy@reddit

Cursor CEO bro now pushing BFF Sam's LLM over Sonnet for his customers. Follow the money - not always purely a tech choice, especially when a startup needs to start moving to profitability and OpenAI's investment side gig owns a lot of shares and influence. Cursor: $50OMil in ARR, $1Bil spend rate on Claude API.

Reply

[-]

lanfan675@reddit

Anthropic have GOT to get their prices down. I'm willing to use Claude at work, when someone else is paying, but if it's coming out of my pocket, I'll make do with slightly worse results from any of the cheaper models. Even Gemini Pro makes a significant difference.

Reply

[-]

piizeus@reddit

No, Codex CLI, Gemini-Cli, Claude Code all give direct access via their own APIs or subscriptions. I mean openrouter is not really "industry standard" for this.

Reply

[-]

LiquidGunay@reddit

This can also be explained by Cursor / Claude Code / Windsurf gaining market share.

Reply

[-]

balianone@reddit

That's because it's available for free over there.

Reply

[-]

ParthProLegend@reddit

What is?

Reply

[-]

GreenHell@reddit

Qwen3, DeepSeek, and a whole slew of other models

Reply

[-]

ParthProLegend@reddit

Ohhkk thanks

Reply

[-]

laserborg@reddit

how is you guys' experience with python and typescript in qwen3, GPT-5, o3, Gemini-2.5 Pro etc compared to Sonnet 4? I've heard different opinions but for me Sonnet 4 is unbeaten, never tried Claude Code and Opus 4.1 thou.

Reply

[-]

MoMoneyMoStudy@reddit

Know anyone that Vibe Coded a React Native mobile app? Advice for best stack and best approaches?

Reply

[-]

RageshAntony@reddit

I vibe code an entire Flutter app. Qwen 3 coder is good at Flutter. The best is Claude.

Reply

[-]

oxygen_addiction@reddit

Claude all the way.

Reply

[-]

brahh85@reddit

[https://github.com/QwenLM/qwen-code](https://github.com/QwenLM/qwen-code) # 🌏 Regional Free Tiers * **Mainland China**: ModelScope offers **2,000 free API calls per day** * **International**: OpenRouter provides **up to 1,000 free API calls per day** worldwide this means that qwen coder is free so people use anthropic and google models as architects, and then qwen coder for the coding the result is qwen giving people free inference in exchange of anthropic and google outputs , to make next qwen better planner and more compatible to anthropic and google outputs and the other result is anthropic and google losing income and power.

Reply

[-]

Electronic-Air5728@reddit

I tried it a week ago, and it couldn't complete a single task in my small Vue.js project. Maybe it needs to be prompted in a completely different way compared to calude code.

Reply

[-]

OmarBessa@reddit

Anthropic's worst nightmare

Reply

[-]

No_Efficiency_1144@reddit

Why isn’t Opus there? Do people prefer Sonnet?

Reply

[-]

Down_The_Rabbithole@reddit

Sonnet is actually better for coding. It's about equivalent in output but significantly faster so you can iterate quicker on whatever your workload is.

Reply

[-]

mrjackspade@reddit

I guess that only matters if you need to iterate. I use opus, but then I usually only need one version of the code I'm requesting.

Reply

[-]

AaronFeng47@reddit

Sonnet is cheaper

Reply

[-]

No_Efficiency_1144@reddit

Yeah but normally for code people went for the biggest model around in the past. I wonder if we have finally reached the point where we can use a smaller model. It feels unlikely as the models are still not performing that great.

Reply

[-]

scragz@reddit

opus is so much more expensive it's rarely worth it.

Reply

[-]

No_Efficiency_1144@reddit

Okay I see so in this case it is a situation of the price increase being so much more than the quality increase that users are looking to maximise benefit per dollar.

Reply

[-]

scragz@reddit

from what I can tell it sounds like opus is about 2x as good but 5x as expensive. it should really only be used when claude is absolutely stuck on something and you've already tried gemini and chatgpt.

Reply

[-]

MoMoneyMoStudy@reddit

Everything is a trade off between cost savings vs. time. If the paid tool and/or LLM API usage is under $100 a month but saves u at least a couple hours when factoring in accuracy, then it's a no brainer. Getting to the quantitative comparison w your choices out there is what can be hard when emotions are involved. But beware the 1 button does all Vibe coders like Replit and Bolt. YC bro Paul Graham really pushing his Replit investment on the AI buzz crowd.

Reply

[-]

cyber_harsh@reddit

Is the qwen3 coder good , I didn't find it better than the claude code.

Reply

[-]

LocoLanguageModel@reddit

It's great but some people don't find it better than Claude code.

Reply

[-]

Trick_Ad_4388@reddit

isn't it super obvious that it is due to claude code? nobody in they're right mind, if they are informed, will use claude models via API when you get thousands of dollars of value of API cost for the 20 dollar plan. or 5k-10k of. API value for the 200 max plan. ofc probably no one is productive with all of that "value" but it is still much much cheaper than the API for whatever they're task is. this graph only reflects this or am I missing something?

Reply

[-]

svantana@reddit

Sonnet 4 is the [number one model on OpenRouter](https://openrouter.ai/rankings?view=month), so a lot of people clearly think it's worth it

Reply

[-]

Trick_Ad_4388@reddit

I don't see that as clear. not everyone uses LLMs for coding. and not everyone uses claude code or knows of the value you get from it

Reply

[-]

bobith5@reddit

Even beyond that, this is specifically market share just on Openrouter. It's an interesting but incomplete dataset.

Reply

[-]

AppealSame4367@reddit

Good. Since Qwen Coder and GPT-5 came out Claude Opus got reliable again.

Reply

[-]

vinigrae@reddit

Qwen models are highly impressive

Reply

[-]

randomqhacker@reddit

All of those (aside from GPT-5) are offering free usage on OpenRouter right now. I'm sure that helps!

Reply

[-]

ortegaalfredo@reddit

Tried using Qwen3-235B for roo-code but it don't work, gets confused, can't use the tools, etc. GLM-4.5-Air work perfectly but when I finally managed to get full GLM-4.5 to work it is amazing, I don't think I need any cloud AI now. I would like to run Qwen3-Coder but it's just too big.

Reply

[-]

Secure_Reflection409@reddit

My top 3 models are all Qwen.

Reply

[-]

silenceimpaired@reddit

Which ones are they?

Reply

[-]

Secure_Reflection409@reddit

30b 2507 Thinking, 32b and 235b 2507 Thinking.

Reply

[-]

silenceimpaired@reddit

What’s your quant for 235b? I ended up deleting it because I didn’t think 150gb was worth what it gave (speed/performance) compared to GLM 4.5 Air and GPT OSS 120b.

Reply

[-]

Secure_Reflection409@reddit

Bartowski's IQ4.

Reply

[-]

silenceimpaired@reddit

Agreed. If GPT OSS 120b cost me money, I wouldn’t be using it.

Reply

[-]

lastrosade@reddit

I have just noticed that I've been using the wrong qwen 3 for weeks using the regular one instead of the coder one.

Reply

[-]

MoMoneyMoStudy@reddit

Your OSS GitHub PR code reviewer agent is "shocked". The AI Agent arguments over code superiority will now melt the GPUs, worse than a Discord human mocking by Linus or Hotz.

Reply

[-]

adel_b@reddit

you are finding out that smalle fine tuned model is better than generate purpose and bigger models

Reply

[-]

silenceimpaired@reddit

I was so excited to be able to run this locally until I realized what people are probably using (Qwen3-Coder-480B-A35B-Instruct).

Reply

[-]

Different_Fix_2217@reddit

Yea I found qwen code quite good, near sonnet 4 level but for much cheaper.

Reply

[-]

dhamaniasad@reddit

I’ve tried to like open source coding models. I didn’t like R1 and I didn’t like any other open models that people were raving about. Qwen 3 coder is genuinely a good coding model, not just a good _open_ coding model

Reply

[-]

das_war_ein_Befehl@reddit

I’m not getting your point because it’s open weights

Reply

[-]

noneabove1182@reddit

I think the implication is that qwen 3 coder isn't just a good compared to open, it's a good model even when compared to closed ones

Reply

[-]

dhamaniasad@reddit

That’s right

Reply

[-]

No_Efficiency_1144@reddit

Qwen is the first one he liked

Reply

[-]

Specter_Origin@reddit

"R1" was long time ago, and I would try something like Qwen Coder or deepseek v3 for coding as R1 would omit to many use less token for thinking which is not ideal for coding... if you are on cline or something you would use thinking model for planning and non-reasoning model for actual execution or 'act' mode.

Reply

[-]

Infamous_Jaguar_2151@reddit

Good. Claude terms and services are unacceptable for me. Forbids using it for machine learning in 2025!

Reply

[-]

beedunc@reddit

Qwen 2.5 variants were already high on my capabilities tests, and qw3 is even better.

Reply

[-]

MrDevGuyMcCoder@reddit

That is some creative bullshit statical backflips to get a chart to look like its saying what you want it to....

Reply

[-]

strangescript@reddit

I love that there are still people convinced 3.7 is a better model.

Reply

[-]

this-just_in@reddit

This just shows how subscriptions are impacting OpenRouter. As people using Opus/Sonnet realize they would be better off paying for a flat rate sub than per token through OpenRouter, they move into subs. This is the cheapest way to use those models. Models with cheaper per token costs or without an equivalent sub continue to be price-effective to use through OpenRouter. Separately, now that OpenRouter requires you to insert your OpenAI API key to use the latest OpenAI models, they will not have accurate metrics for them.

Reply

Reply to Post

128 Comments