Open source models are going to be the future on Cursor, OpenCode etc.
Posted by _maverick98@reddit | LocalLLaMA | View on Reddit | 146 comments
I just wanted to share my experience. At work we have Cursor on the Enterprise tier. Today I burned $10 with 2 prompts, one on gpt-5.5 and one on claude-opus-4.6-thinking. Last month I burned $80 in one week with claude-opus-4.7, even with the 50% launch discount. If they continue with this outrageous pricing (which is necessary, since they can't subsidize anymore), the only solution will be to use comparable open-source models that cost 5x-10x less. And I don't think this is very far off; I'm talking by the end of this year.
jacek2023@reddit
Prices will go up at least 10x. People on this sub are delusional, they think they are being "smart" by using cloud models. There will be more and more crying about prices and limits.
misanthrophiccunt@reddit
agreed
Comfortable-Rock-498@reddit
We are currently in the $7 Uber rides phase, which is also why it helps consumers if no single provider dominates.
misanthrophiccunt@reddit
nor a cartel of 5 dominant ones either. The real disruptor will be fully open source models, not just the weights.
Far_Cat9782@reddit
I hope Alibaba keeps it up with Qwen. I'm almost completely off cloud already.
Karyo_Ten@reddit
Given that they shelved their Qwen Code plan and fired the open-source team...
Ladnil@reddit
This logic applies just as much to the practice of releasing free open weights models as it does to the pricing of the frontier models on cloud hosting.
There's no guarantee the free models get any better in the future.
Perfect-Campaign9551@reddit
My company was already raising concerns: "we have used almost all of our tokens for the year in only the first quarter". I'm like... um, that's what happens when you encourage everyone to use AI, bro.
bnightstars@reddit
tell me you work in Uber without telling me :D
yopla@reddit
Nah, they're just going to go bust if they do. China is a few months behind with nearly equivalent models that comparatively run on shoestring budgets.
If they are smart, they are spending half of their R&D budget on lowering inference cost, because that's the only thing that will keep them in the race. You know Google is, as demonstrated by Gemma and the polarquant KV paper.
bnightstars@reddit
What do you think all the Chinese GPU crypto-mining farms are doing now that GPU mining doesn't exist?
Zombiecidialfreak@reddit
Fuck paying anything for something I can run myself
DavidOrzc@reddit
Not exactly 10x. Maybe 3x, which is still a lot of money. Just planning a task with Sonnet 4.6 today cost me 3 dollars. Using Opus at 3 times the price would have cost 15 dollars or more.
sn2006gy@reddit
It is smart to take advantage of pricing today.
Hardware costs too much right now. I die inside seeing people spend nearly $10k for 2 DGX Sparks when the best they can do is match the output of a $40 plan from MiniMax direct, and still be slower overall.
I'm super excited about local LLMs, but that doesn't have to come at the cost of passing up the developer APIs we have today, even if the risk is future price increases - which we face no matter what, if we want to be real about any of this.
sonicnerd14@reddit
It's probably a similar reason why the big boys are eating up all the hardware and driving costs up: FOMO and kneejerk reactions. It's weird to me, because even now smaller LLMs are getting more efficient and more capable, enough to run relatively comfortably on a single GPU with the right quants and param optimizations. Of course more hardware is good, but you don't need to spend nearly as much as people think to run models with agents. Most likely in a year's time we'll have a 30B MoE that can run on a 24GB or even 16GB card with the performance of Opus 4. People should be setting their sights on optimizations and efficiencies, not so much on throwing money at hardware grunt, unless you're trying to train or something along those lines.
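The "right quants" arithmetic is easy to sanity-check yourself. A rough sketch (the flat overhead allowance for KV cache and activations is a made-up illustrative number, not a measurement):

```python
def vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.5) -> float:
    """Rough VRAM needed to hold the weights at a given quantization width,
    plus a flat (illustrative) allowance for KV cache and activations."""
    weights_gb = params_billion * bits / 8  # 1B params at 8-bit is ~1 GB
    return weights_gb + overhead_gb

# A 30B model at 4-bit fits a 24 GB card with room to spare
print(round(vram_gb(30, 4), 1))
```

By the same math, the unquantized 16-bit version of that model needs ~60 GB plus overhead, which is where the multi-GPU rigs come from.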
Infamous_Mud482@reddit
Most people still don't have, and can't afford, the GPU(s) necessary to support the local models minimally sufficient for a tolerable coding agent experience with reasonably large context. People want coding agents. If you don't already have such a GPU, these are the components that are scarce. No idea what your point is, you say it's FOMO but your entire post relies on owning the thing that people are experiencing FOMO over.
sonicnerd14@reddit
No, you miss my point. People are buying 4x GPU rigs to run massive models like DeepSeek v4, GLM 5.1, etc. I'm saying there are already models right now that prove you don't need these large models to have a competent agent. The model doesn't exist in a vacuum: your harness needs to be just as smartly designed as the model behind it. If one lacks something, the whole thing suffers.
This is why Qwen 3.6 27B and Gemma 4 31B, and even the smaller MoEs, are outperforming models many times their size in agentic workflows. These can run on even a 16GB laptop. Larger != smarter, and there are tons of research papers you can read to confirm this for yourself if you don't have the hardware to test it.
IrisColt@reddit
And I can prototype faster than ever, and at a lower cost.
ddchbr@reddit
Depends on the API. GHCP is skyrocketing (my current daily driver), and I don't like the per-token cost of any frontier model I've seen.
So it seems we (I) are being driven to open-weight model APIs and/or local hardware.
05032-MendicantBias@reddit
I like to use free credits to get venture capital burned earlier ;)
Mickenfox@reddit
I copy-paste all my questions to 3 or 4 free AI assistants just to make sure.
ea_man@reddit
Don't forget first free month then cancel, that really serves them well.
civilian_discourse@reddit
I don't think prices will go up. There are two types of cloud models, there's OpenAI/Anthropic and there's the frontier open weight Chinese models. The latter are not very far behind the former in terms of intelligence while being significantly cheaper without any VC money subsidizing. The VC reality distortion field is not going to work if Kimi K3 or DeepSeek v5 match today's Claude Opus, but that's exactly where the trendline is moving. I think that's why there's a race to go public. VCs need an exit before open Chinese models blow their little bubble. They have maybe until the end of the year.
Perfect-Campaign9551@reddit
Yep VCs want shareholders holding the bag
maycomesinlikealion@reddit
China is fucking screwed, bro; they're desperate enough to revoke Meta's acquisition of Manus
civilian_discourse@reddit
I don't see desperation, I see geopolitical antagonism. Make no mistake, China is allowing open weight models to be released because it will pop the US bubble and make them look good at the same time. However, as soon as Chinese models look like they've advanced ahead of American ones, they will suddenly become closed weights too. Which, could mean Anthropic/OpenAI become tied up even more in national security... and the VC bubble gets paid out by tax payers? Who fucking knows.
ea_man@reddit
I think that smaller open models are a way for providers to lock in customers and attract new ones without even spending money on compute for free tiers.
RedParaglider@reddit
Open-weight models best help manufacturers. I think one reason Nvidia is building more open-source models is that they see this as well. Give people great models and they will buy your hardware to run them on.
Pleasant-Shallot-707@reddit
lol that’s not desperation
Django_McFly@reddit
It's hard to imagine the basic plan being $200 a month.
Bhumi1979@reddit
Agreed. It's delusional to think this way.
_maverick98@reddit (OP)
prices of top-tier models or open-source models? That's my thesis here: that top-tier (closed-source) model prices will continue to go up
jacek2023@reddit
Price of cloud access.
05032-MendicantBias@reddit
I connected VS Code to LM Studio via the Continue plugin, and use Qwen 3.6.
It's faster than cloud models. I get 60,000 tokens of context at 110 tokens per second on my 7900XTX.
Big boi models might be better, but only incrementally. They have the same failure modes: both will fail at building sensible architectures, both will succeed at building well-defined classes.
And the era of subsidized tokens is coming to an end. Venture capital is running out of money.
The best solution is going to be local LLM inference servers, guys, we won :3
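For anyone wanting to reproduce this setup, the wiring is mostly one config entry pointing Continue at LM Studio's OpenAI-compatible server. A sketch (field names from memory of Continue's config format, and LM Studio's default port; verify against their docs before relying on it):

```yaml
# ~/.continue/config.yaml -- illustrative sketch, not verified against the current Continue schema
models:
  - name: qwen-local
    provider: openai                   # any OpenAI-compatible backend works
    model: qwen3-coder                 # whatever identifier LM Studio exposes
    apiBase: http://localhost:1234/v1  # LM Studio's default local server endpoint
```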
Sketch0z@reddit
If you have well defined requirements at the class level... Why not just go the last 1% of the way and type directly into your file? Why bother writing to the LLM?
Old IDE auto-complete makes the typing pretty fast already.
05032-MendicantBias@reddit
That's selling LLM short.
Here's the kind of class that Qwen 3.6 can easily one-shot. In this case it was an adapter to a PIL image.
What could have been an hour's job reading the docs, then building and testing the class, was one-shot in ten seconds, with decent code documentation.
DeProgrammer99@reddit
Qwen3.6-35B-A3B, I guess! Using my 7900 XTX alone for the 35B, and using an RTX 4060 Ti as the secondary for the 27B, I estimated my costs based on power draw and cost of electricity:
(With batch size 8 and about 20k context shared between all of them, though, I can get around 4x token generation for 25% more power usage on the 27B, and I've seen the machine use 450W at times instead of 400W. And I have solar panels, and sometimes I'm cold, so that's why it's still only an estimate...)
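A power-draw estimate like that is a one-liner to script. A sketch (400 W and 110 tok/s are figures from this thread; the electricity price is an assumption):

```python
def electricity_cost_per_mtok(watts: float, tokens_per_sec: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens at a steady power draw."""
    hours = 1_000_000 / tokens_per_sec / 3600  # wall-clock time to emit 1M tokens
    kwh = watts / 1000 * hours
    return kwh * usd_per_kwh

# 400 W at 110 tok/s and $0.15/kWh comes out to roughly 15 cents per million tokens
print(round(electricity_cost_per_mtok(400, 110, 0.15), 3))
```

Even doubling the draw for batch generation keeps local inference in the cents-per-million-tokens range, which is the comparison people are making against API pricing.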
BubrivKo@reddit
Bro, I’ve found that Roo Code works way better for agentic tasks in VSCode than Continue/Cline.
That said, the real issue is Qwen 3.6 (and small models in general) are just terrible. Yeah, it’s awesome knowing you’ve got a model you can run infinitely and always on standby without paying a dime - but does it even matter when it’s stupid af? 😃
And no, don’t tell me I’m not using it right - I know exactly how to prompt it. Qwen just gives me wrong code most of the time. Even when it runs syntactically fine, the actual logic is wrong...
So comparing these small models to Opus is like night and day.
Thomasedv@reddit
They can be blazingly fast and give you a rough program. I managed to port a rough UI from another framework, though Qwen 3.6 35B could barely manage a layout at all in the UI framework I chose. And yeah, they still struggle with following instructions over time, getting things right, and doing anything they're not well trained on.
They do really well with very bounded, direct tasks though, and at 130 t/s you can try three times over before Claude manages a single try.
It's not a perfect fix, but demanding planning phases, review loops, post-code review, etc. goes a long way toward helping it narrow things down. I do all that even with Opus. Still dumb though, especially at the edges of what it knows.
misanthrophiccunt@reddit
I'm interested in this. Why did you choose Continue? It looked to me the last time I checked as if it were unmaintained and flimsy (hence I defaulted to Roo).
05032-MendicantBias@reddit
It wasn't really a choice, I tried it and it worked competently, so I didn't have an incentive to try something else.
There is a true deluge of things to try.
misanthrophiccunt@reddit
I do that very same thing a lot, because no AI-generated code is worth a thing unless you manually review it yourself. It's like the friction of using cash instead of card to force better habits.
WhopperitoJr@reddit
I think organizations using AI in their workflows are also realizing that, if a human has to check over everything anyway, it makes more sense to use a smaller model that is free and requires some extra fixes than a frontier model at cost which you still have to tweak anyway. Plus they want consistency: if Opus is great today and lobotomized tomorrow, firms in risk-averse areas may prefer the stability of using the same open-weight model over and over.
Apprehensive_Side219@reddit
That, and we're a couple more autoresearch-style Karpathy releases away from open-source RSI. If frontier models are hitting it now and open source is 6 months (ish) behind, it stands to reason that if everybody shuts the doors on open source, the community can prompt its own soon enough.
_maverick98@reddit (OP)
what kind of mac do you have? I have Macbook Pro M4 16GB so I am RAM poor
Randomblock1@reddit
By the end of the year? It's already happening! Deepseek v4 and Kimi K2.6 are good enough for all but the most demanding tasks... and even then, you just have the closed source model do the planning, so the open source one can do the implementation. Like even Composer 2, Cursor's fine tune of K2.5, is really good, and really cheap.
AbjectBug5885@reddit
$80 in one week is absolutely insane for a dev tool. The problem isn't even just cost - it's the unpredictability. You can't budget when a single prompt might be $5. Open models with proper context management will win purely on the predictability angle, even if they're slightly worse.
Turbulent_Onion1741@reddit
$80 in a week is nothing 🤣 you can blow through that in an hour easy using Opus.
SeaAstronomer4446@reddit
Curious, are you guys vibe coding?
Pleasant-Shallot-707@reddit
Yeah…that’s insane. There’s so much waste by shitty context users
Kuresov@reddit
A huge part of the problem is companies incentivizing token spend as “performance” and tracking internal leaderboards on usage.
I don’t think hitting top 5 is required to be safe at work, but when your manager says that it’s being tracked and to spend money on the tool, people are gonna do it, and some are gonna go way too far.
Pleasant-Shallot-707@reddit
Yeah. I'm sure that at stupid companies like the ones you describe, you'll soon see the worst-quality code getting generated, and throughput will throttle, because they're incentivizing waste.
lnvariant@reddit
Any reason you’re not using Codex 20x or Claude Code max 20x plans instead?
AttitudeImportant585@reddit
you think companies will hesitate between unlimited vs. capped engineer productivity to save a few grand?
Turbulent_Onion1741@reddit
Company decision 🤷🏻♀️. I would, yeah if it was my money.
Maleficent-Ad5999@reddit
Would you still pay thousands of dollars every month on cloud subscriptions? At that spending range it makes more sense to set up hardware and run Kimi or DeepSeek.
Turbulent_Onion1741@reddit
But... I don't think those plans are long for this world in their current state either. Pricing has to move to reflect the true underlying compute used.
sexy_silver_grandpa@reddit
Brother, I'm a professional open source software engineer, I can burn $80 easily before lunch.
LeucisticBear@reddit
Claude ate up $50 of extra usage I had without finishing a single prompt.
Freonr2@reddit
$80/week is nothing if it multiplies your productivity several times over.
VS Enterprise is $250/mo and it doesn't even write all of your code for you.
https://visualstudio.microsoft.com/vs/pricing/?tab=paid-subscriptions
wurst_katastrophe@reddit
Unless they stop open sourcing them in the future.
OliveTreeFounder@reddit
They will do it as soon as they have killed all the US AI companies. This is a real economic war. China now has the chips, the researchers, the best universities, the energy resources. So it is just a matter of time.
More-Curious816@reddit
Noooo, Xi, don’t do it, DON’T. You are the savior, not supposed to join the dark side.
But we know that open-weight models were the Chinese government's idea to curb the American AI providers and make them lose as much money as possible. The next step for the US government is to ban businesses from using Chinese AI under the pretext of national security. Later, they'll probably ban us, the average users, too: protect the children, deepfakes, security, biohazard, you name it.
I could talk more about this dark future, but you would call me a conspiracy theorist.
OliveTreeFounder@reddit
That is not a conspiracy theory. This is precisely what is happening in telecommunication infrastructure hardware.
More-Curious816@reddit
Tell me more
OliveTreeFounder@reddit
For example, Huawei has been banned from cell phone telecom infrastructure in the USA and EU.
Beyond that, I am more skeptical about a prohibition for ordinary users. On the other hand, that would be stupid, counterproductive, and authoritarian, which seems perfectly aligned with the current politics of the White House.
_maverick98@reddit (OP)
This would be truly sad. I hope China doesn't stop. But even if they closed-sourced them, I would still select them on Cursor if the pricing was 5x-10x lower with comparable (even slightly worse) performance to the top-tier models.
b3081a@reddit
In the long term, who will pay for the training cost if users don't? If you use these models for work, you should pay for what's truly useful to you, especially the things that helped you earn the money, instead of seeking cheap alternatives all the time.
I personally wish LLMs would eventually become a sustainable and healthy market, rather than an endless burning of VC/government funds ending in a dotcom-bubble-burst style disaster that leaves no future for development. That would require every LLM user to contribute part of their earnings back to the ecosystem instead of relying on freely available options.
fastheadcrab@reddit
I think Nvidia will do significantly more model training going forward to create open-weight models. They are already starting with Nemotron
They have significant vested interest in people continuing to buy their hardware, including consumers and corporations that use models rather than just train them.
b3081a@reddit
That's a fair way to keep the community moving forward. GPU/NPU vendors pay for the model training and earn the money back selling hardware running them.
soshulmedia@reddit
Agreed, but it seems crowd-funding would be very viable?
ea_man@reddit
Aye, I would pay to have a trained 25B coding model that the community can finetune and customize. Even better if it comes with harness that is vertical optimized for it, open source.
hugo-the-second@reddit
Valid point.
May be hard to implement rn, but valid point.
Who knows, maybe we, as humanity, will manage to transcend the capitalist paradigm in the not so distant future, and arrive at something that is more similar to how an organism allocates resources.
Even then, it would probably make sense to create some kind of feedback loop between the usefulness of some given unit for the whole, or larger parts of the whole, and the resources allocated to this unit.
misanthrophiccunt@reddit
Or do you think the prompts you enter into commercial models are not the actual VALUE they are gaining from you? Oh sweet summer child.
SquareKaleidoscope49@reddit
It makes no sense for China to stop. Luckily they know what they're doing.
If they stop releasing the models completely, every single part of the world will go to US for the models. Them releasing Open Source means that every single entity, including American companies, does not have to buy into the US monopoly.
thibautrey@reddit
Yeah, but what is open already will still be alive. And some of the models are already more than what the common user needs.
Fedor_Doc@reddit
I think OP talks about programmers, not "common users".
Also, I strongly believe there is no such thing as a "common user". Someone will use an LLM for translation, someone else for literature summary and analysis. Even big frontier models sometimes fail at these tasks in specific languages.
Gemma-4-31b cannot replace Gemini-3.1-Pro in that regard.
yaboyyoungairvent@reddit
Kimi is like the chinese version of Gemini. It could compete. GLM is similar to claude.
thibautrey@reddit
Kimi k2.6 can
wurst_katastrophe@reddit
Then download them all before they disappear
thibautrey@reddit
Heard of something called P2P? You know that once something is on the web, it is pretty much impossible to make it completely unavailable. I can guarantee you that the day Hugging Face is no longer able to provide the open-source models, someone will make a website that lets you find torrent magnets to download them.
srigi@reddit
At least one more full update cycle would be very appreciated.
misanthrophiccunt@reddit
agreed
misanthrophiccunt@reddit
for coding I would dare to say the models have peaked. Models specialised in your coding language and your chosen libraries' documentation would probably matter A LOT more.
I code in Elixir. I can download a few packages I use often, render their documentation, feed it to Unsloth, and make decent 9-to-14B coding models a shitload more useful with up-to-date docs than larger ones. Not like now, where most of them don't even know `unless` has been deprecated for ages.
So if they closed the weights, it's not as if we don't have enough variety already to carry on just refining what already exists.
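The "render docs, feed to Unsloth" step mostly boils down to turning doc pages into training rows. A minimal sketch of that data-prep half (the module names, prompts, and chat-JSONL shape are illustrative; the actual fine-tune then happens in Unsloth or a similar trainer):

```python
import json

def docs_to_jsonl(module_docs: dict, path: str) -> None:
    """Write {module: rendered doc text} pairs as chat-format JSONL rows,
    the shape most chat fine-tuning trainers accept."""
    with open(path, "w") as f:
        for module, doc in module_docs.items():
            row = {"messages": [
                {"role": "user", "content": f"What do the current docs say about {module}?"},
                {"role": "assistant", "content": doc},
            ]}
            f.write(json.dumps(row) + "\n")

# Hypothetical rendered docs for two Elixir modules
docs = {
    "Ecto.Query": "Provides the query DSL for composing database queries...",
    "Phoenix.Component": "HEEx templates interpolate assigns with {...} syntax...",
}
docs_to_jsonl(docs, "elixir_docs.jsonl")
```

RAG over the same rendered docs is the lighter-weight alternative when you don't want to commit to a fine-tune.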
Acrobatic-Desk3266@reddit
Could you say more on how you feed docs to unsloth?
misanthrophiccunt@reddit
There is documentation for that and everything else
Acrobatic-Desk3266@reddit
Thanks so much! I'm new to local models and thought it was a custom workflow
XTCaddict@reddit
There are tools for this, and it's arguably a bit of an anti-pattern to fine-tune on the latest docs given that docs shift constantly - Context7 etc.
misanthrophiccunt@reddit
I disagree. Rust core language docs and syntax don't change daily, and in the Elixir world core language manuals don't change that often either. Add Phoenix + Ecto on top, and suddenly your LLM stops producing `unless this do ... end`, or `<%= some_variable %>` instead of `{some_variable}` when using Phoenix. Not even this sub's beloved Qwen 3.6 gets that right.
Both_Opportunity5327@reddit
They have not peaked; the jump from Qwen 3.5 to 3.6 alone shows that.
You are correct though we have a lot of room to optimise what we have now.
Do you finetune these Unsloth models?
misanthrophiccunt@reddit
I installed it and figured out how it works, but never had the patience to actually fine-tune one. I just needed to know how to do it in case I need it in the future, because for my use case RAG is good enough so far.
Dimix-@reddit
I think they will keep making open source models. It is how they fight the competition they can't beat directly
adobo_cake@reddit
Probably best to get a drive and store all the open models we can, as a backup in case they start taking down the models that are already open today.
Fastest_light@reddit
The computing cost is there; how could an open-sourced model possibly avoid that?
Most likely those dirt-cheap open-source models are subsidized by foreign government(s), either for strategic AI dominance or for your data. I would be very careful.
Revolutionalredstone@reddit
Anthropic and OpenAI are desperately trying to go public before Qwen 2.7 etc. eats their lunch (it's happening fast). The main people who need a subscription now are just people with tiny GPUs.
Main_Secretary_8827@reddit
You mean qwen 3.6
Revolutionalredstone@reddit
Did I FU&&ING stutter?
before as in now son ;)
Iory1998@reddit
Guys, use the deepseek-v4 models. They are dirt cheap, especially the Pro. There is a 75% promotion on it until the end of the month.
Main_Secretary_8827@reddit
Qwen 3.6 27b and qwen3.6 35b kick ass
Microsort@reddit
the cost problem is going to force this transition whether people want it or not. $10 for two prompts is genuinely unsustainable, even for enterprise. The gap between local models and API models for most coding tasks has been closing fast, especially with the latest Qwen and Gemma releases. Once the tooling catches up (and it's getting there), there's no economic argument for API-first anymore for day-to-day work.
Chris279m@reddit
What’s a good extension for it to have agentic capabilities and navigate workspaces ? Continue isn’t doing it… for me at least
Kahvana@reddit
Open weight you mean, yeah agreed. Enjoy the API prices while they last, it will get more expensive as it becomes more clear how much loss the current plans generate.
Euphoric_Emotion5397@reddit
If you are a hobbyist, then yeah, just go with the local LLM.
But if you are a working professional, and that $200 monthly subscription can 10x your productivity (i.e. a 1-day job shrinks to 1 hour, leaving you 9 hours free, for example), I think it's worth it.
misanthrophiccunt@reddit
Those $200 will be $400 next month, $600 the month after, then $800.
Also, as a working professional, if your 1 day's worth of work can be shrunk to 1 hour, you're either not reviewing the code the LLM is adding enough, OR you don't know the syntax of your coding language of choice well enough to be efficient.
Not a single LLM can write elegant, sustainable, bug-free code without human review and intervention.
draconic_tongue@reddit
you'd have to be some kind of autist to think that. reviewing doesn't mean you waste your entire life reading it, it takes like 5 minutes
SadBBTumblrPizza@reddit
Idk if I'd go that far but really the move is to use TDD and adversarial review loops to remove slop. I also think that as models get better we're just also gonna see less and less slop
Freonr2@reddit
The amount of supervision required has dropped substantially, particularly since Opus 4.5 dropped. It was tangible.
Freonr2@reddit
At work, productivity is everything. I'd place Opus 4.7 at something like 4-10x productivity vs. hand-banging everything out. This is real-world use in complex systems, with projects often spanning multiple repos and the entire stack.
The best open models: maybe 1.5x at best. Error rate and the ability to truly understand large context (large code bases) are absolutely critical to getting real work done. Errors are a logarithmic drain on any productivity gains, and a high error rate quickly turns these models into tools you can only hand carefully crafted, isolated function-writing tasks, then iterate back more slowly.
I've been a software engineer for over 20 years. I write virtually zero code now.
Zanion@reddit
I'm a working professional and this opinion is a skill issue.
Snoo_57113@reddit
If I'm a working professional and a subscription makes me 10x more productive, I would expect to be paid 10 times more, or to work three days a month.
And it's not true anyway: LLMs hallucinate, create bugs, and even delete the prod database if you look at them wrong. Should I pay $200 for a maps app that gets me the wrong directions 10% of the time?
In my opinion with the current state of AI, the max subscription price should be $20, or even free. They are getting our data and trade secrets after all.
_maverick98@reddit (OP)
I'm both a hobbyist and a professional. I agree, but even as a professional I would rather have more tokens per month.
codehamr@reddit
The pricing trajectory makes self-hosting more attractive every quarter, but I think you're underweighting the gap on agentic tasks. Open weights are very competitive on single-shot quality, but in long tool loops the frontier models still pull ahead noticeably: fewer wrong tool calls, better recovery, less context drift.
For a lot of work that gap doesn't matter and a Qwen3 or GLM running locally is more than enough. For deep refactors across a large codebase it still does. The realistic future is probably hybrid, cheap local model for 80% of calls, frontier API for the hard ones.
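In practice that hybrid split is often just a cheap heuristic sitting in front of two clients. A toy sketch (the thresholds and model names are made up for illustration, not tuned on anything):

```python
def pick_model(files_touched: int, cross_repo: bool) -> str:
    """Route the bulk of calls to the cheap local model and escalate
    only deep, multi-file work to the frontier API.
    Thresholds here are illustrative, not tuned."""
    if cross_repo or files_touched > 5:
        return "frontier-api"
    return "local-qwen"

print(pick_model(1, False))   # quick single-file edit stays local
print(pick_model(12, False))  # large refactor escalates
```

Fancier versions score the task with the cheap model first and escalate on low confidence, but even a static rule like this captures the "80% of calls stay local" economics.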
letsgoiowa@reddit
I can make Sonnet cost $2 with a simple query. The same query with Grok 4.30 somehow cost me 5 cents instead. Web search does wild things on OWUI. DeepSeek v4 flash cost me 2 cents - literally 1% of Anthropic's mid-tier model.
Yes_but_I_think@reddit
I disagree with "which is necessary since they can't subsidize anymore". They are not subsidizing; they are enjoying an 80% profit margin - that's 5x their cost in bare token terms. You think they have some magic model that takes 10x more compute than open-weight models? Zilch. It's also just a transformer running inference - no moat. So yes, please, let's all switch to open models and pay only for server costs.
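For what it's worth, the margin-to-markup arithmetic in that claim is internally consistent: a gross margin of (price - cost)/price = 80% does imply charging 5x cost. A one-liner to check it (the 80% figure itself is the commenter's assumption, not a published number):

```python
def markup_from_margin(margin: float) -> float:
    """Price-to-cost ratio implied by a gross margin (price - cost) / price."""
    return 1 / (1 - margin)

# An 80% gross margin means the price is 5x the underlying cost
print(round(markup_from_margin(0.8), 2))
```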
_maverick98@reddit (OP)
how do you know they have 80% margins?
entsnack@reddit
Cursor's Komposer is a Kimi K2.6 fine-tune so technically you're already right.
Enough_Big4191@reddit
cost pressure is real, but the tradeoff shows up in consistency more than raw capability. Open models can be cheaper, but you end up spending time handling weird edge cases, especially when outputs need to line up across steps or tools. If it's just coding prompts, fine, but once it touches real workflows, the hidden cost is debugging "looks right but isn't."
misanthrophiccunt@reddit
Cursor needs to make it easier to plug into local models and work fully offline. There are too many obscure functions that connect to the internet. E.g. you can add per-project URLs with relevant documentation in the settings, but how does that work? Does it fetch cached and processed docs from Context7? I have NDAs signed; I cannot upload client code anywhere without being liable.
To be sure, it is the one that works best for me (I also use Roo Code, but it breaks too often, and Zed, but it lacks all the familiar VSCode stuff that Cursor already has), but... it seriously needs to be easier to plug into local models.
The main problem I see is that this would kill their business model.
_maverick98@reddit (OP)
what about opencode? It's fully open-source; it could be useful for your case
misanthrophiccunt@reddit
I see no advantage in a CLI agent when it comes in two pieces: CLI + IDE extension. I'm not going to use the CLI version, so why not go directly to an extension (e.g. Roo Code / Cline / Zoo Code, once it's finally released and hopefully free of whatever has made the latest releases of Roo Code... for me... unusable).
Nutsack_VS_Acetylene@reddit
It's about the quality and stability of the harness, the interface doesn't matter too much in comparison. They have a desktop app with a much better UI as well.
ixdx@reddit
I currently use OpenCode Web as an agent and the Zed editor to check changes and easily create git commits.
OpenCode runs on Ubuntu as a systemd service, and its HOME directory is set to the projects directory (this is done for ease of opening projects).
Zed is configured to integrate with the local LLM, and I generate git commits using it. The built-in agent doesn't work very well with local models, so I rarely use it.
Just sharing my experience.
misanthrophiccunt@reddit
interesting, thank you
Prudent-Ad4509@reddit
I simply do not use any kind of ide integration (even if it is available). I use cli and ide, the first one to code and the second to review changes and to pick what to commit.
misanthrophiccunt@reddit
it's a personal choice. You've got to go for whatever works for you the best today. If that works for you, keep doing it.
Prudent-Ad4509@reddit
I might. It worked so far. I prefer not to touch code in the ide mostly to avoid messing up opencode indices. I’ll switch to whatever else that works the moment I need it, but this combo turned out to be mighty powerful so far. I.e. it is not something unfortunate or crippled.
misanthrophiccunt@reddit
> crippled
Yes, that's the main problem I see with all of them: they start fine, then they bloat so much they stop being useful. Probably the main reason I find the Zed editor refreshing and sometimes use it.
Zanion@reddit
If Cursor makes it easier they can't fleece the shit out of you with their token tax and 2x-3x API billing.
misanthrophiccunt@reddit
exactly,
DrDisintegrator@reddit
I sporadically use the free tier of Claude, but mostly use local LLM models. I don't use Cursor or any fancy IDE, but just use the AI as a 'consultant' once in a while. I find it quite helpful, but not earth shattering.
I'm sure if I was a vibe-coder or someone trying to refactor a mountain of code this would be different, but for now it works for me.
Turbulent_Onion1741@reddit
Those are rookie numbers, mate. It's very easy, with MCPs etc. attached to pull context, to blow through $100-200 in a day or a few hours, depending on what you're doing with Opus and 5.5. Believe me, I've seen people at my place rack up bills you wouldn't believe.
But you are totally on the money.
The right thing to do is some kind of pipeline of compute: cheap local, cheap cloud, frontier only when needed.
CatConfuser2022@reddit
Out of curiosity, what kind of things do those people build with all the burned token money? I hope they have a nice ROI coming up.
dadidutdut@reddit
Poor workflow, no planning, no token optimization, and calling MCPs like crazy.
Turbulent_Onion1741@reddit
To be honest - I don't know! And this measuring of value - ROI - is an equation no one seems to want to truly touch. It's got so many variables. Think of it this way:
- a highly expensive but average engineer blows through a lot of tokens, let's say equivalent to their comp. They get 3x the output they would have otherwise. Technically: good enough.
- a genius, low-paid engineer in a geo-arbitraged location does the same thing, but it only speeds them up 2x. But their output is 100x as valuable and their cost 300x cheaper.
Does leadership understand the maths of this? We will see.
Euphoric_North_745@reddit
Prices are going up for some reason, until the market is flooded; then comes the traditional crash.
Pleasant-Shallot-707@reddit
They’re going up because the cost of compute was never covered by the plans that they offered
Euphoric_North_745@reddit
That is the propaganda part: a few million dollars are exchanged between 3 or 4 companies to keep prices 100x inflated. A few more years, the competition will join, and it's over.
It's a few graphics cards, which can be manufactured in factories like phones; 8 billion people, we all have them.
markole@reddit
You can go a long way with their fine-tuned Kimi K2.5 (Composer 2) via Auto. Sure, it's dumber but it's sustainably priced for an average person. Can't wait to see Composer 3 based on some K2.6 or similar.
Zanion@reddit
Auto is strictly worse on cost as soon as you blow through your subsidy.
Fedor_Doc@reddit
Open-source models need hosting as well. Providers need money to upgrade and replace equipment, pay staff, taxes, etc. When OpenAI and Anthropic raise prices, these providers will do so as well, to earn more.
Deepseek / GLM API prices are subsidized right now.
ttkciar@reddit
Glad to be using GLM-4.5-Air locally, and not give a single fuck about commercial API prices.
_maverick98@reddit (OP)
What kind of resources does this need? What difference have you noticed compared to frontier models?
Ereptile-Disruption@reddit
I bought Minimax 2.7 for a year on discount ($90) just to have no worries about surprise charges.
_maverick98@reddit (OP)
how is minimax 2.7 working for you compared to claude-opus-4.7?
Ereptile-Disruption@reddit
Not as good, but not as limited in use.
As a hobbyist, it's a fair trade.
charmander_cha@reddit
China is the future