anthropic launched a managed agent runtime as an API. anyone else evaluating build vs. buy for agent infrastructure?
Posted by Mental-Telephone3496@reddit | ExperiencedDevs | 39 comments
Anthropic released Claude Managed Agents this week. not a new model, it's a hosted agent runtime. you define an agent config (model, system prompt, tools, MCP servers), they spin up a container with whatever packages you need, and claude gets bash access, file ops, and web search. sessions are stateful and persist across interactions. you can steer or interrupt mid-execution.
Basically they packaged the entire agent loop (tool execution, sandboxing, error recovery, context management) as a managed service.
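to make that concrete, here's a hypothetical sketch of what such an agent config might look like. field names here are my guesses, not anthropic's actual API:

```python
# Hypothetical agent config for a managed runtime.
# All field names are illustrative, not Anthropic's real schema.
agent_config = {
    "model": "claude-sonnet-4-5",
    "system_prompt": "You are a code-review assistant for our monorepo.",
    "tools": ["bash", "file_read", "file_write", "web_search"],
    "mcp_servers": [
        {"name": "internal-docs", "url": "https://mcp.example.internal"},
    ],
    # packages installed into the sandbox container at session start
    "container": {"packages": ["ripgrep", "python3"]},
}
```

the point is that everything below this config (the loop, sandboxing, retries) becomes the vendor's problem.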
I've been maintaining a custom agent loop for about 8 months now. python, langchain, docker containers for sandboxing, custom retry logic, context window management. it's maybe 4k lines of code that i spend a few hours a week keeping alive. works fine but it's plumbing that adds zero product value.
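the core of that loop is roughly this shape (a simplified sketch, not my actual code; the callables are stand-ins for the model client and tool dispatcher):

```python
import time

def run_agent(call_model, execute_tool, max_turns=10, max_retries=3):
    """Minimal agent loop: ask the model, run any requested tool, feed the result back."""
    messages = []
    for _ in range(max_turns):
        for attempt in range(max_retries):      # retry transient model errors
            try:
                reply = call_model(messages)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)        # exponential backoff
        if reply.get("tool") is None:
            return reply["text"]                # no tool requested: we're done
        result = execute_tool(reply["tool"], reply.get("args", {}))
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_turns")
```

everything around this skeleton (sandboxing, logging, context trimming) is where the 4k lines go.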
The managed agents pitch is compelling on paper. skip all that infra, just define your agent and go. pay for compute and tokens, not for the runtime itself. for internal tooling or non-critical features this seems like an obvious win.
But for anything in the critical path i'm hesitant. single vendor dependency on anthropic. can't swap models if pricing changes or a better option shows up. limited visibility into the execution environment. their branding guidelines explicitly prohibit calling your product "claude code", which tells you they want to be invisible infra, but invisible infra you can't inspect makes me nervous.
Right now my stack is verdent for development work (planning, parallel tasks, code review) and the custom loop for production agent features. verdent handles the dev side well because it already manages the agent orchestration, model routing, verification. but production is different, i need control over retry behavior, logging, cost caps.
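cost caps are a good example of why i want that control: trivial to own, hard to get from a black-box runtime. a rough sketch (per-million-token prices here are illustrative, not real pricing):

```python
class CostCap:
    """Hypothetical per-session spend guard for a production agent loop."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> None:
        # accumulate spend; abort the session the moment the cap is crossed
        self.spent += (input_tokens / 1e6) * in_price_per_m
        self.spent += (output_tokens / 1e6) * out_price_per_m
        if self.spent > self.max_usd:
            raise RuntimeError(f"cost cap exceeded: ${self.spent:.2f}")
```

you'd call `charge()` after every model turn and catch the exception in the loop to kill the session cleanly.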
The real question is where the line is between "managed is fine" and "we need to own this." for us it's probably: internal tools and dev workflows on managed services, customer-facing agent features on our own infra. at least until the managed options mature and offer better observability.
Would be useful to hear how other teams are drawing that line, especially if you're already running agent workloads in production.
PrintfReddit@reddit
I just don't know how it solves giving access to internal APIs and tools without blasting our private service infra on the open internet. Until that can be solved securely it's useless.
Redundancy_@reddit
Zero Trust Network Access with some sort of cryptographic signature as workload identification, like a JWT/OIDC.
If you can't trust Anthropic at all to only sign your workloads as yours, then there is no solution. But ZTNA would at least limit access to specific services for that role.
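As a toy sketch of that workload-identity check (illustrative only: a real ZTNA setup would verify an OIDC token against the issuer's JWKS keys rather than share an HMAC secret):

```python
import base64
import hashlib
import hmac
import json
import time

def mint_token(claims: dict, secret: bytes) -> str:
    """Issue a signed workload-identity token: base64(claims).base64(hmac)."""
    payload_b64 = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(secret, payload_b64.encode(), hashlib.sha256).digest()
    return payload_b64 + "." + base64.urlsafe_b64encode(sig).decode()

def verify_workload_token(token: str, secret: bytes, expected_role: str) -> bool:
    """Gate access: valid signature, matching role, not expired."""
    try:
        payload_b64, sig_b64 = token.rsplit(".", 1)
        expected = hmac.new(secret, payload_b64.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(base64.urlsafe_b64decode(sig_b64), expected):
            return False
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return claims.get("role") == expected_role and claims.get("exp", 0) > time.time()
    except Exception:
        return False
```

The gateway in front of each internal service would run the verify step and only forward requests for roles that service allows.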
PrintfReddit@reddit
Your last point is why we’re continuing to have our in house compute
ninetofivedev@reddit
You run them in your network boundary and you poke holes that you secure behind auth?
What do you mean?
PrintfReddit@reddit
The managed agent runtime runs on Anthropic's infra, right? Did I miss something?
ninetofivedev@reddit
No you’re right. I thought it was self hosted.
But the solution is still the same as how people handle cloud runners for CICD
Secure access behind auth.
That might give some companies heartburn, in which case I’m sure Anthropic will come up with a self hosted model, but this is hardly a non-starter.
PrintfReddit@reddit
Oh yeah for sure, I mostly meant for my organisation it's a tough sell. Our gitlab runners are within our infra too.
Unlikely_Secret_5018@reddit
Where do you currently run your agent loop?
Celery? Cloud run jobs? How well does it work?
Curious I'm running into this same dilemma.
NANO56@reddit
Airflow 😂 - we already know how to run airflow. Developers know how to work with airflow. "agentic" tasks orchestrated with airflow. What's the difference between an "agentic loop" and a DAG?
Unlikely_Secret_5018@reddit
How do you stream the immediate response back to users, like "Ok, let me do this slow thing for you..." ?
NANO56@reddit
This is an extremely high level overview.
For us, latency isn't an issue. Most workflows are batch jobs and the end user is just presented with the results.
For systems where end user latency is important we use SSE to update the end user. These can be as verbose as you would like. We keep it simple for the end user. It mimics reasoning UI/UX. Then for the final generation step we stream the response.
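A simplified sketch of what those SSE frames look like on the wire (illustrative, not our actual code; in practice the generator is plugged into whatever web framework serves the stream):

```python
def sse_event(event: str, data: str) -> str:
    """Format one Server-Sent Events frame as written to the response stream."""
    return f"event: {event}\ndata: {data}\n\n"

def stream_reply(status_updates, answer_chunks):
    """Yield status frames first ('let me do this slow thing...'),
    then stream the final answer token by token, then a done marker."""
    for label in status_updates:
        yield sse_event("status", label)
    for chunk in answer_chunks:
        yield sse_event("token", chunk)
    yield sse_event("done", "")
```

The browser's `EventSource` dispatches each frame by its `event:` name, so the UI can render status lines and streamed tokens differently.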
neuronexmachina@reddit
One difference is that an agent loop is likely cyclic, while a DAG is by definition acyclic. I imagine passing agent context using Airflow might also be a little tricky.
NANO56@reddit
I'm writing from the perspective of building highly specific, scalable automation. Simplicity is key.
My comment about agentic loops and DAGs was a little tongue in cheek about “agentic orchestration” like LangGraph. I think their abstractions are a negative and add needless complexity. I want my engineers to write pragmatic code they understand fully. I am biased because I have built the platform which our ML and data pipelines run on.
The cyclic thing is a fair point. We mitigate it by failing fast. In production our "agents" are really just fine-tuned LLMs with access to a narrowly scoped set of tools and context per task. Outputs must pass validation; if not, it usually means human intervention. This is a traditional machine learning problem.
For passing agent context, ideally the agent is scoped narrowly enough that no additional context is needed. In reality, we built a context management platform, which is fundamentally a repository of specific requirements available to the "agent" builders to pull into their tasks.
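The fail-fast shape is simple (an illustrative sketch; the callables stand in for the LLM task, the validation check, and the human hand-off):

```python
def run_task(generate, validate, escalate, max_attempts=2):
    """Fail fast: return the first output that passes validation,
    otherwise hand off to a human after a bounded number of attempts."""
    out, reason = None, None
    for _ in range(max_attempts):
        out = generate()
        ok, reason = validate(out)
        if ok:
            return out
    return escalate(out, reason)  # bounded retries, no unbounded agent loop
```

Because attempts are bounded and validation is explicit, you never get the runaway cyclic behavior an open-ended agent loop allows.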
neuronexmachina@reddit
Good points, that's a really interesting approach.
neuronexmachina@reddit
I think it'd depend on what problems you're trying to solve. Depending on the problem, you could implement custom orchestration using something like PydanticAI or LangGraph, and there's also open-source platforms like Eigent and Multica.
siscia@reddit
At work we have our own thing that is homegrown and growing.
However, I do manage few passion projects and to manage them I created a GitHub application.
I basically just open an issue and the GitHub application in the background spawns a VM, pushes an agent inside, and starts the agentic loop.
When everything is done it creates a PR that I can review.
I usually do it with a 2 step workflow, first create a design and then implement it. I can comment on the design and the loop starts again.
Granted, it has no access to MCP or any other kind of runtime information. But it seems to me a better design. It can't mess things up.
For small tweaks and small improvements it's amazing.
Moreover, it can also be used by non-technical people.
You can see an example here: https://github.com/RedBeardLab/2llteacher/issues/63
The issue was created by someone who has no context on the technical details. The bot generated a plan as a markdown document, and I can update the markdown or just leave comments like I did. The bot then generates a PR that I can review, comment on, or merge.
Megamygdala@reddit
IMO only worth it if you need agents to execute code in a sandbox, everything else is easy to own yourself
pr0cess1ng@reddit
Vendor lock in is ultra maxxing. Crazy 180 we've done as an industry
Leading_Yoghurt_5323@reddit
your split makes sense, managed for internal stuff, custom for anything critical… most teams i’ve seen land there
DeterminedQuokka@reddit
I mean yes, but I've been having these conversations for months because tons of other companies also offer it.
No one has brought up switching any of it to anthropic though.
OkRub3026@reddit
CLAUDE MAKE THIS AI SLOP MORE HUMAN LIKE BY SOMETIMES LOWERCASING THE BEGINNING OF A SENTENCE. NO OTHER GRAMMATICAL ERRORS.
little_breeze@reddit
If the agents are core to your business/product, I wouldn't recommend outsourcing that infra to them. These big labs don't have the best reputation for reliability (I think Anthropic has one 9 of uptime right now), and they can easily rugpull you in various ways: availability, model quality, and pricing, to name a few.
JorgJorgJorg@reddit
for people who can’t do this on their own and want to pay the premium to have it all from one vendor, sure.
ninetofivedev@reddit
If the last 10-15 years have taught us anything, it's that there is a massive market for that.
JorgJorgJorg@reddit
yup, but probably not for enterprise. So it won’t make massive revenue. And others can perhaps offer it without model lock-in for cheaper.
andreortigao@reddit
Most enterprises outside of tech would choose that. Those deals are made by non technical people in a club over a bottle of expensive whisky
JorgJorgJorg@reddit
what enterprises are buying vercel over raw gcp/aws/azure? thats what I mean
andreortigao@reddit
Asics, Underarmor, Paige, Bose, Johnson&Johnson...
Vercel being costlier than AWS is a non-issue for those companies.
Also, for upper management, if you build a team under your umbrella and something goes wrong, it's your fault. If you hire a company with a decent reputation and something goes wrong, they can shift the blame. It's way more valuable for them.
JorgJorgJorg@reddit
Maybe they use them somewhat but I would like to compare their vercel spend against their hyperscaler spend
ninetofivedev@reddit
There are plenty of enterprises that go with the managed versions of software, despite being enterprise.
JorgJorgJorg@reddit
it depends. This one imo is a little too easy to replicate: just ask claude to deploy the same thing to AWS, own all your governance, and keep the ability to switch to other companies' models.
gjionergqwebrlkbjg@reddit
It's not trivial to develop a sandbox where you can safely run shell commands.
crustyeng@reddit
Our team has been building our own entire stack of agentic ‘stuff’.. mcp, the agentic loop itself, orchestration, stateful runtime environment… everything on top of the bedrock converse api. It’s bought us a lot of flexibility and portability (it’s all just rust we can deploy like anything else).
ML_DL_RL@reddit
Too expensive, and there are a million SDKs out there that give more flexibility to a good dev. Effectively you're paying a premium for infra. Maybe to stand up something quick to demo to someone? But not a good viable solution for long-term use.
ninetofivedev@reddit
I do think this is a great point. It's really hard to sell the SaaS model while also promoting tools that make self hosted easier than ever.
Icy-Buffalo-1015@reddit
It feels incomplete as a platform. I’m evaluating Ona at work and imo it’s probably where anthropic is going with it.
Background agents are getting more popular and businesses just want a platform that does it all. Even better if the runners can be self-hosted.
RainbowSovietPagan@reddit
What about video editors who are skilled but not experienced?
ninetofivedev@reddit
Same problem, different domain.
As far as value, there is so much important but not critical software that operates across the world that there will absolutely be a lot of opportunity for this market.
Internal tooling is the big one. We have a suite of services that run on our platform for our internal users. Dev tools. Reporting. Documentation. Workflow managers. Etc.
These are all important enough because our devs and support and whoever use them day to day to increase their productivity.
However they’re not critical. If they go down, it’s not a massive PR nightmare or lost dollars in revenue.
Plugging these ai agents into these workflows is pretty powerful. We’ve already done it with a number of services.
rover_G@reddit
Anthropic has realized the money won’t be in running the models so they’re trying to grab a piece of the agent infrastructure pie