Do you see 'AI Agents' as a meaningful improvement to the AI tooling of the last couple of years?
Posted by Podgietaru@reddit | ExperiencedDevs | 84 comments
I know this topic is done to death, and I apologise for adding to the deluge. But as someone who is not using AI in many meaningful ways beyond querying it occasionally as an alternative to Stack Overflow, I find it hard to find opinions on where the latest state of the art lies.
Between all the 'vibe coding' stuff, the AI true believers, and, on the other side, the negative opinions of AI, I never know where to look to find out whether new releases have made meaningful changes to the AI landscape.
In the last few days we have seen releases of GitHub Copilot agents and OpenAI's agent, and I'm curious to hear people's opinions on these tools. Do they make meaningful changes to how people work? Do they have the same issues that AI tooling has had for a while?
Empty_Geologist9645@reddit
It’s an RPC.
BanaenaeBread@reddit
I've been learning at a crazy pace by using Cursor carefully on home projects and cross-referencing the code and concepts it gives me with Reddit posts and Google.
I think, if used right, you can learn faster than ever before with AI, and that's a good thing.
uriejejejdjbejxijehd@reddit
Well, it’s a candid admission that AI doesn’t know how to do things right, it needs human oversight, in this case in the form of a dialogue.
It’s a slight UX improvement of a flawed process.
Zestyclose_Ad8420@reddit
The way I see it, an agent is just an LLM piped to /bin/bash.
If you have worked out how to include an LLM in your workflow, it's just another layer to facilitate the integration, but the fundamental issues/advantages are still there.
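A minimal sketch of that idea, assuming a hypothetical llm_complete() wrapper standing in for whatever chat-completion API you use:

    import subprocess

    def llm_complete(messages):
        """Hypothetical stand-in for any chat-completion API call."""
        raise NotImplementedError

    def bash_agent(task, max_steps=10):
        # Ask the model for one shell command at a time; feed output back.
        messages = [{"role": "user", "content":
                     f"Task: {task}\nReply with a single shell command, or DONE."}]
        for _ in range(max_steps):
            command = llm_complete(messages).strip()
            if command == "DONE":
                break
            # This is the whole trick: pipe the LLM's output to /bin/bash.
            result = subprocess.run(command, shell=True, executable="/bin/bash",
                                    capture_output=True, text=True, timeout=60)
            messages.append({"role": "assistant", "content": command})
            messages.append({"role": "user",
                             "content": (result.stdout + result.stderr)[-4000:]})

Everything else an agent product adds (tool schemas, retries, sandboxing) is layered on top of that loop.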
eslof685@reddit
AlphaEvolve is basically an agent on top of Gemini. Considering its achievements I think it's fair to say that a useful implementation is possible.
I think a big part is that we need smart models that are 10x faster and 10x cheaper, so that real work can be done behind the scenes to solve real problems without waiting hours and incurring significant cost per task.
RiverRoll@reddit
Yes. Just the ability to make changes that span multiple files is an essential feature in my opinion. Having to copy-paste things from the chat was cumbersome, and it's not uncommon for a relatively simple change to span multiple files, e.g. creating a new ORM entity with the corresponding DTO and mapper. AI can handle that fine.
ElonIsMyDaddy420@reddit
Agents still don’t work. If they don’t work soon, the AI hype will die out because 90% of valuable applications for AI kind of require agents.
bteam3r@reddit
I have to disagree. The nightly builds of GitHub Copilot's IntelliJ plugin have agent mode. I've been using it heavily for the last few weeks. Its ability to determine for itself what context it needs, and use it accurately, is an absolute game changer. It has made me a believer.
For one example, it was able to create new unit test classes in a 20 year old product that uses a homegrown framework. There are no examples online that could've trained it on what to do here. It did it by reading context within the codebase. And it did a damn good job. Tested all of the things I would've tested, got about 98% coverage on the class I needed to test.
It was not perfect on the first try, but it did about 99% of what needed to be done, with me just doing some minor tweaking. This one instance alone saved me hours.
I've been in this industry almost 15 years; I have seen my share of empty bubbles come and go (as we all have). There is real substance in some of these AI products.
Perfect-Campaign9551@reddit
Using a full blown AI to make unit tests is a waste of time
shared_ptr@reddit
Had similar experiences in our team where Claude code is able to read from the surrounding codebase and make modifications to/fix bugs in a system where the majority of the abstractions are homegrown.
We’re a Go shop with a pretty large ~4 year old monolith. Go didn’t have any good framework-esque solutions back then (or now arguably) so everything from how we route requests to our database migrations is built from scratch and it figures them out pretty well just from repurposing READMEs into CLAUDE.md files.
I can reliably get it to fix bugs from a comprehensive ticket description, an explanation of what I can see in the logs, perhaps a screenshot of the trace, and a bit of an "I have a hunch it's this".
I've only been doing this for 12 years, but it's a real shock to me; no tools have worked like this before.
ba1948@reddit
So you basically assigned the ticket to your junior, babysat him, and fed him the solution; the only thing left to do was actually write a piece of code to fix a bug. He writes the code, pushes, and claims all credit for the fix... How does that make you feel in this context?
Also, if you had actually gone and fixed the code yourself, wouldn't it have been way faster, considering you already know where the problem is?
shared_ptr@reddit
In general, no, it's not faster once you account for adding all the tests, confirming the edge cases, fixing them up, checking for common errors (security, data handling, etc.), building Storybook fixtures so we have it in our component library, and writing out decent sample data for those fixtures.
A good rule of thumb is that no ticket that actually builds something takes less than an hour from start to finish. If you can get AI to handle it in 5 minutes and check its work in 10, that's a big time saving.
It's also great at finding the source of really nasty bugs, because it can check all 100 possible callsites at once and doesn't get tired. I've got an 80% hit rate of Claude Code being able to diagnose nasty concurrency errors which would have taken much longer for me to properly trace and find, and if nothing else it gives a good second opinion.
So yeah, it's much faster. You can choose not to believe me and that's fine; it's working well for me though!
ba1948@reddit
Nah, it's okay, I use it as part of my workflow.
But honestly, sometimes it just makes things more complex than going to the official docs. Debugging its code is not that simple, no.
But still, you missed my point. You had to explain to it exactly what to do and pinpoint where the bug was; you might as well have delegated that ticket to your junior to give him an opportunity to learn the codebase, or just done it yourself anyway.
shared_ptr@reddit
Ah, I see. In this case we're a team of mostly senior engineers, and AI is allowing us to do a bunch of junior-level tasks in much less time, allowing us to be more productive.
This has translated into us raising salaries for our existing developers which feels like a decent outcome.
We’ll have to figure out junior onboarding when we need it but for now we’re hiring senior and above only.
caffeinated_wizard@reddit
I think creating unit tests is the lowest of bars for an agent. Where it would have the biggest impact would be implementing new features, and it's genuinely terrible at that. We were asked to start using AI more and report back, and I must have spent a few hours doing back and forth, explaining things, and asking questions, only for all of it to be scrapped so I could do the work myself in an hour.
Unless things get dramatically better, I don’t see myself using it for actual work.
Dry_Author8849@reddit
Hey there, I will be trying agent mode. Up to now I'm hitting context limits with the codebase size. Some things get too complex, I think, and the answers are plain wrong.
Is your codebase big? Have you noticed whether it gets a better grasp of your codebase?
Are you feeding the agent in-depth prompts?
I have a similar use case with a custom framework. We are using Visual Studio, and the solution has projects with React/TypeScript, C#, and SQL. Is your project in Java, or do you have mixed languages too?
Sorry for so many questions! But your project sounds like it is facing the same problems as ours.
Cheers!
ClydePossumfoot@reddit
I’m not sure that you can say this across the board. At my job we certainly have internal agents running that are providing real value.
marx-was-right-@reddit
Like what?
ClydePossumfoot@reddit
How is it better than cron? I’m not sure how to answer that as it’s kinda apples to oranges. You could happily trigger an agent invocation from a cron job or another workflow.
The last 6 months have been great in this space. Better structured output and improved tool calling primitives have made it possible to make things that actually work now. It’s not perfect, but its accuracy is within the range that I needed to make it work for a lot of internal tools.
A lot of folks don’t realize that it doesn’t have to be a DAG. You can, and should, have cycles. Feedback loops are an important part of a lot of successful agent designs.
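To make the cycle point concrete, here is a rough sketch (generate and critique are hypothetical wrappers around two separate LLM calls):

    def run_with_feedback(task, generate, critique, max_rounds=5):
        """Generate -> critique -> regenerate until the critic approves."""
        draft, feedback = None, None
        for _ in range(max_rounds):
            draft = generate(task, previous=draft, feedback=feedback)
            approved, feedback = critique(task, draft)
            if approved:
                return draft  # the feedback edge makes this a cycle, not a DAG
        raise RuntimeError("no approved draft within budget")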
The worst thing right now in my experience is latency and cost.
marx-was-right-@reddit
You didn't provide a use case that makes money though, you just spat out a bunch of buzzwords
ClydePossumfoot@reddit
If you think those are buzzwords then I feel pretty bad for your future as a software engineer/developer.
marx-was-right-@reddit
It's ok, agentic AI is taking over, amiright?
ClydePossumfoot@reddit
It will get some of us faster than others, and you seem prime for replacing
BuddyNathan@reddit
ffs, just give the example already, you seem too focused on convincing rather than describing.
ClydePossumfoot@reddit
I can talk about the technical details all day long, and I’m happy to talk to others about their work.
I’m not giving “money making” examples of things I’m doing at work that are under NDA in a quickly evolving and highly competitive space 🤷♂️
marx-was-right-@reddit
Yeah, cuz it doesn't exist. You're just bullshitting.
ClydePossumfoot@reddit
I guess if I were in your shoes I’d be really worried and acting out too. How is it down there in Plato’s cave?
Western_Objective209@reddit
They don't always work, but when they do it's pretty magical. Someone gave me a decompiled mess of a project, and I just asked Cursor/Claude to write unit tests and refactor it with a more modern and clean coding style, and it did it with minimal input from me. Trying to understand the project was torture before I tried using Cursor.
the_pwnererXx@reddit
Have you even tried?
caprica71@reddit
The vendors are trying to keep the peak of inflated expectations going. Meanwhile the rest of us are in the trough of disillusionment.
I'm glad, cause my job is safe for at least a year.
Perfect-Campaign9551@reddit
I think that they are lame and way overblown. AI makes too many mistakes to be trusted to be an agent like this
metaphorm@reddit
two thoughts:
first, I strongly recommend you start learning some basic LLM-integrated coding workflows. it's a genuinely useful tool for writing small, sufficiently specified code snippets. if you apply this iteratively you can greatly speed up the rate at which you write first-draft implementation code, even across larger code changes. LLMs are also very good at cranking out boilerplate-heavy unit tests and e2e tests.
second, fully autonomous coding agents are not ready for prime time. my company is heavily involved in developing these in a specific business and data domain and it's very difficult to get them to perform adequately in terms of both response time and accuracy. the more you can constrain the domain the better they work, but that constraining process requires A LOT of human supervision, so the model we're using for this at the moment is "centaur" or "cyborg". the system is designed to blend together user input with LLM output as seamlessly as we can manage. it's still hard.
I don't think the hype about agents is well grounded at the moment. there's a huge potential here and a lot of companies racing to unlock it, but there are no clear winners, no obvious "right way to do it", and quite a few problems that are genuinely intractable for agents.
JustinsWorking@reddit
I get mileage out of internal tooling - stuff that's simple but requires many lines of cookie-cutter code.
It's lowered the bar for when I bother to add a GUI tool - things that "might" live long enough to warrant some tooling are now getting GUIs, and if nothing else it's a nice QOL upgrade for me.
It's replacing 1-2 hours of really boring code, with fiddly debugging and tweaking, with 5-10 minutes of tweaking.
I've never managed to really speed up the bulk of my work - it either can't handle the complexity or specificity of the problem, or it requires so much tweaking that it's not really a time save by the time it works.
VelvetBlackmoon@reddit
The problem is how much you have to specify to still get a subpar result.
If you already know what you're doing, imo something like copilot will make you fly in comparison to this slow BS machine
Awkward_Past8758@reddit
I've been using Cline in VS Code running Claude 3.7 for the last few months and it's definitely helpful. It speeds me up when I get stuck on something, or when I want to quickly generate some tests for a ticket I need to get done in a rush. Also good for basic refactors. There are a bunch of class-based components in my codebase, for instance, and we have a policy of improve-and-modernize when we touch old code, so it saves me mental fatigue and 45 minutes of work whenever I can ask it to reference a few of our modern components and translate the old ones over.
Nothing it puts out is ever PR ready, but it can save me a lot of time doing the boring part of the job. I view it as an enhancement to my toolkit, not a replacement for anything or anyone, JR devs included.
Also as a note, I hate writing REGEX and it’s saved me SO much time writing patterns
U4-EA@reddit
REGEX is the one thing I always use AI for, but I also thoroughly test the REGEX.
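Worth doing systematically; a small sketch of what that testing can look like (the pattern and cases here are made up for illustration):

    import re

    # Hypothetical AI-generated pattern: ISO-8601 calendar dates (YYYY-MM-DD)
    DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

    # Exercise both sides of every boundary before trusting it
    assert DATE_RE.match("2024-02-29")
    assert DATE_RE.match("1999-12-31")
    assert not DATE_RE.match("2024-13-01")  # month out of range
    assert not DATE_RE.match("2024-00-15")  # month zero
    assert not DATE_RE.match("99-12-31")    # two-digit year
    # Note: a regex alone can't know 2023-02-29 isn't a real date;
    # that's exactly the kind of gap testing surfaces.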
Hot-Profession4091@reddit
Long term, getting agents right is important to engineering AIs that actually work.
Right now, companies are using you as guinea pigs to figure out how to actually do that.
xDannyS_@reddit
Just look at Klarna. They cut half of their staff to replace them with AI in an area where AI should work best: online customer support. Not only has the quality of their service plummeted since then, they are now also backtracking. Oh, and what they dubbed their new revolutionary AI agent is literally no better than the garbage AI chatbots we had 10 years ago.
As for the vibe coders: all the people who preach vibe coding as the next big thing are people who have little or no experience when it comes to programming. They also all seem to share one trait: they are insecure due to their lack of skill, ability, and knowledge, and they would absolutely love for vibe coding to become the next big thing because it would mean they weren't inferior anymore. When other people who are better than you make you feel insecure, of course you are going to cheer on anything that would tear those people down and bring them to your level.
worst_protagonist@reddit
I have decades of experience, and am totally secure in my skills, ability and knowledge. Vibe coding is pretty cool!
This week I spent three hours vibe coding a complete solution that solves a real problem for my company. It is a complete standalone app, and I can deploy it and automate needless toil for another department. The code is not as great as if I had lovingly handwritten each line, but it's fine.
Last year I couldn't have built the thing I wanted with vibe coding. Even a few months ago I couldn't have built it as well. If I try to do something complex in our multi-million line legacy applications it is going to fail. In a few months or a year, it might succeed! Or its capabilities might hit a plateau, and it'll never improve over today, but that doesn't mean it doesn't have its place.
I think it's smart to explore new tools as they come out and see what they are capable of. Staying on top of things that make you a better engineer is the best way to have a good career.
griffin1987@reddit
Not sure what you used, but I recently tried ChatGPT 4o-mini-high and basically wasted 6 hours, after which I gave up and just built the thing I wanted myself in around 30 minutes.
It was a browser tool to iterate through a zip file locally and re-encode certain JPEG files using Google's new Jpegli, running in a web worker, with Jpegli compiled to WASM using Emscripten.
Should have been pretty easy for ChatGPT, but it failed horribly.
Especially bad was the fact that every time it failed, it made a suggestion that was already implemented. Like "I see you did A, but if you do A, it will fix your problem." Note: "A" stands for one and the same thing there. And then it kept cycling around instead of actually fixing the issue, concentrating on stuff that wasn't related to it.
I've tried various LLMs over the years, time and again, and they have always failed me. I like the autocomplete IntelliJ now has, though it can be quite annoying as well and introduce horrible bugs if you don't make sure to check everything it does.
Around 3+ decades of experience, and I'd say that LLMs suck and always will. They are text generators, and programming operates on a promise of mathematical exactness. That's quite the opposite of writing an article or painting a picture, neither of which actually requires you to be "correct" in a mathematical way.
worst_protagonist@reddit
I used Cursor in full agentic mode, so I also don't know which model I used. I don't recommend that approach for existing solutions or critical features.
I dunno. "LLMs suck and always will." Maybe? But it seems like deciding in advance that any given tool can never be helpful for any use case ever is a pretty limiting outlook.
griffin1987@reddit
LLMs are text generators, so they won't ever be able to do anything that needs mathematical exactness or correctness. For anything else, like painting a picture, where "correctness" and "exactness" don't matter, they are fine already and will most definitely continue to improve.
We might at some point see "AI" stuff that's not LLM-based - at that point I can imagine it being able to actually code. But AFAIK we're very far away from that. Then again, who knows; developments have been pretty fast over the past 10 years, so it might not be so far away after all...
tango650@reddit
Any reference for the info on Klarna? My experience with the new LLM-based chatbots in my accounting app is just so much superior to humans that I find it surprising. I also remember a real human support experience with Revolut which was like talking to a 5-year-old, so even the worst augmented LLM could do better.
baldyd@reddit
Vibe coding reminds me of the growth of visual scripting in my industry, such as Blueprint in Unreal Engine. My job has evolved from programming solid, efficient systems into fixing the dreadful visual "code" created by designers and artists who have no background in programming. The company still has to pay me and I'm becoming more valuable as the tech debt from this stuff mounts up so I struggle to see how anyone is benefitting from this.
beaverusiv@reddit
Marketing, Sales, and other people with financial interest in AI
RunWithSharpStuff@reddit
Yep yep, those are the two target demos.
AlexFromOmaha@reddit
It's more like a standardization of a thing we've been able to do for a while now. Agentic workflows are an architectural decision more than an increase in intelligence.
The answer to the implied question is nuanced and quickly changing. As a dev, it's kinda irresponsible not to know how to work with LLMs outside of consumer products, or to lack defensible opinions on them individually. The writing is on the wall. Update your skills.
If you're looking for a less nuanced answer, every time the luddites have rolled in and said "AI will never be able to do X," where X is a well-defined task that a single human can do entirely on one computer, they've been proven wrong in months, not years. Similarly, every time a company has said "AI can do Y better/cheaper by cutting humans out of the loop," they've never proven it. Obviously these two trends will converge into something less like vaporware eventually, and we've gotten some cool toys along the way, but the process has been gross, and whoever sticks the landing will be richer than Musk.
notWithoutMyCabbages@reddit
I'm working on some proof of concept stuff at my boss' request that I expect will turn into a perfect demonstration of why this stuff isn't really quite practical yet, but hey, gotta give it my best either way.
Hairy-Caregiver-5811@reddit
Yea, it's still no silver bullet, but it's increasingly effective for specific tasks that don't require creative or flexible contexts.
I assume it's just like having a nurse by your side while you perform surgery
SatisfactionGood1307@reddit
No. The entire term "agent" is marketing BS. If I see anyone unironically using it, I remind them it has very little to do with the definition of an agent from reinforcement learning; it's just a bot.
Like the bots you would use to scrape a website or do QA tests - same kind of fundamental unreliability. They are good at failing and costing tokens tho. Makes $$$ for the AI company selling the silver-bullet dream to your boss.
AngusAlThor@reddit
They still make mistakes a lot of the time, so at least for my job (data engineering) they do not solve the fundamental problems that stop AI from being integrated. And I have mates in other areas, from frontend to chatbots to security, who say the same thing: agents require different handling, but they have the same basic flaws that make them unusable on anything client-facing.
For me, the big, unanswered question is still price. None of the AI companies are turning a profit on these tools, and the new architectures just keep getting bigger and more expensive to run (I think reasoning models consume 70x as much power as standard ones?). So at some point the price is going to have to shoot up if these companies are going to continue. So how much do these products really cost?
BushLeagueResearch@reddit
Agreed with you, but note that inference is profitable for most AI companies; R&D is just offsetting any profits tenfold.
ICanHazTehCookie@reddit
Take a look at r/cursor and you'll see loads of unhappiness with Cursor's recent usage/pricing changes for that reason.
cbusmatty@reddit
Cursor doesn’t at all fit this model as it’s still super cheap.
metaphorm@reddit
to the contrary, I think Cursor's cheap pricing is exactly the problem being highlighted here. The company is subsidizing the cost of the LLM backend in order to offer a product at market-beating prices. They're probably bleeding a lot of capital because of this and writing it off as "cost to acquire customer", but it's not sustainable long term.
cbusmatty@reddit
Sure, but again, their cost is still low. which doesn't fit the model at all
sanbikinoraion@reddit
Cursor is sort of cheap but I ran out of the agent requests 10 days into my month.
metaphorm@reddit
Cursor's "moat" is shallow at the moment but getting deeper by the day. there are three factors that I can think of:
1. the more users are on Cursor, the more Cursor gets fed high-value proprietary data about real customer usage, which gives them a huge edge in fine-tuning their models and agents, especially w.r.t. accuracy
2. the more a user engages with Cursor, the more it learns their habits and preferences and gets better at assisting their workflow, making switching costs higher for competing products
3. economies of scale kick in. the price of an API call to an LLM goes down the more of them you pre-purchase upfront. a larger user base means they can economize their deals with LLM providers to a greater degree than their competitors.
cbusmatty@reddit
I agree with all of this, but I think we will draw different conclusions on how this will play out based on how they've handled the big money coming to buy them so far.
darkrose3333@reddit
AI itself is in an interesting spot where customer adoption leads to increased CapEx due to each prompt costing companies more money than what they charge for usage
One-Pudding-1710@reddit
Not sure about "meaningful changes", but what is sure is that they let you 1) save a lot of time and 2) get more accurate insights more often.
For example, tools like https://withluna.ai/ use AI for sprint retros and in-sprint insights, which takes away low-leverage work and gives you back time for more important activities.
fkukHMS@reddit
We are at the point where the only remaining question is "When". And the answer is likely in months, not years.
About a year ago AI could be expected to write code at the method scope, and even then tended to make silly mistakes. 6 months ago it was writing entire classes at the quality of a junior coder. Currently it can write entire components at near-production quality.
6-12 months from now ? who knows.
recursing_noether@reddit
Just use ChatGPT like your grandpa does
Candid_Art2155@reddit
Yes - there are a few things going on with agents. One is the visible tool-calling part: we see the code agent type in the terminal or run the code. The second part is the agent making multiple LLM calls, and that's the bigger deal. Currently, LLM attention is really limited compared to where we need it to be, which is why we see small coding examples work and then things break down with a bigger codebase. Intelligently designed agents can leverage multiple LLM calls to break the codebase down into more consumable chunks. We're throwing more compute at the problem, essentially.
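A sketch of what those multiple calls buy you; summarize_file and answer_with_context are hypothetical single-LLM-call helpers:

    def answer_about_codebase(question, files, summarize_file, answer_with_context):
        """Map: one call per file, so each fits in context on its own.
        Reduce: answer against the digest instead of the whole repo."""
        summaries = {path: summarize_file(path, text) for path, text in files.items()}
        digest = "\n".join(f"{path}: {summary}" for path, summary in summaries.items())
        return answer_with_context(question, digest)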
jakesboy2@reddit
Been experimenting with agents the past week. I’m not sure how much time it’s really saving, but for a repetitive task I enjoyed walking it through the changes with some intentional prompting (have it plan first before coding, tell it to ask clarifying questions, etc). I was able to watch videos while it did its thing and steer it between changes.
church-rosser@reddit
Some tools are actually weapons.
kAHACHE@reddit
You can see the NLP part as an opening for different interfaces to communicate with. We have visual interfaces for users and APIs with specific protocols; now the NLP part allows interacting by voice or chat with whatever you have to offer, by guessing intent (a search through content, a specific action to execute, whatever). It is a bit different because it's not as precise, but it is way more flexible. Recent developments seem to indicate this is the way, as standards like MCP are coming up. It lowers the entry bar for non-tech users but also removes the need for juniors right now. I think it's evolving too fast for people to find other, more useful tasks for juniors.
Ok_Slide4905@reddit
Great for small, well defined and rote tasks. Love the ability to summarize dense technical documentation and to adjust writing styles for tone.
Untrustworthy at best for anything else. The worst is junior and mid level devs who take AI at its word and cannot see the forest for the trees.
marx-was-right-@reddit
Nope. Same fundamental flaws as LLMs. It's just a new bullet point for the hucksters to sell before they run out of ammunition.
justUseAnSvm@reddit
They work, but you still need to confirm the output with some automatic validation step if you are going to chain together operations.
We use them now, mostly with a human in the loop, but that validation can be quick and cheap, which opens up a lot of doors.
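A minimal sketch of that pattern, assuming a hypothetical llm_call helper and a JSON contract between chained steps:

    import json

    def validated_step(prompt, llm_call, retries=3):
        """Only hand output to the next operation once a cheap check passes."""
        for _ in range(retries):
            raw = llm_call(prompt)
            try:
                data = json.loads(raw)          # the quick-and-cheap validation
                assert "summary" in data        # hypothetical required field
                return data                     # safe to chain onward
            except (json.JSONDecodeError, AssertionError) as err:
                prompt += f"\n\nPrevious output was invalid ({err}); return valid JSON."
        raise ValueError("validation kept failing; escalate to a human")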
The next generation is trustable output; then the question will be: what do we tell them to do, and in what order?
It's exciting; my team's goals for the next year are based on LLM capabilities that weren't present last year. No doubt there's hype, but there's a there there!
the_pwnererXx@reddit
Using Cline - a VS Code extension that lets you turn Copilot or any model into an agent.
LLMs are getting better every few months, and I've found them useful since their release. They are solving my problems more and more frequently.
At the minimum, I'm well aware of what they can and can't do. I know what prompts I can give one and get a good response out of.
Agent mode, aka vibe coding, lets me do all that in the editor rather than in the browser and lets the LLM do almost all the work - including committing, making a PR, and running tests.
Senior dev - Python/Django/Terraform, DevOps too.
The people in this thread saying AI is not capable of doing their job are demonstrating their own ineptitude at using the tools. Either that, or they hate AI on principle and refuse to even attempt to try it.
sam-sp@reddit
Yes, the capabilities of agents to call tools via MCP and write code are nascent but getting better every month. For example, giving VS Copilot an MCP server for dotnet-dump and using Claude 3.7, it's able to correctly diagnose common issues from a dump file. That is probably better than >90% of .NET developers.
AI coding is great when you can give it enough context for what it needs to do. For example, when creating the MCP wrapper for dotnet-dump, I created the wrapper method for one command by hand. I then gave the LLM a list of commands and told it to create wrapper methods for each of them based on the existing one. It was 99% correct (it removed "Async" from one of the names).
What is missing from the current AI coding experiences is more of a pair-development experience. Vibe coding is too fire-and-forget; rather than jumping directly to code, it should create a plan for what code will be produced and then iterate on that plan with the developer. The problem with LLMs is they don't know what they don't know - and don't know to ask the developer for additional context.
Things like GitHub Padawan will be good where the problem is easily described, such as a call stack for an exception.
08148694@reddit
They have their uses. The value comes from their ability to detect their own errors and iterate to fix them.
A specific example I used one for the other day was fixing mock objects. I changed an interface somewhere, which broke a bunch of tests and storybooks. These errors could all easily be found with a build command.
I'm pretty lazy, so instead of tediously going through each error and fixing the mock, I just asked Claude Code to fix my build errors. It ran the build command, made changes, reran the build command, made more changes, and repeated until the build passed.
It was probably a bit slower than if I had spent the time to do it myself, but it freed me up to do something else while it worked in the background.
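That loop is mechanical enough to sketch; propose_fix is a hypothetical stand-in for whatever agent call actually applies the edits:

    import subprocess

    def fix_until_green(build_cmd, propose_fix, max_iters=10):
        """Run the build, hand the errors to the model, repeat until it passes."""
        for _ in range(max_iters):
            result = subprocess.run(build_cmd, capture_output=True, text=True)
            if result.returncode == 0:
                return True                      # build passes, we're done
            propose_fix(result.stdout + result.stderr)  # model edits the files
        return False                             # out of budget; human takes over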
Would I trust it with building a feature or fixing a bug? No. That doesn't mean it's useless though.
lowwalker@reddit
No
HauntingAd5380@reddit
I'm really unimpressed by the first wave of agent products I've had demoed, and I think they are 100% unviable for production use cases at this point, but I can clearly see where they will be in two or three generations. Not a fad; definitely something you have to learn, but not there yet.
driftking428@reddit
Yes I do.
I've recently started using agent mode. It being able to run a test, read the output on its own, and make the fixes means it might actually get the task done.
It's definitely a step above just me asking it over and over.
DualActiveBridgeLLC@reddit
For my real-time test projects it is pretty much unusable. I guess the application space is too niche to have been trained well. For me, AI is good for small standalone scripts, first-draft emails, and documentation. Other than that you can't trust anything.
ninseicowboy@reddit
Nope not yet. Sure eventually
forgottenHedgehog@reddit
I mean it's not that complicated.
Where those tools started is basically a raw prompt with no context beyond what was baked into the model from its training set.
Due to this limitation of static knowledge, those tools were enhanced by providing more context (via RAG) from external sources, for example from your codebase (your open tabs, recently touched files, recently run tests, etc.) or from some knowledge repositories (e.g. your libraries might provide an MCP server to get current information about available functions).
Those tools were then also enhanced with the ability to perform actions - calling APIs or whatever. This allows them to complete certain tasks beyond just providing you with text.
Up until now, those tools were only prompted by humans, but you can build workflows - static or dynamic - on top of that.
Agentic AI is pretty much those dynamic workflows, where the tool makes its own plan and executes on it by performing certain actions with access to certain knowledge.
It's fundamentally the same tool as before. If you are at the point where you're just opening chatgpt.com and asking it some questions, you are more or less two generations behind here: you lack specific context, and you lack the ability to run actions.
It doesn't solve hallucination, but if you give it a reasonably well-defined task, in my experience most of the time it will execute reasonably well, ESPECIALLY if you give it some good context (like examples of how you like things structured). But you need decent tools with good integration for that. And it will eat a lot of tokens. And it can still fail, but significantly less often than without those recent advances.
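A rough sketch of those layers stacked on a plain prompt (llm_chat, the tool registry, and the reply shape are all hypothetical):

    # Layer 2: actions the model may request
    TOOLS = {"read_file": lambda path: open(path).read()}

    def agentic_answer(question, retrieved_context, llm_chat, max_turns=5):
        messages = [
            # Layer 1: context retrieved from your codebase/knowledge base (RAG)
            {"role": "system", "content": f"Relevant context:\n{retrieved_context}"},
            {"role": "user", "content": question},
        ]
        # Layer 3: a dynamic loop where the model plans its own next step
        for _ in range(max_turns):
            reply = llm_chat(messages, tools=list(TOOLS))
            if reply.get("tool") in TOOLS:
                output = TOOLS[reply["tool"]](reply["args"])
                messages.append({"role": "tool", "content": output})
            else:
                return reply["text"]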
VelvetBlackmoon@reddit
I had principal engineers show how removing one line was a great example of productivity gains, while needing to babysit it for a while to make that happen... yeah.
rdem341@reddit
No, I still don't see the application of agents...
I feel like if I rely on agents, I give up predictability and reliability, and my costs increase...
I am seeing people use agents to handle some simple if-else logic and layout formatting.
Efficient_Sector_870@reddit
We started using Copilot and it's dogshit. ChatGPT and Gemini are good for research tho.
ketsebum@reddit
I recommend trying out Gemini Research for an understanding of where the tool is at. If you want essentially an intern-like research partner that will collate information for you, it can be valuable.
IMO it is a bit more verbose than it needs to be, but it handled the few tasks I've used it for well.
TainoCuyaya@reddit
NOPE