What are your thoughts on AI agents? Have you seen any legit applications for them?
Posted by almost1it@reddit | ExperiencedDevs | 246 comments
Feels like I've been hearing about "AI agents" everywhere and how it's a paradigm shift. Yet I haven't seen an application of them that has given me that "oh shit" moment. Instead I've only seen a bunch of new dev tools for building these agents.
The sceptical side of me thinks that a lot of potential applications for AI agents are forced and could be better solved with simpler deterministic algorithms. For example, I've been seeing a lot of the crypto bros drone on about "AI x crypto" and how agents could automate your portfolio. But it feels like marketing fluff since we could have already done that in both crypto and traditional finance without having to rely on an AI's probabilistic models.
Anyone in this sub gone down the rabbit hole here? Maybe I just haven't come across any solid application of AI agents yet and am open to being shilled.
engineer_1998@reddit
VC firms are rushing to invest heavily in companies working on ideas like this.
This article really summarized how AI Agents could be used in Finance.
https://open.substack.com/pub/saranshmittal/p/ai-agents-in-finance?r=9e06z&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
nova_zen1@reddit
I recently found an awesome AI agent and that is $SHAFT
Frosty-Variation-619@reddit
AI agents are getting pretty interesting lately — moving from just being tools that react to prompts (like me) to becoming more autonomous, goal-driven systems that can plan, take actions, and even coordinate with other agents or APIs.
Bakoro@reddit
It's easy to get overwhelmed with the hype and forget about the reality of where we are at.
There are a thousand companies all trying to bandwagon onto the AI thing, and most of them are selling half assed solutions, trying to get those sweet venture capital dollars.
There is also an extremely vocal set of "tech enthusiasts" who don't actually have any meaningful professional tech knowledge or skills, who are basically getting high off their near-future sci-fi speculation. These are the same kind of people who back in the 1920s were promising that we'd all have flying cars and personal robot maids.
If you were around for the '95-era dot-com bubble, today's atmosphere should feel very similar. The Internet was and is real, it has real uses, it has real value, but there were a thousand overvalued companies promising the moon while offering no actual utility. Pop went the bubble, yet the Internet stayed, and Internet-based companies which offered meaningful utility flourished.
AI everything is the same as that, there are a lot of dollars flowing, and a lot of promising, and a lot of overvalued companies who aren't well founded in providing meaningful goods and services.
The reality is that all these AI tools are in their infancy and toddlerhood.
Google put out their paper on transformers in 2017, and I think it was maybe 2020 when OpenAI released a GPT API.
People have been working on AI agents for a while, but LLM based AI agents have been getting hyped over some months, that's it.
You haven't seen any major products because nobody but researchers has had the time and resources to make anything worth half a crap.
You haven't seen all the best stuff because the multi-billion-dollar companies who can afford to make the foundation models are keeping their most capable tools for themselves, and only releasing the most controlled, sanitized versions they have (and rightly so, given how much people are freaking out, and given how bad-faith actors are trying to get these models to do bad things so they can sue the companies).
The pace of improvement in this sphere has people going nuts, but it's a tiny, tiny fraction of the software development population which has any significant ability to push the tech forward.
There are a bunch of people who know enough to be able to fine-tune an existing model, or who can cobble together pre-existing tools into a product that kinda-sorta works. Those are the people who are making most software for second and third tier companies who can't afford to make their own foundation models, and rely on API access.
That's not an insult aimed at the general developer population, it's just that we are talking about work that is still heavily PhD level, and it takes absolutely stupid amounts of resources to do foundation level work. It takes years to get up to speed on the underlying theory and all the tools, and all the papers, and while you are learning, the field keeps surging forward.
The absolute core attraction of AI agents is being able to do automation without having to do traditional development where you have to have 100% of the relevant information and think of nearly 100% of the weird edge cases and problems that might happen.
Traditional automation is tedious, buggy, error prone, and it's essentially always incomplete. Whenever you change any part of the process, you may have to redo parts of the automation. It's also way too expensive for many companies to do, and frankly, it's kinda stupid for 100 companies to all try to independently automate the same stuff.
Whether it's stacking boxes or making burgers, you're never going to be able to account for every eventuality through classical programming. An AI agent is ideally going to be able to deal with the weird little stuff without catastrophe.
The likely short term uses are more boring (and dystopic) than most people want to hear.
AI agents are probably mostly going to be making a lot of reports. Take the company's data, make reports and spreadsheets. Take data and find interesting correlations.
Monitoring security cameras is a huge one. You can't hire enough humans to monitor all the humans and all the cameras. Most of the cameras record nothing interesting 24/7 and you don't want to save that garbage data. AI agents monitor all your thousands of cameras 24/7, and they don't set off an alarm just because they see a person; they have the semantic intelligence to see that an unauthorized person is in an area doing specific things, and they can track that person across many cameras.
gamingLogic1@reddit
What about an anti-AI agent? A means to fight back!
Adorable-Boot-3970@reddit
Reminds me of a few years ago when, on delivering to a client a modelling package designed to find inefficiencies in compressed air use within the automotive manufacturing industry, the bosses said “that’s great! But it needs blockchain - we have to get blockchain in there, no one will buy this system unless it has blockchain”.
Massive hype cycle, that’s all. Some useful stuff will emerge, most will be forgotten, and in 10 years' time people will spend hours explaining why the next big thing is totally, totally different to the AI bubble of 2025!
WiseNeighborhood2393@reddit
I think people are not aware of the trillions of dollars being spent for nothing. It could easily trigger a major crash (2008 will be a joke compared to what is coming). No sensible scientist will say anything, since they are getting funded while producing nothing. Buckle up, people; when it bursts, you want to have lots of savings, lots of...
PermabearsEatBeets@reddit
It absolutely will create an economic crisis. There's simply no way the business model of the big players is sustainable, and the productivity gains would need to be an order of magnitude higher to be worth anything like the fees needed to change that.
https://www.wheresyoured.at/subprimeai/
wwww4all@reddit
Everyone's trying to sell the shovels during the AI gold rush.
Then you'll have to start asking: if AI shovels can dig for gold, why not skip the middlemen and just use AI to get the gold?
But most people are not ready for that conversation yet.
VinceRussoIsA@reddit
Recently the usual crazies have been pumping hard with:
I'm waiting for this cycle to come to a painful end and some perspective to return, with the full return to on-prem -> rehiring the workforce, but it's somehow still going, even though our technical capability seems to diminish hourly at this rate. There are signs of it starting to shake, however.
As my boss told me many years ago... sometimes you just have to let it burn. I'm just waiting for the next contract negotiation with the cloud vendors - then the realization that they have let all the talent go and employed a team of call center agents with AI bots to assist them.
wwww4all@reddit
There’s an apt phrase in investing, the market can remain irrational longer than you can remain solvent.
You may be totally right, but at what cost and how long for the cycle to burn out?
When in Rome, do as Romans do. Follow the trends and dig that AI gold. Just hedge with basic fundamental skill sets.
SirPizzaTheThird@reddit
Because making the shovel is easier and you still make money even if they don't find gold.
However, big tech is already on that path; even Nvidia is trying to get closer and closer to business problems, not just hardware.
AngusAlThor@reddit
Spam is literally the only use I can think of that they would be good at. Other people will say chatbots, but I have a mate who develops chatbots, and she says the model's tendency to make shit up is completely unacceptable for 99.99% of chatbots.
However, that is if we are thinking of what these models will actually be good at, but the sad truth is that doesn't matter; Truth is that AI will end up getting shoved in anywhere it is more profitable than the alternative. A LLM chatbot may never solve the problem you are calling your bank about, but if it costs $25,000 a year for the bot and $50,000 for a human, the bosses might not give a shit that the chatbot is useless.
Sad_Subject_3761@reddit
Spam is a valid and practical use case that I have seen solved quite well. It is not perfect, but some AI startups like https://betula.ai seem to address it quite efficiently (must be good prompting?) - almost all of my spam calls are filtered out now. It's funny: when I look at the call logs on their site for my account, spam callers call and just hang up as soon as the agent starts talking, or soon after, when the agent asks them the reason for the call. It is nice to see that, for a change, I have the upper hand over spam callers!
ayananda@reddit
I think there are a few legit use cases. One creditor cut 75% of their customer service staff. In cases where you basically just give the client information and a few pre-determined options, I think this will work. But yes, anything more complicated is difficult...
Fantastic_Elk_4757@reddit
Not sure exactly what’s unacceptable and to who?
Businesses are adding chatbots everywhere. Typical accuracy for a RAG chatbot will be 85%+ if made correctly.
We just launched one and are making it agentic now. I can’t get into specifics but function calls will improve accuracy significantly in areas that can be programmatically solved which the LLM struggles with.
For instance, tables. LLMs aren't that great with tables, so you give the LLM agency to determine whether a table is involved, and your application will then trigger a function to do the lookup. The bot has 100% accuracy in this case and can answer the question normally.
Function calling is a huge improvement IMO. And this is “an agent”.
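To make that concrete, here's a minimal sketch of the pattern with an OpenAI-style tool-calling API; the pricing table, the `lookup_fee` function, and the model name are made up for illustration:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical deterministic lookup; in a real system this would query
# whatever store actually holds the table.
FEES = {"plan_a": "$10/mo", "plan_b": "$25/mo"}

def lookup_fee(plan: str) -> str:
    return FEES.get(plan, "unknown plan")

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_fee",
        "description": "Exact fee lookup from the pricing table.",
        "parameters": {
            "type": "object",
            "properties": {"plan": {"type": "string", "enum": list(FEES)}},
            "required": ["plan"],
        },
    },
}]

messages = [{"role": "user", "content": "How much does plan B cost?"}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = first.choices[0].message

# If the model decided a table is involved, answer from the lookup,
# not from the model's own guess at the numbers.
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = lookup_fee(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```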
marx-was-right-@reddit
85% accuracy for a customer facing tool is horrible?
terrany@reddit
The only “AI” chatbot I used recently that I really liked was Chipotle’s, because it gave me a free entree after I complained in less than 3 messages
AngusAlThor@reddit
That occurs because in the short-term it is most profitable if the chatbots are overly generous and make errors in favour of the customer; The business stops paying humans, and customers don't complain because they get free stuff.
However, once you have adjusted to accepting AI chatbots, the enshittification cycle will begin; Businesses will make the chatbots steadily less and less generous and more and more frustrating, until they are miserly and arcane, as once they are over the sugar hit of firing their human workers they will still need to find more growth for next quarter, and if they can't cut wages then the money has to be taken from customers.
thekwoka@reddit
Generally this is most profitable in the long term too, even with humans.
Companies that are generous when there are complaints typically just do better. They have more loyal customers who spend more with them.
I'd almost say that the company making a mistake, and then solving it generously is better for making a loyal customer than just not making the mistake in the first place.
Sexy_Underpants@reddit
Nah, you just capture the market and enshittify. Case in point: https://www.theatlantic.com/technology/archive/2024/05/amazon-returns-have-gone-hell/678518/
thekwoka@reddit
Tbf Amazon barely makes any money from the shopping.
HimbologistPhD@reddit
I've returned to brick and mortar stores because at least there I don't have to sort through twenty pages of dropshipped CHINESE garbage from beloved brands like XIOAWEI, HOUGHBOUGH, and ZINPNITZIN
Equivalent_Emotion64@reddit
I think Louis Rossmann aka “the right to repair” guy on YouTube has a video on that subject
AngusAlThor@reddit
That might be true, but it isn't directly trackable in company metrics, while reducing costs is. So, in the long-term, companies will always degrade their services to save money, since those savings are directly measurable in a way the harm it does to their brand isn't; That is what the process of enshittification describes.
Tiskaharish@reddit
"I'm not saying it isn't important, I'm saying it isn't measurable" ==> what isn't measurable gets ignored and thrown in the trash can
ErrorEnthusiast@reddit
I had an issue recently with a plane ticket, and the airline customer support kept sending me through different chatbots that were just glorified FAQs.
It was impossible to contact a human being and in the end I had to go to the airport to talk to somebody who could solve my issue.
Most customer support was already pretty bad without AI, but now they managed to make it even worse.
3meta5u@reddit
Just wait until the customer service bots figure out how to catfish the customers into sending them money.
Spider_pig448@reddit
Customer facing chatbots are all garbage. The value in chatbots comes as internal tools.
curryeater259@reddit
> because it gave me a free entree after I complained in less than 3 messages
To be fair, you could've done that before by sending a single email to their customer support staff
iateadonut@reddit
My credit card (Bank of America) frequently has me call in for new online merchants to verify it was me. Last time, the telephone agent saw the transaction but said they needed to "verify" me over the app by an AI bot. When I tried, the AI bot could not find the transaction, and I was forced to use a different credit card.
AuroraFireflash@reddit
This is the major risk for anything talking to consumers who have not signed anything with your company. We're okay using it for internal staff to summarize things/meetings, or to help search existing documentation.
But the hallucinations are bad. Bad drugs.
Xenasis@reddit
Yep, especially when companies are held to what the chatbots actually say: https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know
soggyGreyDuck@reddit
If we could measure what percentage of what humans say is made up I wonder what it would be lol
teucros_telamonid@reddit
Doesn't matter to managers. If they can act on it and reprimand someone, that's better for them. AI platforms are not going to react at all, just give generic excuses.
soggyGreyDuck@reddit
Yep, it's also why engineers are now expected to make decisions the managers should be making. The managers don't have to live with the decision, and they also figured out that the less they understand, the less they can be asked. So everything now is a presentation by the engineers, which the managers grab and combine into a larger PowerPoint they show their managers, and I'm sure the pattern continues all the way to the top. I've never been responsible for so much, and I took this job as a standard engineer because I was sick of making decisions as a dev lead lol. I'm giving up and going back to a lead or management position, because I'll end up with the work anyway and might as well get paid for it.
CNDW@reddit
I don't think using a 100% LLM AI chatbot will ever work because of how it will just make shit up. I think LLM will be the interface point, translating natural language to and from more traditional data based chat resources and RPCs. People just haven't found the right abstractions to bridge the gap yet.
AchillesDev@reddit
What? That's exactly how they're used (at least by competent orgs) now. RAG and GraphRAG are table stakes for LLM-powered chatbots, where the LLM is just the interface.
CNDW@reddit
Yea, that's what I'm saying. Competent orgs have figured it out but not everyone has. A lot of what I've seen people try to do is shove a bunch of internal documents into a chat bot's fine tuning and call it good. This tech is still so new that it's not immediately obvious to most people.
AchillesDev@reddit
This is just a failure of imagination (and of having any knowledge at all of the field) on your part than anything else.
VG_Crimson@reddit
This reminds me of an engineering problem involving moving shipments from A to B separated by water. Building over the river is costly but more direct (human), and land is cheaper to build over but the path is longer to reach A (AI). What's the cheapest option? The answer lay in a mixed ratio of the two.
This is a vastly simplified problem in optimization; however, I think it'll hold true here as well. AI will never replace humans wholly, but it will eventually be normalized as a tool to increase developers' and engineers' efficiency at work.
AngusAlThor@reddit
But that takes for granted that AI would increase developer productivity, and assumes that we want to increase developer productivity by whatever metric is used. Neither of these facts has been established.
Take a commonly cited metric; Lines of code written. Let's say we measure 2 teams for three months, one team with AI and the other without, and at the end of the period measured the AI assisted team wrote twice as many lines of code as the team without AI. At first glance, HUGE win for AI assistance. But while the team was more efficient by the metric we used, we can imagine a number of ways that that metric could give a false impression;
- AI assisted code was full of bugs, meaning the AI assisted team wrote more lines because they were fixing and refixing their initial code.
- LLMs are bad at context, so the code from the AI team was full of very repetitive sections that should have been abstracted.
- The engineers were less engaged with their work, and as such implemented less efficient algorithms than the non-AI team, since they wrote their code with less thought.
The fact is there is no objective measure of developer efficiency, so there is no way to assess (objectively) if AIs make developers more efficient or not.
VG_Crimson@reddit
Was gonna comment this but another user beat me to it, Goodhart's Law.
I said I'm pretty sure that will be the outcome at the end of it. I never said it's what I would want or what we should strive for.
Developers and software engineers are constantly being measured for performance at lots of companies out there. Just because there is no good way to measure efficiency does not mean higher-ups won't try. We're disposable to a big company at the end of the day.
19hams@reddit
The problem is, (most) upper management doesn't care about objectivity as long as there is some degree of weak correlation between the metric and the needle they're trying to move
AngusAlThor@reddit
GOODHART'S LAW INTENSIFIES!!!!!
random-engineer-guy@reddit
So which ones belong to the 0.01% of chatbots?
Antique-Echidna-1600@reddit
Tell your friend they are overfitting
Impossible_Way7017@reddit
Or underfitting
EscapeGoat_@reddit
That actually seems completely in line with my own experiences with chatbots.
Distinct_Feature_192@reddit
Actually, AI code editors such as Cursor, Windsurf, Pear, or Zed are all AI agents, and they are useful to us indie hackers. More AI agents here: https://coglist.com/develop/edit-code
overzealous_dentist@reddit
Anything involving googling and parsing random webpages, then doing something with the content, is useful, since pages use loads of different content formats and the popular AIs understand basically all of them, so a deterministic tool isn't as helpful.
whatever73538@reddit
Imagine I want to buy, e.g., "chewing gum that contains neither sugar nor xylitol".
Currently there is no way to search for that.
I would pay good money (or a commission) for software that does the online shopping for me, up to the final click. I get to check at the end whether it makes sense.
There are looots of things that are hard to search for. And current price search engines are shit: sooo often the final prices are wrong, or the children's clothing is the wrong size, or the item is out of stock.
Curious_Start_2546@reddit
I just Googled that chewing gum query and got results. Much quicker than using an agent and waiting for a response.
I guess for complex tasks like "order this shopping list, pick the cheapest best-reviewed items", an agent would work well, if you could trust it 100%. But I don't think there are that many tasks like this.
The business case for agents is much stronger than the consumer case to me.
Heroe-D@reddit
And for less objective and more complex use cases, where it'd have to pull from external sources, it might suffer from false marketing and/or divergence of opinions from people who might be biased or just uninformed.
For example, if one asked for "versatile running shoes up to $150 that are good for every type of distance training and competition": if the "agent" just scrapes the brands' descriptions, even their cheapest shoes are amazing at everything and made for performant athletes; if you scrape blogs and reviews, which are often biased because they're sponsored in one way or another, you won't really find helpful information (some might even be AI-generated these days); and if you scrape Reddit, you don't know whether the comments were made by a shoe geek who buys 10 pairs a year and thinks everything below $300 is trash, or by a beginner who thinks running in Jordans is fine.
And as you pointed out, you'd also have to trust the "agent", because nothing guarantees it hasn't been programmed to favor the pairs they want to get rid of, or the ones they make the highest margins on.
Tiskaharish@reddit
oh man the shopping sites would also pay good money to have your shopping cart filled with their chosen goods that favor them over you.
Heroe-D@reddit
And most people would be inclined to trust them and "validate" a bit too easily because of the "one click temptation".
teerre@reddit
Which real job actually involves scraping random webpages and is also not automated to hell and back already?
Father_Dan@reddit
My job scrapes web pages, small niche websites.
I've applied LLMs to collecting structured data and had surprising luck, especially when combined with fuzzy merging / additional post-collection data-cleaning prompts.
It's not perfect, but it makes a whole class of problems approachable.
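For anyone curious, here's a minimal sketch of that kind of extraction, assuming an OpenAI-style JSON mode; the field names and schema are hypothetical:

```python
import json
from openai import OpenAI

client = OpenAI()

def extract_listing(page_text: str) -> dict:
    # JSON mode keeps the output parseable; the prompt pins the schema.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract JSON with keys: name, address, phone, hours. "
                "Use null for anything not on the page. Do not invent values."
            )},
            {"role": "user", "content": page_text[:20000]},  # crude truncation
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

The validation the commenter describes further down (cross-referencing known values, domain-specific checks) is what makes the residual error rate tolerable.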
edgmnt_net@reddit
What exactly does it help with? In many cases you do want to figure out how the data is supposed to be extracted anyway. Although I guess that can be overshadowed by the lack of a guaranteed stable format and inherently-high error rate if you scrape random web pages. Is that the case?
Father_Dan@reddit
Well, it becomes unapproachable to do this on a one-by-one basis when the number of sites you are crawling is greater than 100k. It's just not something you could solve and keep stable.
I wouldn't say the error rate is high. Of course there is some error, but it is low enough to be suitable for a production system.
Tiskaharish@reddit
how do you even know if you have an error when your throughput is so high? Do you do it manually so you have a valid control?
Father_Dan@reddit
We xref our existing data-sets when incorporating the new data. Previously, this was still collected manually so starting out I could estimate error rates by comparing to known values.
Additionally, we have domain specific validation so we can throw out hallucinations when they occur.
Mysterious-Rent7233@reddit
All sorts of research jobs. "We are thinking of investing in company X. Build a dossier of everything that's been said about them."
Fidodo@reddit
LLMs have made classification problems nearly trivial, so they are very useful in that regard. Most websites are very poorly structured. Of course LLMs are still very expensive compared to prior methods.
PhilosophyTiger@reddit
ML.NET is already a pretty good way of classifying things. That's not even an LLM.
devoutsalsa@reddit
Recruiting on LinkedIn. There are automation tools out there, but they can be really expensive and violate LinkedIn's terms of service. LinkedIn intentionally underdevelops automation on the platform, making using it a very manual process.
maybe_madison@reddit
Actually, now that I think of it, I've been wanting to build an alternative to TripIt for a while. I'm frustrated with some of their design decisions (and especially the lack of an API), but the functionality to parse trip info out of an email sounds like a huge PITA to build and maintain. But maybe I can just pass the email contents to OpenAI and ask it to give me structured output of the data I need? It doesn't need to be perfect, since it's pretty easy to quickly check if it's right.
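That approach is workable. A sketch of what it might look like, using JSON mode plus a pydantic model so the "quickly check if it's right" step is partly mechanical; the schema and field names here are made up:

```python
from openai import OpenAI
from pydantic import BaseModel

class FlightSegment(BaseModel):
    airline: str
    flight_number: str
    departure_airport: str
    arrival_airport: str
    departure_time: str  # keep as an ISO 8601 string for simplicity

client = OpenAI()

def parse_confirmation(email_body: str) -> FlightSegment:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract a JSON object with keys: airline, flight_number, "
                "departure_airport, arrival_airport, departure_time (ISO 8601)."
            )},
            {"role": "user", "content": email_body},
        ],
    )
    # pydantic rejects anything that doesn't match the schema, so garbage
    # output fails loudly instead of silently landing in the itinerary.
    return FlightSegment.model_validate_json(resp.choices[0].message.content)
```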
overzealous_dentist@reddit
Entrepreneur, researcher, etc
thekwoka@reddit
Problem is that they can still make mistakes, and you can't actually count on what they give you being accurate.
_predator_@reddit
The problem starts when AI scrapes pages generated by AI. You can see this happen in Bing already, where the top results for some searches are AI slop, and Bing's summary feature generates an answer based on those.
le_christmas@reddit
Any scraping that is beyond trivial involves getting around anti-scraping features, which AI is really bad at. You can write scripts to scrape unprotected sites just as easily, if not more so, than using an AI at scale (and cheaper too).
Konfusedkonvict@reddit
Isn’t this what perplexity does?
According-Analyst983@reddit
If you're looking for a legit application of AI agents, you might want to check out Agent.so. It's an all-in-one platform that lets you create and train AI agents for various tasks. Whether it's for business, education, or personal use, it offers a range of features that can help streamline your workflow. Plus, it's free to start, so there's no harm in trying it out.
lhfvii@reddit
Some people are pushing the idea of agents as a way to replace GUIs, which I think is a very bad idea. I like to click buttons and get an "ARE YOU SURE???" pop-up when doing important shit like handling my personal funds.
AftyOfTheUK@reddit
There's nothing that stops an agent from summarizing the actions it is about to undertake on your behalf, and asking you to confirm them.
lhfvii@reddit
While that may be true, that summary could have hallucinations, or the parsing of that summary into the execution could have hallucinations. The non-deterministic component is quite problematic.
thekwoka@reddit
I wouldn't mind an agent that can get the transfer/transaction set up for you and then asks you to verify.
Overall, my biggest concern with these AI tools is for uses that impact people who already have really bad research/verification skills, where the AIs can make wild mistakes and then those mistakes propagate.
It already happens with humans doing the work, where one source says something that is wrong and others just parrot it many many times.
LetterBoxSnatch@reddit
The amount of times I've seen people say things like "GPT says" as a source on what should be highly tech-literate forums is super disturbing. It's worse than getting the opinion from a human (or used to be), because at least most humans will self-censor or provide weasel words when they're not sure of an answer. But humans citing AI hallucinations is disturbing because you know we're going to be surrounded by humans who believe what they are saying is truth based on the confident assertions of AI hallucinations, and finding the real signal is going to be incredibly difficult.
Apart_Palpitation949@reddit
SimplAI: Empowering Enterprises in an AI-Native World
The Simplest and Fastest Way to Build Agentic AI Applications
https://simplai.ai/
SimplAI is an enterprise-grade platform designed to help organizations transform into AI-native enterprises. Our platform enables you to build, deploy, and monitor intelligent AI agents and automate complex workflows—securely, scalably, and reliably.
https://simplai.ai/agentic-process-automation
eat_your_fox2@reddit
Not yet, but The Zuck seems to think they'll replace mid-level engineers in the near future, while having to eat the initial high cost of inefficiency for a bit. It's too early to tell on that front, but Meta could definitely start by replacing their CEO & VPs with AI; that might be the easier first step.
PracticalBumblebee70@reddit
Zuck thought the future is metaverse, and even changed the company name to Meta. And here we are.
Icy_Monitor3403@reddit
He was right, meta ray bans are taking off and all major tech companies are building AR glasses
dats_cool@reddit
When have you seen anyone with those IRL?
Icy_Monitor3403@reddit
Several times although you really can’t tell them apart from standard ray bans unless you know to look for them.
Regardless, a product is successful by several metrics other than the number of first-hand encounters with its users.
Really speaks to the quality of this place that I get downvoted - nothing I said was incorrect. It seems that it’s more important to match the sentiment of the group.
LetterBoxSnatch@reddit
Okay but I recently got to try the latest mixed reality headset and the metaverse is legit pretty amazing
DanTheProgrammingMan@reddit
You can't trust a public corporation CEO's take on things that will affect their bottom line. It's in Zuck's interest to say this even if he knows it's not true, because AI hype makes the stock go up.
edgmnt_net@reddit
Assuming investors are dumb, yeah. Overgrowth and near-monopolies help too. Otherwise, no, I wouldn't want to buy into lies and I wouldn't risk saying outrageous stuff for short-term spikes in stock prices, as it damages one's reputation.
thekwoka@reddit
It doesn't need to convince all investors that they will definitely actually do that, just enough of them that the stock benefit is decent.
Like he said "mid level" engineers.
What about juniors? Maybe if you're sceptical, you go "no way it will be mid level...but maybe they can get rid of some of the juniors..." which would still be a shareholder benefit.
AlexFromOmaha@reddit
At a place like Meta, "midlevel" means fresh grad. Not senior yet, but not like an intern. The code monkeys. The low-responsibility offshore teams. Productive but not creative or norm-setting.
thekwoka@reddit
But those are juniors everywhere else...
Is this like title inflation?
Tiskaharish@reddit
TSLA
CpnStumpy@reddit
Reputational harm isn't a problem for investors, they're not the brightest lot, they're constantly hype training from one bubble to another - the short term gain a CEO gets from bullshitting the public, is long forgotten by investors when the bullshit becomes obviously bullshit.
There's no reputational harm when evidence appears because their attention span and memory are too short.
WiseNeighborhood2393@reddit
Yeah, sure; look how the metaverse and NFTs ended for the MBA tech bros. The field becomes rotten when MBA monkeys drown out any sensible voice and trick the common Joe.
Agent_03@reddit
I would argue that in ~~Zuckerberg's~~ Zuckerbot's case this might have happened years and years ago.
eat_your_fox2@reddit
lol yeah. Lead by example and such.
farastray@reddit
I can totally see that coming. I can get very far with just struggle-prompting Cursor.
If I developed a fancier framework which instructed the agent to use TDD and gave it an architect and a project-owner persona, I would be able to get very far in a very short amount of time.
Most of the time, the LLMs will come up with credible solutions, but they don't have the workflow of an experienced dev. I actually started building a system that I think is a little bit more sane, but which builds on these concepts.
thekwoka@reddit
But is it faster than just like...doing it yourself?
CpnStumpy@reddit
I think you're saying the same thing as them when they say instructing the agent to use TDD.
Setting this aside however...
Your comment on documentation-driven development makes me think of how we've repeatedly done effective code generation over the years, and perhaps it's the idea we need with AI too:
Contracts. Write your WSDL, generate the service stubs and client. Write your JSON Schema, generate your service stubs and client. gRPC...
How about write your Software Description Schema, AI generates your software. Maybe the formal language of this "documentation" you describe will bring precision and clarity of test cases which must succeed to our AI overlords.
Or maybe the AI bubble will pop harder than the ad-driven .com bubble. Ironically, the .com bubble burst because ad revenue was a joke and wildly overvalued, but the model wasn't wrong; most of the Internet has been developed under the same model since the bubble burst. AI seems like it might hit the same effect: actually effective and useful, but massively overvalued right now.
What the hell do I know though, I'm told AI will replace me so I guess I'm just a dunderhead no better than a machine.
thekwoka@reddit
It's good for humans too.
Write the documentation first, fight over it, and then write tests to it and then implement.
edgmnt_net@reddit
I'd say it's actually pretty much the same problem as hiring inexperienced staff and scaling horizontally. Sure, you can hire thousands of juniors to do inconsequential stuff and earn you money that way, but there are limits to scaling that and it doesn't work well for a lot of businesses. In fact, we're seeing this happen with all the layoffs and failed projects, as these things crumble under their own weight beyond some short or mid term gains, due to lack of proper design, implementation, maintenance, scoping etc.. The fact that you can get something working quickly can be very misleading. This isn't the right kind of complexity that software deals with very well.
lhfvii@reddit
Why only mid-levels? Have they already sacked JRs?
eat_your_fox2@reddit
Well...you don't have to sack the engineers you don't hire to begin with.
CANT_TRUST_DONALD@reddit
Meta is hiring E3s.
lhfvii@reddit
hey man replacing means exchanging one for the other, so somebody has to be sacked (?
bafil596@reddit
From a dev's point of view, AI agents replace traditional programming control flow with LLM decisions. That can be good in that they can handle more complex situations or cover edge cases that weren't anticipated, but it can also be bad due to the lack of interpretability, the latency, and accumulated errors.
There is definitely hype at the moment, but this post explains AI agents well and may help you understand them: what they are, their pros/cons, how to build them, and how to design product experiences around them. As the post suggests, AI agents may not always be a better solution than "simpler deterministic algorithms"; it depends on the specific task.
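A toy contrast of the two styles, with the LLM call stubbed out as a hypothetical `llm()` helper:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for any chat-completion call

def route_ticket_deterministic(subject: str) -> str:
    # Traditional control flow: fast, cheap, auditable, but it only
    # handles the cases someone thought to enumerate.
    if "refund" in subject.lower():
        return "billing"
    if "crash" in subject.lower():
        return "engineering"
    return "general"

def route_ticket_llm(subject: str) -> str:
    # LLM decision: covers phrasings nobody anticipated, but adds
    # latency, cost, and an output you can't fully predict.
    return llm("Route this ticket to billing, engineering, or general. "
               f"Reply with one word only.\nSubject: {subject}").strip().lower()
```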
jfo93@reddit
Woo, I can chime in on something. I’m by no means an expert but towards the end of last year moved into a different team at work that is using agents to automate long, complex tax related issues (averages 90 hours of tax advisory work) using 30+ agents, with more being needed. To put it in perspective, while it’s not finished, our project sponsor within the business was so shocked by how effective it is during a demo that she asked us to dumb it down and require additional human intervention as she was worried about the potential impact on her team’s jobs.
Agents are certainly overhyped for simple stuff but when you start trying to automate complex processes it does get quite exciting from a dev perspective. (Though I must say refining prompts kills my soul when it’s just not quite going right haha).
almost1it@reddit (OP)
Interesting. Can you share more about this? Do you have examples of what these complex tax issues are? Also what does the workflow look like or where does the LLM fit in?
Wouldn't be surprised if a lot of agent value is currently in niche backend processes that aren't directly consumer facing. I also imagine there's a lot of regular "CRUD" work to get agents to produce action.
jfo93@reddit
I can't go into a lot of detail, but at a high level it's to help clients understand their tax situation when they're split between multiple countries. So the agents are given custom tools to pull relevant information from the legislation of specific countries and to take assets and income into account; they're also given tools for calculating total income, etc.
In terms of LLMs, we pair one model (e.g. o1) that performs a task with another, lesser model (e.g. 4o) that acts as an evaluator for the response, which helps to reduce hallucinations and also keeps the other agent on point.
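A minimal sketch of that generator/evaluator pairing; this is the general pattern, not their actual pipeline, and the prompts and retry policy are made up:

```python
from openai import OpenAI

client = OpenAI()

def answer_with_review(task: str, max_retries: int = 2) -> str:
    feedback = ""
    for _ in range(max_retries + 1):
        # Stronger model performs the task.
        draft = client.chat.completions.create(
            model="o1",
            messages=[{"role": "user", "content": task + feedback}],
        ).choices[0].message.content

        # Cheaper model grades it and points out problems.
        verdict = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content":
                "Check this answer for unsupported claims and scope drift. "
                f"Reply PASS, or list the problems.\n\nTask: {task}\n\nAnswer: {draft}"}],
        ).choices[0].message.content

        if verdict.strip().startswith("PASS"):
            return draft
        # Failed review: feed the objections back into the next attempt.
        feedback = "\n\nA reviewer raised these issues; address them: " + verdict
    return draft  # best effort after retries
```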
You’re spot on there about the backend, most of the work that’s happening isn’t seen on the UI, it just surfaces things that need approval or additional information.
It’s certainly not perfect but I’m interested to see where we’re at in a month.
WiseNeighborhood2393@reddit
And how do you verify the information shared through the AI? What will happen if the AI spits out something nonsensical?
jfo93@reddit
At each key step, we send up the current agents' output (which might be broken up into a list of points) for the user to approve, or to provide feedback on to get the agent to retry that step.
What we’re currently finding is it’s a bit too thorough in comparison to our human advisors. Not necessarily a bad thing, but they want to be able to pick the key parts that might be of interest to the clients.
pickering_lachute@reddit
I have a very similar use for a customer in South America having to handle state level tax returns.
And love your approach with the evaluator. One of my fave blog posts talks about using LLMs as a Judge.
AchillesDev@reddit
A lot of weird takes on here from people who don't really use or make agents or really do much work with LLMs in general - the constant posts about 'chatbots' makes this clear. I've been on both sides, using them, building frameworks to make them or incorporate third party tools, etc.
The real use case for them is if you have an LLM that is doing something (normally providing some natural language interface to a bunch of data or to some very specific task where a probabilistic response is useful) and it needs some kind of extension to do something deterministic.
For instance, let's say you're creating an activity scheduler. The LLM comes up with activities and gives you some dates to do them. Great! But now you want outdoor activities, and they should adhere to that day's weather. A vanilla LLM can't get the weather for a given week. But if you have an 'agent' (basically a go-between between the LLM and some deterministic code), it will be able to call the tool/function based on the request given to the LLM. So we could write a tool that takes a zip code, retrieves the weather, and returns it in a JSON format, then allows the LLM to incorporate that information when generating its response.
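In code, the "tool" half of that weather example is just an ordinary function the model is allowed to request; the endpoint here is made up:

```python
import json
import urllib.request

def get_weather(zip_code: str) -> str:
    """Deterministic tool the LLM can call. The API URL is hypothetical;
    swap in a real weather service."""
    url = f"https://api.example-weather.test/v1/forecast?zip={zip_code}"
    with urllib.request.urlopen(url) as resp:
        forecast = json.load(resp)
    # Return JSON so the LLM can fold the facts into its reply.
    return json.dumps({"zip": zip_code, "daily": forecast.get("daily", [])})
```

The wiring around it is the same tool-calling loop sketched earlier in the thread: declare the function's JSON schema, let the model emit a call, run it, and hand the result back for the final response.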
Chip Huyen has a great article on agents that's worth reading for...pretty much everyone in this thread.
-Mobius-Strip-Tease-@reddit
Yea, the takes here seem to be coming from people with no experience standing them up and actually using them as you described. I recently set one up for our order support team. It’s purely internal and mostly just an advanced search engine to a huge sharepoint. Hundreds of pdfs and other documents saying what to do in which situation. Like you said, it’s a natural language interface for the data that is way more ergonomic in some cases than traditional tools. We’re just in the beginning of using it but it seems promising so far.
almost1it@reddit (OP)
This is a pretty good TL;DR for agents. Thanks for the link!
ramenAtMidnight@reddit
Depends on your definition of “legit”. I work at a fintech. One of our teams recently (around 6 months ago) deployed an agent that helps users record and analyse their personal finances and gives suggestions. Solid engagement on that bit so far, but it hasn't shown impact on revenue, or even DAU, MAU, or retention.
Personally I don’t have much “thoughts” on it. Initiatives like this come and go. If they don’t bring real business value it’ll prolly fizzle out this year. Doesn’t mean the tech is rubbish. Like any other techs, it’s the business/product application that decides if a thing survives or not.
ravixp@reddit
I might be missing it from the description - what makes it an agent and not a chatbot?
ramenAtMidnight@reddit
To be honest I’m not even sure the formal definition. The way I understand it, an agent can “do things” instead of just Q&A. For instance, logging an expense, updating a record, setting up a budget etc. All that can be done via the normal UI in our app of course.
Rough-Yard5642@reddit
That’s really cool actually. I’m generally bearish on these agents, but this example is one of the first where I think it could be really big.
TruthOf42@reddit
I think a medical assistant for paperwork and such will eventually come out.
It listens to a convo you had with a patient, and based on what you both said, it asks the doctor whether they want to create a prescription for whatever.
It also auto fills in summaries and other stupid doctor paperwork. It would obviously need to be checked over.
Also, based on conversation, maybe it proposes some other likely alternatives worth considering.
I can also see similar applications for lawyers, where it auto fills documents, and maybe based on input about the case and previous motions and such it suggests other motions to file, or questions to ask witnesses, or other things to consider.
Basically it just becomes an admin assistant that remembers everything that's happened and everything that people in similar situations have done before, and suggests things you might not have thought about
ravixp@reddit
I wish more AI talk was like this - imagining systems that can help people, instead of trying to replace people. It’s just a better fit for what the tech can actually do.
HippyFlipPosters@reddit
This is exactly what I'm building at my company currently. I was skeptical of the idea at first, but with the correct safeties in place it's proven pretty popular with users so far.
No-Ant9517@reddit
Every time I see something like “No seriously I’ve really found how to make LLMs useful for development” I’m like ok cool I am trying to have an open mind I don’t want to be left behind so I read the blog post or whatever and it’s like “I build all day so the chat feature helps out a lot” or “LLAMA has a huge input context so I can ask questions about documents” and I’m like ok but we talked about these use cases already
deadwisdom@reddit
An "agent" is just an AI tool that runs repeatedly. That's all it is. It's a cron job. We have used them for ever.
deZbrownT@reddit
Depth of this thought is underrated.
AchillesDev@reddit
Overrated by having positive upvotes. There's nothing about agents that require them to "run repeatedly" like a cron job, tools and agents are separate abstractions, etc.
deZbrownT@reddit
We are talking about the application; not the how, but the why.
AchillesDev@reddit
Even more reason why OP's post is overrated.
deZbrownT@reddit
Ok
WiseNeighborhood2393@reddit
I will bet $100,000 that 99.99% of those "agentic AIs" are so-called tech enthusiasts / prompt engineers / MBA monkeys out to get people's money. A 3-IQ primate could understand, from Bayesian optimization and the universal approximation theorem, why so-called GenAI can create nothing but spam and half-baked solutions. But it is easy to lie and tell people what they'd like to hear. Pathetic.
Buttleston@reddit
I read an article a few months back about security researchers building AI agents to develop custom exploits for a website: point the agent at the URL and say "find me an exploit". The success rate seemed decent, and I can't remember whether it was a little cheaper or a little more expensive than paying a black-hat Russian hacker to do it for you.
So at the moment that's not very ground breaking - it would need to get significantly cheaper. But if it cost, say, 90% less you could say "find me any exploit for any of these urls" and cast a much wider net. I can see malware/ransomware groups etc liking that a lot.
AuroraFireflash@reddit
The big problem with LLMs vs other existing analysis tools is the electricity cost. i.e. they have a huge problem with efficiency
Impossible_Way7017@reddit
I have my doubts; I tried using it in CTFs and it wasn't able to solve anything past an easy challenge.
I'd be curious how it compares to the results of a Burp Suite or ZAP automated scan.
whatever73538@reddit
I would guess you could train it up for CTFs by finding CTFtime writeups for similar challenges: "looks like you have to do the Chinese remainder theorem", or "house of some-shit-or-other".
Impossible_Way7017@reddit
How would you train an agent? The best you could do is create an embeddings database to try and augment prompts.
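Right, and the "training" in that setup is really just indexing. A sketch of embedding past writeups and prepending the nearest ones to the prompt; the embedding model name is real, the writeups and helper names are illustrative:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Index past CTF writeups once, up front.
writeups = ["RSA small-exponent writeup ...", "heap house-of-force writeup ..."]
index = embed(writeups)

def augmented_prompt(challenge: str, k: int = 2) -> str:
    q = embed([challenge])[0]
    # Cosine similarity against every indexed writeup.
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    nearest = [writeups[i] for i in np.argsort(sims)[::-1][:k]]
    return ("Similar past writeups:\n" + "\n---\n".join(nearest) +
            "\n\nChallenge:\n" + challenge)
```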
Buttleston@reddit
What do you mean when you say "I tried using it"? You used the tool in the paper I mentioned? Or you just used some LLM directly?
I have the article on my work computer, I'll try to remember to grab it tomorrow
Impossible_Way7017@reddit
LLMs directly, I’d be interested in the article.
whatever73538@reddit
I don't do a lot of web hacking, but some binary exploitation.
The naive approach with source code ("is there a potential bug in this context window full of source code?") did not work for me.
Asking about any memcpy() in source code ("is the size calculated at runtime?") did work, but I can also do that with a static analysis approach.
I had decent results in reverse engineering. A binary has 10,000 functions all named like sub_0074ff230. If an AI renames them "maybe_sqlite_open_database", it's awesome when it is right, and no loss if it is wrong.
wh1t3ros3@reddit
There are automatic tools that did this before LLMs came out lol.
Buttleston@reddit
sure, and I've used those, and I don't think it's really quite on the same level. Granted I only read parts of the paper but they said it was on par for what you'd pay someone to do, which presumably would be after you exhausted existing automated tools. I can probably find the paper if you're interested.
(I'm not any kind of supporter for AI, I think it's completely overblown, I'm not claiming this is any kind of radical outcome. It's either a little better or a little worse than paying someone 20 bucks to do it)
Impossible_Way7017@reddit
I can see it maybe being better at the report-writing aspect of a pen test, but I'd be curious whether it actually identified any actionable vulnerabilities versus just providing a nice write-up of some high-effort, low-impact potential issues.
Buttleston@reddit
Allegedly it found vulnerabilities at a pretty high rate, and produced code that exploited them.
wh1t3ros3@reddit
Yeah I agree would be a marginal improvement
Buttleston@reddit
This can also be used for good, of course, automating at least SOME level of penetration testing on your own domains.
femio@reddit
I made a post the other day about how I've used it at work and on freelance projects, and tried to give as much real-world detail as I realistically could.
https://www.reddit.com/r/ExperiencedDevs/comments/1hy7pst/has_anyone_else_found_serious_value_in_building/
SherbertResident2222@reddit
A 40% error rate…? May as well be just throwing shit at a wall.
femio@reddit
why don't you try reading what I said again?
SherbertResident2222@reddit
I’m surprised it took you months to realise AIs were a bit shit.
It took me an afternoon.
ezaquarii_com@reddit
All use cases so far are adversarial.
My big worry with consumer AI is that it's good enough to be weaponized, but not good enough to be useful for anything good.
almost1it@reddit (OP)
Weaponized how? Do you mean generating more brainrot content to scroll through for engagement?
ezaquarii_com@reddit
So far it is useful in 4 areas:
- summarization
- spell and grammar checking
- language learning (conversational partner)
- search engines
But those are not agents - it's literally ChatGPT input form.
I'll skip hardcore engineering like voice recognition, image recognition, pattern matching, robotics, control and all sorts of other non-linear engineering problems. The impact of AI on those is immense, improving all sorts of products, but it's not directly consumer facing.
Naive-Treat4690@reddit
So on point with Vodafone here
Jbentansan@reddit
GPT-4o voice mode is consumer-facing though? It's integrated in the ChatGPT app?
username_or_email@reddit
If you imagine a very simple scenario, like "if bitcoin > x sell else if bitcoin < y buy", sure. But trading models are never that simple. How are you going to build a deterministic algorithm around tabular data with 10, 20, 30+ columns, where some of the cells might be empty or have aberrant values? How many nested if/else statements can you write to cover all cases, and how do you expect to be able to make any intelligent decisions with that many features?
Many people misunderstand the problem that a lot of ML algorithms are trying to solve. At some point, and it doesn't take that much, data becomes unintelligible to humans. ML algorithms automate the process of extracting information from data, which is necessary for even modest datasets. If I give you a table with 100,000 rows and 20 columns and ask you to use this data to assist in trading crypto (or stocks or bonds), how would you go about doing that without ML?
Another thing people misunderstand is the nuance between determinism and non-determinism in ML. All ML models are deterministic. The same input will always give you the exact same output. Chatbots and the like sample over the model output to simulate non-determinism, but that has nothing to do with the underlying model. So in practice, there is no probabilistic model per se. However in theory, models are probabilistic in the sense that they deal with uncertainty. And that is simply the language of most areas of science, including data science. In most cases it just doesn't make sense to think in deterministic terms. How can you claim to know with 100% certainty when to buy and sell crypto?
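That sampling point is easy to see in a few lines; the randomness lives entirely outside the model's forward pass (toy logits, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)

# What the model computes: the same input always yields the same logits.
logits = np.array([2.0, 1.0, 0.5])

def sample(logits: np.ndarray, temperature: float) -> int:
    # The "non-determinism" is added here, after the model has run.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

greedy = int(np.argmax(logits))   # deterministic decoding: always token 0
stochastic = sample(logits, 0.8)  # chatbot-style sampled decoding
```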
lordlod@reddit
The term AI Agent seems to be very broadly applied and very overused. The cynic in me suggests that as the general AI hype wave is fading companies are pushing AI Agents as a way to stretch their ride.
That said, I think agents could be useful in spaces where an error rate is tolerable.
An example is level-one support, where much of the work is already scripted or pre-canned. An AI agent could process the incoming request and pre-prepare the response for the support worker, so they can skim it, alter it if necessary, and send it out in their name. A degree of error is fine; significant errors will be caught by the worker. Long term, you can monitor the alteration rate and start auto-responding when the AI is confident, while continuing to pass the more complex jobs to a human.
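The "monitor the alteration rate" part is simple to operationalize; a sketch, with the names and thresholds pulled out of thin air:

```python
from collections import defaultdict

drafts_shown = defaultdict(int)   # drafts surfaced to human agents, per intent
drafts_edited = defaultdict(int)  # drafts the human had to alter, per intent

def record_outcome(intent: str, was_edited: bool) -> None:
    drafts_shown[intent] += 1
    drafts_edited[intent] += was_edited

def can_auto_respond(intent: str, min_samples: int = 200,
                     max_edit_rate: float = 0.02) -> bool:
    # Only auto-send once humans have largely stopped correcting this intent.
    n = drafts_shown[intent]
    return n >= min_samples and drafts_edited[intent] / n <= max_edit_rate
```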
Another obvious area where error is fine is anything that involves speculation. Like recommendation engines or ad systems, things where your inherent failure rate is already high are naturally going to be tolerant of AI based errors.
However the hype machine seems to be suggesting that they can build some kind of generic AI agent that you can buy/rent and it will fix your problems like fairy dust. That seems less likely to me, they are going to have to be tuned for the task. Lowering that tuning barrier is probably going to be the key to adoption.
AchillesDev@reddit
Or it is an innovation to improve the issues that vanilla LLMs have.
Thommasc@reddit
> That said, I think agents could be useful in spaces where an error rate is tolerable.
I work in the science field.
According to that statement, we should throw the AI field in the bin.
But then we have another major issue: science reproducibility crisis and that's because we've been using 100% human to do scientific research.
So maybe it's still worth using AI to automate some parts of the workflows...
It's really hard to predict the world in 2035 at the moment.
almost1it@reddit (OP)
That's what I'm thinking too. Especially in financial use cases where we expect agents to make transactions on our behalf.
NullPointerJunkie@reddit
I predict that at some point in the future, someone in fintech is going to go all in with an AI agent and give it the ability to trade a large portfolio. The agent will get it all wrong and the portfolio will lose a very large sum of money, mostly due to lack of guardrails and oversight (because it's AI, why would it need guardrails or oversight???).
It's been done before just not with AI. It's almost as if history is destined (doomed??) to repeat itself.
Rumicon@reddit
The only somewhat passable use case I've seen is that we use them to monitor dev support Slack channels. When someone messages, the agent asks a bunch of questions and generates a ticket, then pings an actual dev for review.
Basically a better automated call operator.
DataDecay@reddit
That's funny, I noticed a huge uptick in "AI agents" appearing all over the place recently. I work on some AI initiatives myself and thought, "Are these AI agents the same ones I've already been working with?"
Long story short, they both are and aren't. There are many implementations of AI agents, including those built into LLM endpoints like Azure OpenAI and vanilla OpenAI. This latest uptick seems to stem from SDK extensions of the concept via popular tools like Crewai.
At their core, AI agents aren't bad; they offer a way to break up prompt engineering for different systems and build specific tailored tools for each. However, do I think this is some new actual hype wagon? No, it's just another marketing campaign.
bonesingyre@reddit
A team at the place I work at built an AI agent that talks to clients about their claim (healthcare) and can accept documentation to push the claim along. Gives front-line workers the slightest reprieve.
hockey3331@reddit
I'm not sure if I understood the question correctly, because AI agents already have applications?
Waymo already has a fleet of self-driving cars. That's an application.
Researchers have used ai agents to help them find new drugs that would have taken much longer for humans to find.
There's an AI agent sitting in our meetings at work, taking notes and sending us a report of the conversation.
I can also see use cases for tutoring, psychotherapy, etc. The AI agent might never replace human-to-human contact in these roles, but it has the advantage of being available 24/7 and will likely be much cheaper to use. So it could be good help in the day-to-day.
Those are a few applications that jump to mind. IMO, the tech, especially LLMs, is evolving so quickly that we're mostly limited by our imagination of what these tools can be.
space-beers@reddit
I've tried Sintra to try and automate some bits I don't have time to do, but so far it's just suggestions that I have to act on. I don't want suggestions for a social content calendar; I wanted it done for me. Some of the ideas are good, but they all need a human to act on them, which defeats the purpose for me.
eyoung93@reddit
I tried Devin and it did not meet my expectations of a junior dev on 3/3 tasks. I had to babysit it every step of the way and it put up PRs that didn’t build or blatantly didn’t work at all. It was a nightmare, I asked for my money back and they gave it to me.
drumnation@reddit
I can think of a bunch of ways. A better way to explain agent is to compare it to chat. With chat you send a message and you get an ai response. With an agent you provide a goal and the agent sets up a task list for itself, tries to do those tasks, tries to validate that they were done correctly retrying and changing methods if it fails, until it completes the objective. It’s a loop of LLM requests where it interacts with itself and the digital medium it’s working in. Like a fish in the water. The applications are endless for anything where the water is data.
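A sketch of that goal -> plan -> act -> verify loop, with `llm` and `execute` as stand-ins for a chat-completion call and whatever tools the agent has; everything here is illustrative:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # call your model of choice

def execute(action: str) -> str:
    raise NotImplementedError  # run the action (shell, browser, API...), return the outcome

def run_agent(goal: str, retries_per_task: int = 3) -> None:
    plan = llm(f"Break this goal into a short numbered task list:\n{goal}")
    for task in plan.splitlines():
        feedback = ""
        for _ in range(retries_per_task):
            action = llm(f"Goal: {goal}\nTask: {task}{feedback}\nWhat's the next action?")
            result = execute(action)
            verdict = llm(f"Task: {task}\nResult: {result}\n"
                          "Was the task completed correctly? Answer DONE or explain what failed.")
            if verdict.strip().startswith("DONE"):
                break
            # Validation failed: change method and try again.
            feedback = f"\nPrevious attempt failed: {verdict}"
```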
DogAteMyCPU@reddit
my work is trialing Cursor and it's kind of useless. It just hallucinates optimizations that break my app. It's decent at writing tests, but I feel like I'm not saving time because I have to go through each test. The only thing I like it for is telling it to create plans for completing tasks in an md file.
le_christmas@reddit
AI became a sinkhole for snake oil salesmen when it started to be able to sound realistically human, and people were so subconsciously nervous about AGI that they were willing to pump money into advertising and funding rounds. The reality is that without actual AGI, or at least more flexible models, AI as it stands commercially today is a solution looking for a problem. They still haven't found their problem, and that is why they are pivoting to pushing "agents" at you: they don't even know what their models can solve. It's basically them pitching their weakness as a strength.
FatStoic@reddit
I worked at a company that made voice-based AI agents and once we were onboarded with a client the agent would handle 60-80% of incoming call volume without ever needing to hand off to a human. This was three years ago before LLMs really took off.
AI isn't a panacea but the base technology and the application are both in their infancy.
le_christmas@reddit
How sure are you that they weren’t using mechanical turks? Haha
FatStoic@reddit
100%. We used mechanical turk for some data prep before it got fed into the model but everything user-facing was done via computers and computers alone.
le_christmas@reddit
Nice! Yeah, we're actually looking into using AI for robocalling as well, and it's been pretty fruitful. I guess what I mean by what you quoted is that it's a solution looking for a business-viable problem. AI can definitely do things, but nothing that's sellable enough to be sustainable and have a big enough market cap, it seems. It is a solution, it just doesn't have market fit right now (I'd say the current applications don't achieve market fit because it's still not sustainable for companies like OpenAI).
FatStoic@reddit
The hype is massively out of scale to where the tech is right now but there are many AI products that are actively being used to replace or enhance humans to deliver real business value and real dollars.
Even-Tomato828@reddit
I recently came across a news story about a hotel chain touting their use of AI for the entire customer experience. While AI holds a lot of promise, I haven't seen it deliver enough to get excited about it. Seeing AI being used everywhere kind of cheapens its value, but we'll see how it goes.
hobbycollector@reddit
I would only trust it for generating unit tests. Full coverage would be the metric for success.
ub3rh4x0rz@reddit
The only legitimate use cases are those where the outputs produced can be quickly verified via traditional means, but the output would otherwise be expensive to produce.
Every other case is about replacing things with inferior products that consumers are willing (if reluctant) to accept.
dashingThroughSnow12@reddit
For automated portfolio management, we've already had that for years. As a Canadian without a large asset base, the main hurdle for me was legality, not technology.
I digress partly. I used to be into a lot of finance and business podcasts and websites. Since 2023 I’ve had to stop by and large because of how many poorly informed tech opinions these allegedly intelligent people have.
An example is Apple. Before June 2024, you could easily find hundreds of articles and podcast episodes about how Apple is late to the AI game and has to catch up. Apple, the company that for a decade has had AI as part of its two major events each year. Apple, who was shipping a dedicated neural engine in their phone SoCs for the better part of a decade to accelerate ML tasks.
Here’s what I learned since this whole LLM craze kicked off: non-techies who don’t even understand the current state of the market should just be ignored.
bsenftner@reddit
I've got a lot of AI agents I've written. They are tireless educators that help with understanding and with communicating to others. I believe that is their best application: not to replace people but to enhance and augment people as they work, operating like a fresh new PhD hire that has been paired with you because they are smart but don't know how things work "around here". You act as their hallucination prevention, while they help you do your work: not doing it for you, but advising you as you do it. I'm not talking "coding helpers", I'm talking white-collar jobs: attorneys, paralegals, anyone in accounting, anyone in finance, anyone in sales, anyone that works on a computer using office software. That's what my agents are: office worker support.
And my agents are in active use, I've got law offices using them, some professional writers using them, and some real estate agents using them for financial work. If you're curious, you can use them too at https://midombot.com/b1/home
path2light17@reddit
The fact we have such a long discourse going on here makes me think there isn't a clear-cut/big use case yet.
rabbit_core@reddit
I've been trying to automate myself out of a job with one and haven't had much success.
123_666@reddit
What I would like an AI agent to do for me would be stuff like:
etc.
almost1it@reddit (OP)
This would be an interesting use case. In other words, a generic "life admin" bot. Surely someone or some team is already working on this, although I wonder if it's possible with the current state of the tech.
captain_obvious_here@reddit
To me, "AI agents" is an attempt to squeeze the current AI trend a little more, without any added value.
Most services don't need AI to be useful.
jl2352@reddit
An area they are excelling at is creative content. There, hallucination is a non-issue, either because it's encouraged or because it's a silly one-off and doesn't mess things up for the user.
Examples include bots for talking to, or for generating stories. Examples in the future could include podcasts or TV with content tweaked for your benefit.
The BBC did a prototype system about fifteen years ago on this idea. Imagine if you’re watching a soap opera, someone puts a song on in the background, and it happens to be a song you like. Another example is daytime radio dramas having the weather in the episode match how it is where you are. If an AI chatbot thinks it’s snowing when actually it’s raining, then who cares.
Another use case is AI agents in games. Think NPCs in Skyrim having an infinite number of lines instead of a set few. I guarantee people are already working on this, and I can see it being pretty huge when it comes.
A lot of this is pie in the sky for anything beyond generating text, images, or audio replies. Tweaking the content in a radio show per listener will be a huge undertaking. However it’s definitely an area AI will be able to excel at.
thekwoka@reddit
Like any new tech, there is a lot of wishful thinking and throwing shit at walls.
We're still using AI in more and more things every day, just not often as these "chat bot" style things, and in many areas it's not YET good enough for the things we really need.
metaphorm@reddit
the canonical use case is a chatbot. anything that has a user interface that involves interactive and iterative feedback from the user.
Mysterious-Rent7233@reddit
A chatbot is not an AI agent.
decamonos@reddit
The common useful applications of AI agents are interfaced with using a chat UI. And they are a type of bot... so "chatbot" is not explicitly a bad description. Kind of an "all squares are rectangles" type situation, really.
denialtorres@reddit
It's just a rebranding from the word "wrappers" to make it sound fancy
ZestyData@reddit
Coming out of a few select hugely-funded 2024 startups we're seeing the start of general-ish agents that don't require third party integration.
In terms of B2C we're seeing pseudo-personal-assistants. "Hey Alexa, push back my dinner res by half an hour, and can you buy my mother-in-law a bday gift? Send it direct to her. OH also, please fill in my jobhunt spreadsheet with my latest interview updates."
In B2B we're seeing agents that can do generic white-collar office worker shit. Update client records in Salesforce, then order ABC and file expenses in platform XYZ, and update the ticket in whatever ticket system.
We're absolutely getting there this year.
nappiess@reddit
If you knew how LLMs actually work, you'd know none of that would even be possible without a LOT of traditional software engineering code backing it up. It would essentially be using AI as a way to route requests to traditional APIs. So basically just a normal web app with an AI text parser.
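For what it's worth, that routing shape is easy to sketch. `llm_extract` is a hypothetical model call that returns an intent plus arguments as JSON; the handlers are stand-ins for ordinary deterministic code:

```python
import json

def llm_extract(user_message: str) -> str:
    # Hypothetical model call that returns e.g.
    # '{"intent": "update_record", "args": {"id": "123", "status": "closed"}}'
    raise NotImplementedError

def update_record(args: dict) -> str:
    return f"CRM API called with {args}"      # stand-in for the real API client

def file_expense(args: dict) -> str:
    return f"expense API called with {args}"  # stand-in

HANDLERS = {"update_record": update_record, "file_expense": file_expense}

def route(user_message: str) -> str:
    # The LLM only classifies and extracts; everything after is deterministic.
    parsed = json.loads(llm_extract(user_message))
    handler = HANDLERS.get(parsed["intent"])
    if handler is None:
        raise ValueError(f"no handler for intent {parsed['intent']!r}")
    return handler(parsed["args"])
```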
ZestyData@reddit
I am a former lead LLM engineer and now an LLM researcher at one of the big labs that I'd rather not disclose.
What I said holds true. "Basically a normal web app with an AI text parser" is not even reductive, it's simply wrong.
nappiess@reddit
Understanding what LLMs are, they're quite literally incapable of actually making decisions. But feel free to try and prove everyone wrong!
decamonos@reddit
Do you perhaps want to explain your definition of a decision, and why you think a probabilistic model taking input and having its output be a function call is not "making a decision"?
B_L_A_C_K_M_A_L_E@reddit
This is a distinction without a difference. LLMs can classify which actions are likely to be most appropriate given some scenario, and then interact with some interface to cause that action to be taken.
ZestyData@reddit
You're making a claim that we simply do not have the philosophical or mathematical framework to draw boundaries with.
I'm not saying an LLM is or isn't capable of making "decisions", but it's damn bold to claim so definitively that it is "quite literally incapable" of doing so. I would love to see your empirical proof or experimental evidence, with respect to your chosen definition of a "decision", of course.
vitaminMN@reddit
We are? Who trusts the output of these?
ZestyData@reddit
Fair to ask. As it stands we measure all LLMs and Agentic systems by benchmarks. There are a series of general-agent benchmarks: WebArena, WorkArena, and more. They're all open benchmarks at the moment but they are peer-reviewable. Anyone with access to these closed agentic systems can test them on benchmarks. Understand I'm talking about incredibly cutting edge stuff, and the subfield is blossoming. There will be more evals, including closed evals that are less easily gamed, there will be open source systems. Startups are already trying to find niches in building agent systems that outperform at specific flavours of task to target specific industries.
All of that is to say performance can be quantified and compared to humans.
To your follow-up question about quantifying the financial (or otherwise costly) consequences of errors: first, that's a great point honestly, and not yet set in stone. I would love to see a benchmark that encompasses the severity/degree of success and failure. (It's probably case-dependent, and how do you compare different dimensions of failure?)
I'd also imagine the first lawsuit against a company serving a proprietary AI Agent that causes great financial loss will be a landmark ruling in case law. The big questions for me are less about technicality and more about business & law. Will companies & individuals in 20 years all have AI Agent insurance to protect us from the agents we 'buy' making mistakes? Who knows
(Thinking about it more, end of this year is optimistic given the rate of growth over the past 12 months, but within 2-3 feels really likely given the rate of growth since 2022)
Mysterious-Rent7233@reddit
It's a sign of the way AI conversations drive people insane that your solid comment with clear references and ideas, gets downvoted (-1 right now) and a top-level comment of "Fuck agents" would certainly get mostly upvotes.
SpecialBeginning6430@reddit
Verifying the output is going to be trivial compared to having to carry it out.
vitaminMN@reddit
But the cost of it doing the wrong thing is high. What if it orders the wrong thing, writes a bad email, records something incorrectly, etc.?
Anything that requires judgement and some context seems ripe for errors
SpecialBeginning6430@reddit
I'm not doubting that it will be unreliable; I'm doubting that its unreliability will be enough to dissuade someone from replacing even a 20k-waged worker with an AI in this particular scenario. Even then, people get relegated to doing not much else except proofreading AI errors, with the eventuality that the AI teaches itself the proper routine without needing to be proofread.
darkrose3333@reddit
Getting where this year?
ZestyData@reddit
I intentionally didn't define a hard goal, I'm no explicit oracle! :D
We already have mediocre closed-alpha agents that can do general web tasks with limited success.
I believe by the end of the year we'll have offerings that, as a consumer, I'd actively want to use, and offerings that some businesses actively choose to purchase to boost their velocity (/ replace human labour). I don't know how general they'll be.
And btw I don't mean AGI, not necessarily some being that has general intelligence. Just an agentic system of non-AGI LLMs that can generally do enough computer-based tasks with enough success to make them financially viable.
farastray@reddit
Finally a reasonable comment. So many devs are defensive and it's completely mind-boggling. You are not that special! It's something we haven't been told very often, but it's as true for us as it is for any other profession.
fasttosmile@reddit
Great summary
Feroc@reddit
I think they can be rather good as an "advanced search function" for larger documentation, if they use that documentation as a primary data source and don't hallucinate from random training data.
compubomb@reddit
I have, but in the hard sciences. They are useful for doing complex data permutations, and for cleaning up data in an automated fashion. So for research purposes, it's like paying an assistant to help you clean up a lot of data to use it for more formal data analysis tasks. I don't know what the "formal data analysis" part involves, but I've seen it done.
PaxUnDomus@reddit
They are good for sucking out money, and for managers pissing me off with "CAN WE DO THIS WITH AI"
No Jamesh, we are not god, we did not create sentient beings, just a retarded toddler with extremely good memory.
TheOnceAndFutureDoug@reddit
Yes, I have.
So the vast majority of support requests, regardless of the technical level of the product, are the same bullshit requests. 99% of the time, a good LLM trained on your past support tickets can suggest exactly what is needed to fix what is almost certainly a common issue.
I've seen this used effectively in Discord servers for this exact purpose, too. Super cool.
The problem is that too many systems make it hard to get to a person and then companies cut the number of actual people who can help. What they should be doing is using this to filter out the low-level basic stuff and leaving the real problems in the hands of highly trained and capable support staff.
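The "trained on your past support tickets" part doesn't even need an LLM for a first pass; nearest-neighbour lookup over old tickets gets you surprisingly far. A sketch using scikit-learn, with made-up ticket data and an arbitrary confidence cutoff:

```python
# Nearest-neighbour lookup over past tickets: often enough to surface
# the canned fix before a human ever sees the request.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_tickets = [
    ("login loop after password reset", "Clear cookies for auth.example.com"),
    ("webhook returns 401",             "Regenerate the signing secret"),
    ("export stuck at 99%",             "Exports over 1GB need the async endpoint"),
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([text for text, _ in past_tickets])

def suggest_fix(new_ticket: str) -> str:
    scores = cosine_similarity(vectorizer.transform([new_ticket]), matrix)[0]
    best = scores.argmax()
    return past_tickets[best][1] if scores[best] > 0.3 else "escalate to a human"

print(suggest_fix("I keep getting logged out after resetting my password"))
```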
i_like_trains_a_lot1@reddit
I am working on some agents myself to automate some of my work.
One will do file management and retrieval. I want to be able to send it messages and files via WhatsApp or email and tell it, "put this invoice in the invoice folder for January 2024, and rename it to the company name + invoice number", and then to be able to ask it to "give me an archive of all the invoices for 2024, named documents_january_2024.zip". Things like this. I am currently managing 3-4 workflows that do more or less the same kind of things. I tried to build a file management system to simplify it, but there are just enough differences between the workflows that completely automating it increases the scope quite a lot (a lot of the little things need to be configurable).
The 2nd one is something to help me research, pick and ideate social media posts, and eventually be able to post things online on various channels.
Imo, what's missing from the "AI Agent" ideas and implementations nowadays is:
- the ability to do work in the background independently
- the ability to interact with multiple tools at once and compose/execute simple flows (e.g. picking up files in a certain folder, renaming them, putting them in an archive, renaming it, and then sending it; see the sketch after this list)
- the ability to initiate conversations. I'd want to instruct it with something like "three times per day, send me 5 tweet ideas. I'll choose 1 or more from them, and you'll schedule them for posting using the 'personal profile' schedule"
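The compose/execute part of that second point is mostly ordinary glue code once the agent has decided what to do. A sketch of the deterministic tail of the invoice flow; the paths and naming scheme are illustrative:

```python
# Collect the PDFs, rename them to a scheme, zip the result.
import shutil
import zipfile
from pathlib import Path

def archive_invoices(src: Path, dest_zip: Path, company: str) -> Path:
    staged = src / "staged"
    staged.mkdir(exist_ok=True)
    for i, pdf in enumerate(sorted(src.glob("*.pdf")), start=1):
        shutil.copy(pdf, staged / f"{company}_invoice_{i:03d}.pdf")
    with zipfile.ZipFile(dest_zip, "w") as zf:
        for f in staged.iterdir():
            zf.write(f, arcname=f.name)
    return dest_zip

archive_invoices(Path("invoices/2024-01"), Path("documents_january_2024.zip"), "acme")
```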
the_ur_observer@reddit
The largest impediment of current AI systems to practical applications is small context size. If you have a codebase, you cannot stuff the whole thing into ChatGPT. Agentic AI works around this by deciding like a human would: maybe I should go into this directory and click on this file, OK maybe this file, ah here we go, OK this references something elsewhere, etc. There are less advanced ways to do this, such as chunking RAG algorithms, but those have obvious issues. Agentic AI systems could genuinely solve the problem of context size, much like a human retrieves information they wrote down because they can't keep the whole world in working memory.
It’s a massive boon to practical applications.
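A minimal sketch of that navigate-and-take-notes loop over a codebase; `llm` is a hypothetical model call, and the stopping protocol is simplified:

```python
from pathlib import Path

def llm(prompt: str) -> str:
    raise NotImplementedError("model call goes here")

def explore(question: str, root: Path, max_hops: int = 8) -> str:
    notes, current = [], root
    for _ in range(max_hops):
        listing = "\n".join(p.name for p in current.iterdir())
        choice = llm(f"Question: {question}\nNotes so far: {notes}\n"
                     f"Contents of {current}:\n{listing}\n"
                     "Reply with a name to open, or ANSWER to finish.").strip()
        if choice == "ANSWER" or not (current / choice).exists():
            break
        target = current / choice
        if target.is_dir():
            current = target  # descend and keep browsing
        else:
            notes.append(llm(f"Extract whatever is relevant to {question!r}:\n"
                             + target.read_text()[:8000]))
    return llm(f"Question: {question}\nNotes: {notes}\nAnswer:")
```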
blingmaster009@reddit
That VM or batch job or stuck cron script that needs a restart......I can see AI agents detecting and fixing it without getting a production support person involved.
FatStoic@reddit
You're getting downvoted but I can 100% see AI-based operations troubleshooting and eventually AI-based compute management.
almost1it@reddit (OP)
Does this need an AI agent rather than a simple script that checks whether a job's run time exceeds X minutes and restarts it? Not shutting down the idea, just curious what the value-add of the agent is here.
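For reference, the non-AI version of that question is roughly this much code, assuming the job drops a pidfile when it starts; job name and paths are illustrative:

```python
import subprocess
import time
from pathlib import Path

THRESHOLD_SECS = 30 * 60  # consider the job stuck after 30 minutes

def check_and_restart(job: str, pidfile: Path) -> None:
    # If the pidfile is older than the threshold, assume the job is
    # stuck and bounce it.
    if pidfile.exists() and time.time() - pidfile.stat().st_mtime > THRESHOLD_SECS:
        subprocess.run(["systemctl", "restart", job], check=True)

check_and_restart("nightly-etl", Path("/var/run/nightly-etl.pid"))
```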
blingmaster009@reddit
An AI agent could just handle the restart instead of the script, and possibly one AI agent could handle additional production support scenarios of a simpler nature, instead of various scripts and manual interventions based on SOPs.
Another use I can think of: an AI agent handles manual tasks like provisioning new user accounts or keys based on an approved request, even if the task involves logging into a UI and navigating screens that involve filling out forms and pushing buttons.
jesusrambo@reddit
Lots of the applications I've seen are not useful because they're just rehashing the same few ideas. Limiting the use cases to chatbots and customer support is really narrow-minded.
I've experimented a little with AI agents for planning and executing projects, e.g. home DIY projects:
Etc
It's interesting to see them work together and how they interact. I'm not sure why everyone laser-focuses on paperwork; IMO the most interesting use cases will be more paradigm-shifting.
almost1it@reddit (OP)
Interesting, how did the project turn out? Do you think the real value is going to come from many small agents working together at scale?
lhfvii@reddit
I can see that devolving quickly with a few hallucinations, and then the whole thing turns into a runaway feedback loop.
almost1it@reddit (OP)
Fair point. Reminds me of microservice architecture, but applied to agents: in practice we end up with agents that are highly coupled together, i.e. a distributed monolith.
jesusrambo@reddit
(Sorry for some fragmented replies here)
I think with AI, there can be benefits to a distributed monolith.
Training smaller, more domain specific models is exponentially easier than huge models, for example
Hardware resources for running them are sort of a step function in practice, too. E.g., consider someone running a model at home: available VRAM is a hard constraint, so splitting up a model so it fits is more practical than buying another GPU.
jesusrambo@reddit
Sure, I think that’s a reasonable concern. That said, it’s always easy to poke holes in new ideas, but that doesn’t mean they’re unsolvable problems.
For example, like I mentioned in another comment, one strategy to mitigate that is having agents responsible for QCing other agents' output and trying to identify hallucinations.
jesusrambo@reddit
They collaborated in the way I was hoping for. I need to work on improving the instructions, but to me it's compelling enough to think there's a there there. Lots more to play with.
FWIW It’s not too hard to set something like that up in Amazon Bedrock and try it out for yourself if you’re interested.
I think there are lots of advantages to small models working together — lower hardware requirements is a huge benefit IMO, as well as being able to train models for more specific tasks
It's also a more modular/flexible structure. Another commenter mentioned hallucination feedback loops; that's a reasonable concern, but you can also do things like add another agent responsible for QCing the output of the others (and another for QCing the output of that one, up to some diminishing returns).
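That QC step is easy to prototype. A sketch of a draft-and-critique loop, with `llm` as a hypothetical model call:

```python
# One call drafts; a second checks the draft against the source
# material and flags unsupported claims.
def llm(prompt: str) -> str:
    raise NotImplementedError("model call goes here")

def draft_with_qc(task: str, source: str, max_rounds: int = 3) -> str:
    draft = llm(f"Task: {task}\nSource material:\n{source}")
    for _ in range(max_rounds):
        review = llm(f"List any claims in this draft not supported by the "
                     f"source, or reply OK.\nSource:\n{source}\nDraft:\n{draft}")
        if review.strip() == "OK":
            return draft
        draft = llm(f"Fix these issues and redraft.\nIssues: {review}\n"
                    f"Task: {task}\nSource:\n{source}")
    return draft  # diminishing returns, as noted above
```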
Independent_Pitch598@reddit
A nice open source agent for development https://github.com/All-Hands-AI/OpenHands
ccricers@reddit
Sounds similar to the DAO concept that crypto bros tried to sell some 6-8 years ago. Only they seem to have reeled back from the concept of entire organizations and corps to just talking about the automation "agents" themselves.
Atupis@reddit
LLMs taking action is a legitimate concept, but the hype is now in overdrive because agents are rather fickle. So, will agents do senior engineer jobs next year? Probably not, but by 2035 I would say yes. This is similar to the dotcom boom, where people hyped legitimate technology that wasn't there yet but would, in the long run, change how we operate.
lhfvii@reddit
All of that is making a comeback with the new Crypto x AI vertical, which feels like, and probably is, peak grifting.
almost1it@reddit (OP)
The whole crypto x AI meta really feels like the industry had nothing new to show, so it decided to attach itself to the latest tech trend. I cringe when a crypto bro says things like "crypto is AI money" unironically.
I technically get what they mean... AIs are bots, and crypto is a type of currency with an open interface, which I guess makes it easier for bots to leverage compared to traditional money. But there are still too many gaps, which makes it cringe when they try to hype it.
NotACockroach@reddit
I reckon start small. I just used one recently to create meeting note pages from Google Calendar events. No summaries or anything like that, just date, time, attendees, and some boilerplate headings.
It saves me enough clicks that in the middle of a meeting I can think "shit, I should write this down", and have a page ready straight away.
wowitstrashagain@reddit
At my work, we use AI to automatically generate reports from machine inspection forms from different factories. Despite having a standard, each user fills it out differently and in different languages. The AI does not care what language it's written in.
The generated reports require you to cross-check them against actual data, but so far, the reports have been correct.
Based on the reports, we've gained some unique insights, like what times of day inspections are the most 'in-depth', and how different inspectors focus on different things. So we are creating a rotation system (instead of one inspector per machine, one inspector inspects one thing across multiple machines). These things were basically impossible for our company to notice before, because they aren't statistics that can be generated automatically without LLMs, and nobody is going to spend time hiring someone to find issues they don't even know exist.
I can see AI agents doing a lot of abstract data analysis.
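The language-agnostic extraction step can be as simple as forcing each free-form entry through a fixed output schema and aggregating with ordinary code. A sketch using the OpenAI Python client; the model name and schema fields are assumptions, not this poster's actual setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA_HINT = (
    "Return JSON with keys: machine_id, inspector, findings (list of "
    "strings, translated to English), severity (low/medium/high)."
)

def normalize(raw_entry: str) -> dict:
    # The model handles whatever language the inspector wrote in;
    # everything downstream sees one schema.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any capable model works
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SCHEMA_HINT},
            {"role": "user", "content": raw_entry},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```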
RelationshipIll9576@reddit
I see it like this: agents are really just asynchronous workflows. Each step can be LLM-based, traditional programming-based, or even manual tasks.
There are a ton of processes that this fits into. From customized emails (campaigns), to market/product/competitive research, to scientific research, to manual IT work like provisioning new machines and accounts. Sure, you can argue that all of these can be handled by traditional software, but traditional software can't easily be genericized to fit a bunch of use cases out of the gate (if at all).
There's another aspect to this, though, related to context windows. LLMs have limits on how much data they can process. One potential way to address that is to break the problem space up into smaller chunks, iteratively process them, and batch the results into larger and larger chunks. There are reliability problems with this, given hallucinations and bad processing piling up at each step, but once things become more stable, this seems like a plausible approach for getting around those limits.
I have a small side project exploring this currently: using AI to summarize my emails so that I don't have to open each one and skim through it to see if it's useful/relevant. I'm using smaller models, which hit the context window cap right away, so using agents for something like this seems enticing.
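The chunk-and-batch idea above, as a sketch; `llm` is a hypothetical model call, and the chunk size depends on the model:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("model call goes here")

def summarize(text: str, chunk_chars: int = 8000) -> str:
    # Map: summarize each chunk. Reduce: merge summaries in batches
    # until a single one fits the context window.
    if not text:
        return ""
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    summaries = [llm(f"Summarize, keeping concrete details:\n{c}") for c in chunks]
    while len(summaries) > 1:
        batches = ["\n".join(summaries[i:i + 4]) for i in range(0, len(summaries), 4)]
        summaries = [llm(f"Merge these summaries into one:\n{b}") for b in batches]
    return summaries[0]
```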
BigFaceBass@reddit
Hook up the model to a text-to-speech tool and you've basically got a souped-up robocaller. Get ready for next election season!
Chiashurb@reddit
If you want a robocall that advises you to glue the cheese to your pizza, sure
top_of_the_scrote@reddit
Working with it right now, equipment related. We've got this huge DB of stuff, someone's searching for some item, and the content generation people want to speed up their workflow (of making content). So this thing goes onto the web, finds PDFs/spec sheets (not available in the DB), and parses them into nodes; if the confidence is low, it searches some more. It uses workflows (self-calling/recursive code), and it's hard to debug, especially with lag.
idk I'm not psyched about it but it's my job atm
JonnyRocks@reddit
Agents are coming out this year. The idea is you say "monitor this inbox for invoices and process them through our invoice system, then email these three people and set up a meeting".
lhfvii@reddit
Sounds like a Python script making a few API calls to the Gmail API and then the Google Meet API, only with extra steps and a probabilistic approach, which means nondeterministic results.
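For the sake of the comparison, the deterministic version really is just a few calls. A sketch with the Calendar API standing in for the meeting step, assuming OAuth credentials (`creds`) are already set up; the query and meeting details are illustrative:

```python
from googleapiclient.discovery import build

def process_invoices_and_schedule(creds, attendees):
    gmail = build("gmail", "v1", credentials=creds)
    hits = gmail.users().messages().list(
        userId="me", q="has:attachment subject:invoice is:unread"
    ).execute().get("messages", [])
    for msg in hits:
        full = gmail.users().messages().get(userId="me", id=msg["id"]).execute()
        print("would hand off to invoice system:", full["snippet"])

    calendar = build("calendar", "v3", credentials=creds)
    calendar.events().insert(
        calendarId="primary",
        sendUpdates="all",  # emails the attendees
        body={
            "summary": "Invoice review",
            "start": {"dateTime": "2025-01-15T10:00:00-05:00"},
            "end":   {"dateTime": "2025-01-15T10:30:00-05:00"},
            "attendees": [{"email": a} for a in attendees],
        },
    ).execute()
```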
almost1it@reddit (OP)
Agree with this. I suppose the only value an LLM adds to this flow is classifying whether an email is an invoice or not, and there are arguably better ways to do that without reaching for LLMs.
DependentlyHyped@reddit
Not willing to speculate about the broad-scale impact, but I've seen some pretty cool applications in my field, compilers.
Nax5@reddit
If you consider document AI or web-scraper AIs to be agents, then those are useful.
I don’t anticipate agents writing good code for me. But I think the ability to have them go use websites is fascinating. It could turn almost anything on the Internet into an API.
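The plumbing for that already exists in browser-automation libraries; the agent's job is just choosing the actions. A sketch with Playwright, where the URL and selector are hypothetical:

```python
# Wrap a page interaction as a plain function: the "website as API" idea
# in its simplest form.
from playwright.sync_api import sync_playwright

def check_price(product_url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(product_url)
        price = page.inner_text(".price")  # hypothetical selector
        browser.close()
        return price

print(check_price("https://example.com/widget"))
```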
No-Chocolate-9437@reddit
I have a discord bot I originally built for private use. I initially created it to help me with capture the flag challenges. It did pretty well, I recently switched to xAI and it does even better (maybe because a lot of vulnerability disclosures get posted to twitter)
iheartjetman@reddit
Here's a demo video of AgentForce from Salesforce. It's useful for service center agents, acting as a real-time assistant. It can do things like suggest actions and responses while the agent is interacting with the customer.
https://www.youtube.com/watch?v=5GFGTuCONJc
DuffyBravo@reddit
How about this site to help you create Performance reviews and Company objectives? https://ihateperfreviews.com/ (This has been a life saver at the end of the year for me!)
thatVisitingHasher@reddit
I've been playing with them using Flowise. It feels like Salesforce. You can build workflows pretty quickly, but it doesn't feel like enterprise infrastructure. You can build small repeatable tools quickly, and it'll make you more efficient if you really understand prompt engineering with the tools it provides. I can see it being a nice bridge to connect multiple services together.
Are agents going to change the workforce? Fuck no. It'll create this generation's Access gurus. Then those people will leave the company, and no one will know how to modify or fix them.
codesplosion@reddit
So far I've seen some good uses for them in the "build me a report about X" space: go discover and digest a lot of business info, and summarize it for a human to review.
There’s obviously more meat on the bone to the AI hype cycle than, say, web 3.0 or NFTs. But it’s still a hype cycle; buyer beware.
ImpossibleShoulder34@reddit
"Could have" already done that? What kind of math do you think has been behind the models at GS/JP for the last 15 years? Those companies alone have housed PhD graduates in statistical sciences and machine learning for years. What do you think they've been doing this entire time, jump rope?
No_Radish9565@reddit
I'm playing with Bedrock Agents at work, and it basically seems like a purpose-built Step Function with a lot of boilerplate taken care of on your behalf. I don't really get the hype.
chaoism@reddit
I view LLMs as really good high school test takers right now, so the agents will perform similarly, in specific areas.
I haven't seen one that would make people go "oh shit", but rather "meh, it's better than nothing".
I do expect these agents to take over some intern jobs, though, especially those that don't involve the development and creative side of things: data collection, summarizing, sending emails, and tasks like that.