Microsoft stealth-releases both “Magentic-One”: An Open Source Generalist Multi-Agent System for Solving Complex Tasks, and AutoGenBench
Posted by Porespellar@reddit | LocalLLaMA | 42 comments
Had no idea these were even being developed. I found both while searching for news on AutoGen Studio. The Magentic-One project looks fascinating. It seems to build on top of AutoGen and adds quite a lot of capabilities. I didn’t see any other posts about these two releases yet, so I thought I would post.
GriffHook36@reddit
Anyone know if you can create your own custom agents to go beyond the 4 they included? I expect so since it's based on AutoGen but I haven't been able to tinker with it yet.
NefariousnessDue3741@reddit
Sure, the "websurfer" and "coder" is just the customized agent based on autogen, so you can write your agent and join the group chat with the original others
CptKrupnik@reddit
Just tested this out with a rather simple question that eventually required getting sentiment from Reddit.
After about 20k tokens and 33 requests to GPT-4o, the model blocked me because a request did not comply with OpenAI's policy (it was something really, really benign). This is a major blocker, and one I've run into before with agent flows:
eventually the agents will generate a prompt that trips the model's filtering policy, they won't try to work around it, and, as we just saw, it can happen 33 prompts and 20k tokens of context in.
I will try running this with OmniParser as well against Llama 3.2 Vision (with Ollama), wish me luck.
Porespellar@reddit (OP)
Please share how you are configuring it to work with local LLMs (Ollama if possible). I’m sure lots of folks want to use it locally.
Icy-Corgi4757@reddit
I am working on it. I have it happily working offline with Llama 3.2 Vision on Ollama, but getting it to actually interact with the browser is proving cumbersome.
erdult@reddit
Is this any better than Open Interpreter?
Porespellar@reddit (OP)
Only downside is that it currently only supports OpenAI models and not local ones. How hard would it be to make it work with Ollama? Can someone fork it and do this or something?
Incompetent_Magician@reddit
It doesn't support Ollama but it does work with Ollama. I'm on MacOS and I use Podman.
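The trick, hedged since the exact config hook varies between forks, is just to point whatever OpenAI-compatible client the code uses at Ollama's `/v1` endpoint instead of api.openai.com. A minimal standalone sketch:

```python
# Minimal sketch: Ollama isn't officially supported, but anything speaking the
# OpenAI chat-completions API can target Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

resp = client.chat.completions.create(
    model="llama3.2-vision",  # whatever model you've pulled locally
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```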
xrailgun@reddit
Further modified it to work on Windows, Docker, and OpenAI-compatible endpoints. I used DeepSeek.
Incompetent_Magician@reddit
Nicely done. I don't have a Windows machine to work with; thank you.
gentlecucumber@reddit
If it works with OpenAI then it works with local models. Use vLLM instead of Ollama.
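The vLLM route follows the same pattern as the Ollama one above: start vLLM's OpenAI-compatible server, then point the client at it. A rough sketch; the model ID and port are placeholders:

```python
# Rough sketch: vLLM's OpenAI-compatible server is typically launched with
#   python -m vllm.entrypoints.openai.api_server --model <hf-model-id> --port 8000
# After that, any OpenAI-style client can target it instead of api.openai.com.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
print(client.models.list())  # quick check that the local server is reachable
```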
Alexian_Theory@reddit
As mentioned, the WebSurfer agent requires a multimodal LLM, and there's the real problem: still no multimodal support in Ollama AFAIK. Still waiting on Llama 3.2 11B to work; according to some previous posts it should be fun.
Alexian_Theory@reddit
lol the timing. Ollama llama3.2 with vision dropped today.
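For anyone wanting a quick sanity check that the new vision model works, a minimal sketch with the ollama Python package (assumes `llama3.2-vision` has already been pulled and that `screenshot.png` exists locally):

```python
# Minimal check that the newly released vision model answers about an image.
import ollama

response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Describe this screenshot in one sentence.",
        "images": ["screenshot.png"],  # path to a local image file
    }],
)
print(response["message"]["content"])
```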
_Erilaz@reddit
If it supports ClosedAI API, that isn't an issue at all.
Alexian_Theory@reddit
I played with it for a while last week; I found it by chance while looking for something similar to the WebSurfer agent for the new core 0.4 dev release. The approach to web browsing is interesting: it takes snapshots of the headless browser it is running, passes the image to a vision-enabled LLM, and then decides how to proceed to finish the task.
afourney@reddit
Author here. There’s a great paper we cite that was influential: WebVoyager. Please go check it out.
We use a combination of screenshots (with Set-of-Marks prompting) AND structured text we extract from the DOM. A major limitation of screenshots is that they can’t see what’s not on the screen! So the text helps the agent know if it needs to scroll, etc. Q&A and summarization are also done on the whole DOM to try to do it all in one shot.
After each action, WebSurfer generates a new screenshot of the resulting state and shares it with the team (all agents are multimodal), along with a text representation. Note that the latest models have started refusing to generate these text representations for some odd reason, so we’ll likely need to tweak things a bit.
There’s a ton of opportunity to improve this.
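To make the shape of this concrete, here is a minimal sketch of the raw capture step with Playwright. It is not the actual WebSurfer code and leaves out the Set-of-Marks overlay and the structured DOM extraction described above:

```python
# Sketch of the general idea: grab a screenshot for the vision model plus a
# plain-text rendering of the page so the agent can reason about content that
# is off-screen (e.g. whether it needs to scroll).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("https://example.com")

    screenshot = page.screenshot()       # bytes -> sent to the multimodal model
    page_text = page.inner_text("body")  # full-page text -> covers what the
                                         # screenshot can't show

    print(f"screenshot: {len(screenshot)} bytes, text: {len(page_text)} chars")
    browser.close()
```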
Enough-Meringue4745@reddit
It's the only feasible way given how bloated HTML is.
FaceDeer@reddit
And also possibly to bypass Cloudflare and other such anti-bot mechanisms.
NarrowTea3631@reddit
Headless browsers are generally very easy to detect; it takes a lot of work to do serious automated stuff with them.
psilent@reddit
“More worryingly, in a handful of cases — and until prompted otherwise — the agents occasionally attempted to recruit other humans for help (e.g., by posting to social media, emailing textbook authors, or, in one case, drafting a freedom of information request to a government entity).”
There you go, just ask on social media how to log in to a server
Porespellar@reddit (OP)
That’s friggin hilarious!! It thinks it’s people. I can see why they waited until post-election to release this and pretty much released it without any fanfare.
cyan2k@reddit
?? What are you talking about... I've been playing with it for a couple of weeks. The branch is three months old.
afourney@reddit
Author here. Indeed, the code has been public since 0.4, and actually there’s an early version of this from March on 0.2 (go to the GAIA leaderboard and click the March 01 MSR Frontiers entry). I spoke about an early version in the spring here: https://youtu.be/KuX_dkqr7UY?si=BT1aD9SJvRJuj91g
Real_Pareak@reddit
>you guys are hallucinating like mini phi 3.5 in a two bit quant
That's the most LLM-nerdy insult I have ever heard, lol
wavinghandco@reddit
"November 4, 2024"
Porespellar@reddit (OP)
Yeah, that’s when the article was written, a day before the election, but all the mail-in voting had already occurred, and I don’t know that they actually posted the blog entry until today. Guess I could check the Wayback Machine. Regardless, this was just kind of put out there without a whole lot of press. The fact that I’m the first to post it here after it’s supposedly been out for two days should tell you all you need to know.
throwawayPzaFm@reddit
That's... kinda awesome.
afourney@reddit
Author here. The request was drafted for GAIA problem 3013b87b-dc19-466a-b803-6b7239b9fd9c, "*From the earliest record in the FDIC's Failed Bank List to 2022, what is the difference between the highest total paid dividend percentage from a Pennsylvania bank and a Virginia bank? Just give the number.*"
The draft **which was never sent** (I want to make that clear...it was never sent), was:
Dear Freedom of Information Act Officer,
Under the Freedom of Information Act (5 U.S.C. 552), I am requesting access to records or any available data that contain the following information:
1. The highest total paid dividend percentage for a failed bank located in the state of Pennsylvania, from the earliest record in the FDIC's Failed Bank List up to the year 2022.
2. The highest total paid dividend percentage for a failed bank located in the state of Virginia, from the earliest record in the FDIC's Failed Bank List up to the year 2022.
The requested information is for the purpose of conducting a comparative analysis of the financial resolutions of failed banks in these two states.
If there are any fees for searching or copying these records, please inform me before you fulfill my request. However, I would also like to request a waiver of all fees in that the disclosure of the requested information is in the public interest and will contribute significantly to the public's understanding of the FDIC's handling of failed bank resolutions.
If my request is denied in whole or part, I ask that you justify all deletions by reference to specific exemptions of the act. I will also expect you to release all segregable portions of otherwise exempt material. I reserve the right to appeal your decision to withhold any information or to deny a waiver of fees.
As I am sure you will agree, it is in the public interest that this information be released as quickly as possible. Therefore, I would appreciate a response within 20 business days, as the statute requires.
Thank you for your assistance.
Sincerely,
[Your Name]
[Your Address]
[Your Contact Information]
Dead_Internet_Theory@reddit
That's impressive.
JohnnyLovesData@reddit
Relevant XKCD? Zealous Autoconfig
afourney@reddit
Author here. Missed opportunity to cite xkcd. Damnit. Will have to save it for the presentation.
posthubris@reddit
Model was trained on XKCD.
inconspiciousdude@reddit
There really is one for everything :/
I can see it becoming a bible of sorts in a post-apocalyptic world.
Jazzlike_Tooth929@reddit
mind blowing
foldl-li@reddit
Interesting. But, is GraphRAG widely adopted or not?
arjunainfinity@reddit
Nice, here’s an opensource multi-agent studio with telephone features as well https://github.com/NidumAI-Inc/agent-studio
Morganross@reddit
Is that the worst possible example they could give?
An example should be something a human can relate to, not some figment of the imagination.
ithkuil@reddit
The diagram makes it look like they defined a new agent for each tool call. Sorry, but that doesn't make sense for this example. It's a toy example, but it's oversimplified, and that makes it confusing as to why they are doing these things.
My framework can do this task with one agent that has all of those types of commands enabled. You also don't need an orchestrator for this example. What you need an orchestrator for is when some of the subtasks produce a ton of output and complexity that you don't want to burden the other tasks with. I just don't see that much complexity and output in this example.
Shir_man@reddit
How is Magentic-One different from AutoGen?
Enough-Meringue4745@reddit
I believe it IS AutoGen, but with custom agents.
HiddenSecretAccount@reddit
my brain exploded, thanks
mythicinfinity@reddit
open source!