Microsoft stealth-releases both “Magentic-One”: An Open Source Generalist Multi-Agent System for Solving Complex Tasks, and AutoGenBench
Posted by Porespellar@reddit | LocalLLaMA | 42 comments
Had no idea these were even being developed. I found both while searching for news on AutoGen Studio. The Magentic-One project looks fascinating. It seems to build on top of AutoGen and adds quite a lot of capabilities. I didn’t see any other posts about these two releases yet, so I thought I would post.
GriffHook36@reddit
Anyone know if you can create your own custom agents to go beyond the 4 they included? I expect so since it's based on AutoGen but I haven't been able to tinker with it yet.
NefariousnessDue3741@reddit
Sure, the "websurfer" and "coder" is just the customized agent based on autogen, so you can write your agent and join the group chat with the original others
CptKrupnik@reddit
Just tested this out with a rather simple question that eventually required getting sentiment from Reddit.
After about 20k tokens and 33 requests to GPT-4o, the model blocked me because a request did not comply with OpenAI's policy (it was something really, really benign). This is a major blocker, and one I've run into before with agent flows:
eventually the agents will generate a prompt that trips the model's filtering policy, they won't try to work around it, and, as we just saw, it can happen 33 prompts and 20k tokens of context in.
I will try running this with OmniParser as well against Llama 3.2 Vision (with Ollama), wish me luck.
Porespellar@reddit (OP)
Please share how you are configuring it to work with local LLMs (Ollama if possible). I’m sure lots of folks want to use it locally.
Icy-Corgi4757@reddit
I am working on it. I have it happily working offline with Llama 3.2 Vision on Ollama, but getting it to actually interact with the browser is proving cumbersome.
erdult@reddit
Is this any better than Open Interpreter?
Porespellar@reddit (OP)
Only downside is that it currently only supports OpenAI models and not local ones. How hard would it be to make it work with Ollama? Can someone fork it and do this or something?
Incompetent_Magician@reddit
It doesn't support Ollama but it does work with Ollama. I'm on MacOS and I use Podman.
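The trick, hedged since the exact config hook varies between forks, is just to point whatever OpenAI-compatible client the code uses at Ollama's `/v1` endpoint instead of api.openai.com. A minimal standalone sketch:

```python
# Minimal sketch: Ollama isn't officially supported, but anything speaking the
# OpenAI chat-completions API can target Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

resp = client.chat.completions.create(
    model="llama3.2-vision",  # whatever model you've pulled locally
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```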
xrailgun@reddit
Further modified it to work on Windows, Docker, and OpenAI-compatible endpoints. I used DeepSeek.
Incompetent_Magician@reddit
Nicely done. I don't have a Windows machine to work with; thank you.
gentlecucumber@reddit
If it works with OpenAI then it works with local models. Use vLLM instead of Ollama.
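The vLLM route follows the same pattern as the Ollama one above: start vLLM's OpenAI-compatible server, then point the client at it. A rough sketch; the model ID and port are placeholders:

```python
# Rough sketch: vLLM's OpenAI-compatible server is typically launched with
#   python -m vllm.entrypoints.openai.api_server --model <hf-model-id> --port 8000
# After that, any OpenAI-style client can target it instead of api.openai.com.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
print(client.models.list())  # quick check that the local server is reachable
```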
Alexian_Theory@reddit
As mentioned, the WebSurfer agent requires a multimodal LLM, and there's the real problem: still no multimodal support in Ollama AFAIK. Still waiting on Llama 3.2 11B to work; according to some previous posts it should be fun.
Alexian_Theory@reddit
lol the timing. Ollama llama3.2 with vision dropped today.
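For anyone wanting a quick sanity check that the new vision model works, a minimal sketch with the ollama Python package (assumes `llama3.2-vision` has already been pulled and that `screenshot.png` exists locally):

```python
# Minimal check that the newly released vision model answers about an image.
import ollama

response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Describe this screenshot in one sentence.",
        "images": ["screenshot.png"],  # path to a local image file
    }],
)
print(response["message"]["content"])
```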
_Erilaz@reddit
If it supports ClosedAI API, that isn't an issue at all.
Alexian_Theory@reddit
I played with it for a while last week; I found it by chance while looking for something similar to the WebSurfer agent for the new core 0.4 dev release. The approach to web browsing is interesting: it takes snapshots of the headless browser it is running, passes the image to a vision-enabled LLM, and then decides how to proceed to finish the task.
afourney@reddit
Author here. There’s a great paper we cite that was influential: WebVoyager. Please go check it out.
We use a combination of screenshots (with Set-of-Marks prompting) AND structured text we extract from the DOM. A major limitation of screenshots is that they can’t see what’s not on the screen! So the text helps the agent know if it needs to scroll, etc. Q&A and summarization are also done on the whole DOM to try to do it all in one shot.
After each action, WebSurfer generates a new screenshot of the resulting state and shares it with the team (all agents are multimodal), along with a text representation. Note that the latest models have started refusing to generate these text representations for some odd reason, so we’ll likely need to tweak things a bit.
There’s a ton of opportunity to improve this.
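To make the shape of this concrete, here is a minimal sketch of the raw capture step with Playwright. It is not the actual WebSurfer code and leaves out the Set-of-Marks overlay and the structured DOM extraction described above:

```python
# Sketch of the general idea: grab a screenshot for the vision model plus a
# plain-text rendering of the page so the agent can reason about content that
# is off-screen (e.g. whether it needs to scroll).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("https://example.com")

    screenshot = page.screenshot()       # bytes -> sent to the multimodal model
    page_text = page.inner_text("body")  # full-page text -> covers what the
                                         # screenshot can't show

    print(f"screenshot: {len(screenshot)} bytes, text: {len(page_text)} chars")
    browser.close()
```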
Enough-Meringue4745@reddit
It's the only feasible way given how bloated HTML is.
FaceDeer@reddit
And also possibly to bypass Cloudflare and other such anti-bot mechanisms.
NarrowTea3631@reddit
Headless browsers are generally very easy to detect; it takes a lot of work to do serious automated stuff with them.
psilent@reddit
“More worryingly, in a handful of cases — and until prompted otherwise — the agents occasionally attempted to recruit other humans for help (e.g., by posting to social media, emailing textbook authors, or, in one case, drafting a freedom of information request to a government entity).”
There you go, just ask on social media how to log in to a server
Porespellar@reddit (OP)
That’s friggin hilarious!! It thinks it’s people. I can see why they waited until post-election to release this and pretty much released it without any fanfare.
cyan2k@reddit
?? What are you talking about... I've been playing with it for a couple of weeks. The branch is three months old.
afourney@reddit
Author here. Indeed, the code has been public since 0.4, and actually there’s an early version of this from March on 0.2 (go to the GAIA leaderboard and click the March 01 MSR Frontiers entry). I spoke about an early version in the spring here: https://youtu.be/KuX_dkqr7UY?si=BT1aD9SJvRJuj91g
Real_Pareak@reddit
>you guys are hallucinating like mini phi 3.5 in a two bit quant
That's the most LLM-nerdy insult I have ever heard, lol
wavinghandco@reddit
"November 4, 2024"
Porespellar@reddit (OP)
Yeah, that’s when the article was written, a day before the election, but all the mail-in voting had already occurred, and I don’t know that they actually posted the blog entry until today. Guess I could check the Wayback Machine. Regardless, this was just kind of put out there without a whole lot of press. The fact that I’m the first to post it here after it’s supposedly been out for two days should tell you all you need to know.
throwawayPzaFm@reddit
That's... kinda awesome.
afourney@reddit
Author here. The request was drafted for GAIA problem 3013b87b-dc19-466a-b803-6b7239b9fd9c, "*From the earliest record in the FDIC's Failed Bank List to 2022, what is the difference between the highest total paid dividend percentage from a Pennsylvania bank and a Virginia bank? Just give the number.*"
The draft **which was never sent** (I want to make that clear...it was never sent), was:
Dear Freedom of Information Act Officer,
Under the Freedom of Information Act (5 U.S.C. 552), I am requesting access to records or any available data that contain the following information:
1. The highest total paid dividend percentage for a failed bank located in the state of Pennsylvania, from the earliest record in the FDIC's Failed Bank List up to the year 2022.
2. The highest total paid dividend percentage for a failed bank located in the state of Virginia, from the earliest record in the FDIC's Failed Bank List up to the year 2022.
The requested information is for the purpose of conducting a comparative analysis of the financial resolutions of failed banks in these two states.
If there are any fees for searching or copying these records, please inform me before you fulfill my request. However, I would also like to request a waiver of all fees in that the disclosure of the requested information is in the public interest and will contribute significantly to the public's understanding of the FDIC's handling of failed bank resolutions.
If my request is denied in whole or part, I ask that you justify all deletions by reference to specific exemptions of the act. I will also expect you to release all segregable portions of otherwise exempt material. I reserve the right to appeal your decision to withhold any information or to deny a waiver of fees.
As I am sure you will agree, it is in the public interest that this information be released as quickly as possible. Therefore, I would appreciate a response within 20 business days, as the statute requires.
Thank you for your assistance.
Sincerely,
[Your Name]
[Your Address]
[Your Contact Information]
Dead_Internet_Theory@reddit
That's impressive.
JohnnyLovesData@reddit
Relevant XKCD? Zealous Autoconfig
afourney@reddit
Author here. Missed opportunity to cite xkcd. Damnit. Will have to save it for the presentation.
posthubris@reddit
Model was trained on XKCD.
inconspiciousdude@reddit
There really is one for everything :/
I can see it becoming a bible of sorts in a post-apocalyptic world.
Jazzlike_Tooth929@reddit
mind blowing
foldl-li@reddit
Interesting. But, is GraphRAG widely adopted or not?
arjunainfinity@reddit
Nice, here’s an opensource multi-agent studio with telephone features as well https://github.com/NidumAI-Inc/agent-studio
Morganross@reddit
Is that the worst possible example they could give?
An example should be something a human can relate to, not some figment of the imagination.
ithkuil@reddit
The diagram makes it look like they defined a new agent for each tool call. Sorry, but that doesn't make sense for this example. It's a toy example, but it's oversimplified, and that makes it confusing as to why they are doing these things.
My framework can do this task with one agent that has all of those types of commands enabled. You also don't need an orchestrator for this example. What you need an orchestrator for is when some of the subtasks produce a ton of output and complexity that you don't want to burden the other tasks with. I just don't see that much complexity and output in this example.
Shir_man@reddit
How is Magentic-One different from AutoGen?
Enough-Meringue4745@reddit
I believe it IS AutoGen, but with custom agents.
HiddenSecretAccount@reddit
my brain exploded, thanks
mythicinfinity@reddit
open source!