
AutoGen 2026: Microsoft's Framework for Multi-Agent Conversations and Code Execution

AutoGen deep dive covering conversable agents, group chat patterns, code execution sandboxing, human proxy agents, and custom agent types for production multi-agent systems.

What Makes AutoGen Different

AutoGen, Microsoft's open-source multi-agent framework, takes a fundamentally different approach from LangGraph and CrewAI. While LangGraph builds workflows as state machines and CrewAI assigns roles to agents, AutoGen models everything as conversations between agents. Agents talk to each other using natural language messages. The conversation history is the state. Multi-step workflows emerge from agents taking turns in a dialogue.

This conversational paradigm has a unique advantage: it handles ambiguity naturally. When an agent is unsure about something, it asks another agent for clarification — exactly like humans do. It also makes code execution a first-class feature. AutoGen agents can write Python code, execute it in a sandboxed environment, read the output, debug errors, and iterate — all through the conversation.

AutoGen Architecture: Agents and Conversations

The core AutoGen abstraction is the ConversableAgent. Every agent type — assistant, user proxy, code executor — inherits from this base class. Agents communicate by sending messages to each other, and each agent has a configurable response function that determines how it replies.

from autogen import ConversableAgent, AssistantAgent, UserProxyAgent

# Configure the LLM
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o",
            "api_key": "your-api-key",
        }
    ],
    "temperature": 0,
}

# Assistant agent: uses the LLM to generate responses
assistant = AssistantAgent(
    name="research_assistant",
    system_message="""You are a helpful research assistant.
    When asked to analyze data, write Python code to perform
    the analysis. Always explain your approach before writing code.""",
    llm_config=llm_config,
)

# User proxy: represents the human, can execute code
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",  # Fully autonomous
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": True,  # Sandbox code execution
    },
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
)

# Start a conversation
result = user_proxy.initiate_chat(
    assistant,
    message="Analyze the top 10 tech stocks by market cap. "
            "Create a visualization comparing their P/E ratios.",
)

When you call initiate_chat, the conversation ping-pongs between agents: the user_proxy sends the initial message; the assistant responds (potentially with code); the user_proxy executes the code and sends back the output; the assistant reviews the output and either writes more code or provides the final answer. This continues until the termination condition is met.
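The turn-taking loop can be sketched in plain Python to make the control flow explicit. This is a simplified illustration, not AutoGen's actual implementation; the stub callables stand in for real LLM calls and code execution:

```python
def run_chat(initiator_msg, responder, replier, max_turns=10,
             is_termination=lambda m: "TERMINATE" in m):
    """Minimal sketch of AutoGen's two-agent loop: the history IS the state."""
    history = [{"role": "user", "content": initiator_msg}]
    for _ in range(max_turns):
        # The "assistant" replies based on the full conversation history
        reply = responder(history)
        history.append({"role": "assistant", "content": reply})
        if is_termination(reply):
            break
        # The "user proxy" reacts, e.g. by executing code and reporting output
        feedback = replier(history)
        history.append({"role": "user", "content": feedback})
    return history

# Toy agents: the assistant terminates once it has seen tool output
assistant_stub = lambda h: ("Done. TERMINATE"
                            if any("output:" in m["content"] for m in h)
                            else "Running analysis...")
proxy_stub = lambda h: "output: 42"

history = run_chat("Analyze the data.", assistant_stub, proxy_stub)
print(len(history))  # → 4: initial msg, reply, tool output, final reply
```

The key point the sketch makes: there is no separate workflow state to manage; the growing message list is the entire state.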

Code Execution: AutoGen's Killer Feature

AutoGen's code execution is its most distinctive feature. The assistant agent writes Python code in markdown code blocks, and the user proxy automatically extracts and executes it. If the code fails, the error message goes back to the assistant, which debugs and retries.

# The assistant writes code like this in its responses:
# ~~~python
# import pandas as pd
# import matplotlib.pyplot as plt
# data = pd.read_csv("stocks.csv")
# plt.bar(data["company"], data["pe_ratio"])
# plt.savefig("pe_ratios.png")
# ~~~

# AutoGen automatically:
# 1. Extracts the code block
# 2. Executes it in the workspace directory
# 3. Captures stdout, stderr, and exit code
# 4. Sends the output back to the assistant

# Configure Docker-based execution for safety
code_executor_config = {
    "work_dir": "workspace",
    "use_docker": "python:3.11",  # Use specific Docker image
    "timeout": 60,  # Max execution time in seconds
    "last_n_messages": 3,  # Only look at recent messages for code
}

secure_proxy = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config=code_executor_config,
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
)

The Docker sandboxing is critical for production. Without it, LLM-generated code runs directly on your host machine with whatever access your user account has. Docker isolates execution: the container sees only the mounted working directory, and you can disable networking and enforce resource limits on the container.
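Mechanically, the extract-and-execute step is simple. Here is a stdlib-only sketch of the idea; the `extract_and_run` helper is hypothetical, and AutoGen's real executor additionally handles shell blocks, multiple code blocks per message, and Docker-based isolation:

```python
import os
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # a markdown code fence, built up to avoid nesting issues here
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_and_run(message: str, timeout: int = 60) -> str:
    """Sketch of what a user proxy does with an assistant reply:
    pull out the python block, run it, return exit code and output."""
    match = CODE_BLOCK.search(message)
    if match is None:
        return "no code block found"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(match.group(1))
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        return (f"exitcode: {proc.returncode}\n"
                f"stdout: {proc.stdout}stderr: {proc.stderr}")
    finally:
        os.unlink(path)

reply = f"Here is the analysis:\n{FENCE}python\nprint(2 + 2)\n{FENCE}"
print(extract_and_run(reply))
```

This string (exit code plus captured output) is exactly the kind of message the assistant receives back, which is what makes the debug loop possible.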

Group Chat: Multi-Agent Conversations

AutoGen's GroupChat enables conversations with more than two agents. A GroupChatManager coordinates turn-taking, deciding which agent speaks next based on the conversation context.


from autogen import GroupChat, GroupChatManager

# Define specialized agents
data_engineer = AssistantAgent(
    name="data_engineer",
    system_message="""You are a data engineer. You write SQL queries
    and Python code for data extraction and transformation.
    You hand off to the analyst once data is ready.""",
    llm_config=llm_config,
)

data_analyst = AssistantAgent(
    name="data_analyst",
    system_message="""You are a data analyst. You perform statistical
    analysis and create visualizations. You work with data provided
    by the data engineer. You hand off to the writer for reporting.""",
    llm_config=llm_config,
)

report_writer = AssistantAgent(
    name="report_writer",
    system_message="""You are a technical writer. You create clear,
    well-structured reports from analysis results. When the report
    is complete, respond with TERMINATE.""",
    llm_config=llm_config,
)

executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace", "use_docker": True},
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
)

# Create group chat
group_chat = GroupChat(
    agents=[executor, data_engineer, data_analyst, report_writer],
    messages=[],
    max_round=20,
    speaker_selection_method="auto",  # LLM decides who speaks next
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

# Start the group conversation
executor.initiate_chat(
    manager,
    message="Analyze our Q1 2026 sales data from the warehouse. "
            "Find the top performing regions and products. "
            "Create a report with visualizations.",
)

The speaker_selection_method parameter controls turn-taking. "auto" uses the LLM to decide who should speak next based on the conversation. "round_robin" cycles through agents in order. "random" picks randomly. You can also provide a custom function.

Custom Speaker Selection

For deterministic workflows, implement custom speaker selection that routes based on message content rather than LLM judgment.

def custom_speaker_selection(
    last_speaker: ConversableAgent,
    groupchat: GroupChat,
) -> ConversableAgent | str:
    """Deterministic speaker selection based on workflow stage."""
    messages = groupchat.messages
    last_content = messages[-1].get("content", "").lower()

    # After executor runs code, send output to the right agent
    if last_speaker.name == "executor":
        if "error" in last_content:
            return data_engineer  # Send errors back to engineer
        return data_analyst  # Successful output goes to analyst

    # Data engineer always goes to executor (to run code)
    if last_speaker.name == "data_engineer":
        return executor

    # Analyst produces analysis, goes to writer
    if last_speaker.name == "data_analyst":
        if "visualization" in last_content or "chart" in last_content:
            return executor  # Need to execute visualization code
        return report_writer

    # Writer produces final report
    if last_speaker.name == "report_writer":
        return "auto"  # Let LLM decide if more work needed

    return "auto"

group_chat = GroupChat(
    agents=[executor, data_engineer, data_analyst, report_writer],
    messages=[],
    max_round=20,
    speaker_selection_method=custom_speaker_selection,
)

Human Proxy Agent Patterns

The UserProxyAgent can be configured for different levels of human involvement. This is how you implement human-in-the-loop workflows in AutoGen.

# Fully autonomous: no human input
autonomous_proxy = UserProxyAgent(
    name="auto_executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace"},
)

# Always ask for approval before executing
supervised_proxy = UserProxyAgent(
    name="supervised_executor",
    human_input_mode="ALWAYS",  # Prompt user before every action
    code_execution_config={"work_dir": "workspace"},
)

# Ask for input only when the agent terminates
review_proxy = UserProxyAgent(
    name="review_executor",
    human_input_mode="TERMINATE",  # Only prompt at the end
    code_execution_config={"work_dir": "workspace"},
)

Nested Conversations

AutoGen supports nested conversations — one agent can trigger an entire sub-conversation with other agents as part of its response. This enables composable multi-agent workflows.

# Define a nested chat that the main assistant can trigger
def research_nested_chat(query: str) -> str:
    """Run a research sub-conversation between specialized agents."""
    web_researcher = AssistantAgent(
        name="web_researcher",
        system_message="You search the web and summarize findings.",
        llm_config=llm_config,
    )
    fact_checker = AssistantAgent(
        name="fact_checker",
        system_message="You verify claims with sources. "
                       "Respond TERMINATE when verified.",
        llm_config=llm_config,
    )
    proxy = UserProxyAgent(
        name="proxy",
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
    )

    result = proxy.initiate_chat(
        web_researcher,
        message=f"Research this topic: {query}",
        max_turns=5,
    )
    return result.summary

# Register as a function the main agent can call
assistant.register_function(
    function_map={"research": research_nested_chat}
)

Registering Custom Reply Functions

For advanced control, register custom reply functions that intercept and handle specific message patterns.

def handle_data_request(
    recipient: ConversableAgent,
    messages: list[dict],
    sender: ConversableAgent,
    config: dict,
) -> tuple[bool, str | None]:
    """Custom reply function that intercepts data requests."""
    last_msg = messages[-1].get("content", "")

    if "fetch data" in last_msg.lower():
        # Directly query database instead of generating code
        import sqlite3
        conn = sqlite3.connect("company.db")
        result = conn.execute("SELECT * FROM sales LIMIT 10").fetchall()
        conn.close()
        return True, f"Data fetched directly:\n{result}"

    return False, None  # Let normal processing handle it

assistant.register_reply(
    trigger=UserProxyAgent,
    reply_func=handle_data_request,
    position=0,  # Check this function first
)

FAQ

How does AutoGen handle code execution errors safely?

AutoGen wraps code execution in a sandbox — either Docker containers or a local subprocess with configurable timeouts. When code fails, the error message (stderr output and exit code) is captured and sent back to the assistant agent as a conversation message. The assistant sees the error, diagnoses it, and writes corrected code. This debug loop often resolves straightforward issues within a few iterations. For production, always use Docker execution to prevent malicious or buggy code from affecting the host system, and set strict timeouts (30-60 seconds) to prevent runaway or infinite-looping code.
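The debug loop described above can be sketched in a few lines; the `fix_code` callable stands in for the round-trip where AutoGen sends stderr back to the assistant and receives corrected code:

```python
import subprocess
import sys

def run_with_retries(code: str, fix_code, max_attempts: int = 3,
                     timeout: int = 30) -> str:
    """Sketch of the debug loop: run, capture stderr, request a fix, retry."""
    for _ in range(max_attempts):
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
        if proc.returncode == 0:
            return proc.stdout
        # In AutoGen the stderr goes back to the assistant as a message;
        # here fix_code stands in for that LLM round-trip.
        code = fix_code(code, proc.stderr)
    raise RuntimeError("could not repair code")

# Toy fixer: the first attempt raises a NameError, the "LLM" defines the name
fix = lambda code, err: "x = 21\n" + code if "NameError" in err else code
print(run_with_retries("print(x * 2)", fix))  # → 42 after one repair
```

Note the `max_attempts` bound: without it, a persistently broken program would loop (and bill you) forever, which is why AutoGen exposes max_consecutive_auto_reply.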

When should I use AutoGen instead of LangGraph or CrewAI?

Use AutoGen when your workflow involves iterative code generation and execution — data analysis, report generation, code review, or any task where the agent needs to write and run code. Its execute-debug-retry loop is a first-class feature rather than a bolt-on. Also choose AutoGen when the natural framing of your problem is a conversation between experts rather than a predefined workflow graph. AutoGen's flexibility makes it well suited to exploratory tasks where the exact steps are not known in advance.

How do I control costs with multi-agent AutoGen conversations?

Set max_round on GroupChat to limit conversation length. Use max_consecutive_auto_reply on UserProxyAgent to prevent runaway exchanges. Monitor token usage with the built-in cost tracking (each chat result includes token_usage). Use cheaper models (GPT-4o-mini) for simple agents like the executor, and reserve GPT-4o for agents that need strong reasoning. Cache LLM responses with AutoGen's built-in caching to avoid paying for repeated identical requests.
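The model-tiering advice can be expressed directly in per-agent configs. A sketch, where the model names and the api_key placeholder are illustrative (cache_seed enables AutoGen's built-in response cache):

```python
# Cheaper model for mechanical agents, stronger model for reasoning agents.
cheap_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": "your-api-key"}],
    "temperature": 0,
    "cache_seed": 42,  # identical requests are served from cache
}
strong_config = {
    "config_list": [{"model": "gpt-4o", "api_key": "your-api-key"}],
    "temperature": 0,
    "cache_seed": 42,  # share the seed so both tiers reuse the same cache
}
```

You would then pass cheap_config to simple agents like the executor's helper roles and strong_config to agents such as the analyst.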

Can AutoGen agents use external APIs and tools?

Yes. Register functions with register_function to give agents callable tools. The assistant describes available functions in its system message and calls them using the standard function-calling format. The user proxy executes the function and returns the result. You can also register async functions for non-blocking API calls and tools that return structured data for the assistant to process.
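Mechanically, the executor side of function calling is a name-to-callable lookup plus argument parsing. A simplified stdlib-only sketch (the `execute_function` helper and the call format here are illustrative, not AutoGen's exact internals):

```python
import json

def execute_function(function_map: dict, call: dict) -> str:
    """Sketch of the executor side of tool calling: look up the callable
    by name, parse the JSON arguments, run it, return the result as text."""
    name = call["name"]
    if name not in function_map:
        return f"Error: function {name} not found"
    kwargs = json.loads(call.get("arguments", "{}"))
    return str(function_map[name](**kwargs))

function_map = {"add": lambda a, b: a + b}
call = {"name": "add", "arguments": '{"a": 2, "b": 40}'}
print(execute_function(function_map, call))  # → 42
```

Returning the result as a plain string matters: it re-enters the conversation as a message, so whatever the tool produces must be something the assistant can read and reason about.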


#AutoGen #Microsoft #MultiAgent #CodeExecution #ConversationalAI #Python #AIFramework #GroupChat

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
