
Agent World Models: Internal Simulations for Planning and Prediction

Explore how AI agents use internal world models to simulate future states, predict action consequences, and perform look-ahead planning — enabling smarter decisions without costly real-world trial and error.

What Is a World Model?

In reinforcement learning, an agent can either learn by trial and error in the real environment (model-free) or build an internal model of how the environment works and simulate outcomes before acting (model-based). A world model is that internal simulation — a representation of how the world changes in response to actions.

For LLM-based agents, a world model is not a neural network predicting pixel-level frames. Instead, it is a structured representation of the current state plus reasoning about how that state would change given different actions. The LLM uses its world knowledge to "imagine" what would happen, then picks the best path.
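Before wiring an LLM into the loop, the core idea fits in a few lines: a transition function predicts the next state, a value function scores it, and the agent acts on whichever action's imagined outcome scores best. The toy world below (a hypothetical one-dimensional position, not from the article's examples) is fully deterministic just to make the mechanics visible:

```python
# Toy model-based loop: simulate each action, score the imagined
# outcome, and act on the best one -- no real-world trial and error.
# The 1-D "world" is a stand-in for any transition function.

def transition(state: int, action: str) -> int:
    """World model: predict the next state for a given action."""
    return state + {"left": -1, "stay": 0, "right": 1}[action]

def value(state: int, goal: int) -> float:
    """Score a state by closeness to the goal (higher is better)."""
    return -abs(goal - state)

def plan_step(state: int, goal: int) -> str:
    """Pick the action whose *simulated* outcome scores best."""
    actions = ["left", "stay", "right"]
    return max(actions, key=lambda a: value(transition(state, a), goal))

state, goal = 0, 3
while state != goal:
    state = transition(state, plan_step(state, goal))
print(state)  # reaches the goal: 3
```

An LLM-based agent replaces `transition` and `value` with model calls, but the simulate-score-choose skeleton stays the same.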

State Representation

The first requirement is a clean representation of the world state that the agent can reason about:

from pydantic import BaseModel
from typing import Any

class WorldState(BaseModel):
    """Structured representation of the current state."""
    entities: dict[str, dict[str, Any]]
    relationships: list[tuple[str, str, str]]  # (subject, relation, object)
    constraints: list[str]
    history: list[str]  # past actions taken

    def describe(self) -> str:
        """Convert state to natural language for LLM reasoning."""
        lines = ["Current State:"]
        for name, props in self.entities.items():
            props_str = ", ".join(f"{k}={v}" for k, v in props.items())
            lines.append(f"  {name}: {props_str}")
        for s, r, o in self.relationships:
            lines.append(f"  {s} --{r}--> {o}")
        for c in self.constraints:
            lines.append(f"  Constraint: {c}")
        return "\n".join(lines)

Simulating Action Consequences

The core of a world model is the transition function: given the current state and a proposed action, predict the next state.

from openai import OpenAI
import json

client = OpenAI()

def simulate_action(state: WorldState, action: str) -> WorldState:
    """Predict the world state after taking an action."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a world simulator.
Given a current state and a proposed action, predict the resulting state.
Consider:
- Direct effects of the action
- Side effects and cascading consequences
- Constraint violations (flag them)
- What remains unchanged

Return the new state as JSON with the same schema."""},
            {"role": "user", "content": (
                f"{state.describe()}\n\n"
                f"Proposed action: {action}\n\n"
                "Predict the resulting state."
            )},
        ],
        response_format={"type": "json_object"},
    )
    new_state_data = json.loads(response.choices[0].message.content)
    # Validate against the schema so a malformed prediction fails loudly
    return WorldState.model_validate(new_state_data)

With a simulation function, the agent can explore multiple future paths before committing to an action:


from dataclasses import dataclass

@dataclass
class SimulationNode:
    """Node in a simulated search tree. Useful if you want to retain
    the full tree for inspection; look_ahead below tracks only the best."""
    state: WorldState
    action: str | None
    score: float
    children: list["SimulationNode"]
    depth: int

def look_ahead(
    state: WorldState,
    possible_actions: list[str],
    goal: str,
    depth: int = 2,
) -> str | None:
    """Simulate multiple action paths and choose the best."""
    best_action: str | None = None
    best_score = float("-inf")

    for action in possible_actions:
        # Simulate this action
        next_state = simulate_action(state, action)

        # Score: how close does this state get us to the goal?
        score = evaluate_state(next_state, goal)

        if depth > 1:
            # Recurse: look further ahead from the simulated state
            future_actions = generate_actions(next_state, goal)
            future_best = look_ahead(
                next_state, future_actions, goal, depth - 1
            )
            if future_best is not None:
                # Blend immediate progress with future potential
                future_state = simulate_action(next_state, future_best)
                score = 0.4 * score + 0.6 * evaluate_state(future_state, goal)

        if score > best_score:
            best_score = score
            best_action = action

    return best_action

def evaluate_state(state: WorldState, goal: str) -> float:
    """Score how well a state satisfies the goal (0.0 to 1.0)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Rate how close this state is to achieving the goal. "
                "Return a single float between 0.0 and 1.0."
            )},
            {"role": "user", "content": (
                f"{state.describe()}\nGoal: {goal}"
            )},
        ],
    )
    try:
        score = float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparseable rating: treat as no progress
    return max(0.0, min(1.0, score))  # clamp to the documented range
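The `look_ahead` search assumes a `generate_actions` helper that proposes candidate actions for a state. In practice you would ask the LLM for proposals, but a deterministic rule-based placeholder keeps the search testable during development. The sketch below is a hypothetical stand-in that operates on a plain entities dict rather than the full `WorldState`, pairing open tasks with developers who have spare capacity:

```python
def generate_actions(entities: dict[str, dict], goal: str,
                     max_actions: int = 5) -> list[str]:
    """Rule-based stand-in for an LLM action proposer: pair each
    open task with each developer who has spare load."""
    tasks = [name for name, props in entities.items()
             if props.get("type") == "task" and props.get("status") != "done"]
    devs = [name for name, props in entities.items()
            if "load" in props and props["load"] < 1.0]
    proposals = [f"Assign {task} to {dev}" for task in tasks for dev in devs]
    return proposals[:max_actions]  # cap the branching factor
```

Capping the number of proposals matters: each candidate action triggers a simulation call, so the branching factor multiplies directly into latency and cost.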

Practical Example: Project Management Agent

Consider an agent managing a software project. Its world model tracks developers, tasks, dependencies, and deadlines. Before assigning a task, it simulates the consequences:

project_state = WorldState(
    entities={
        "alice": {"role": "frontend", "current_task": "auth-ui", "load": 0.8},
        "bob": {"role": "backend", "current_task": None, "load": 0.2},
        "auth-api": {"type": "task", "status": "blocked", "priority": "high"},
    },
    relationships=[
        ("auth-ui", "depends_on", "auth-api"),
        ("alice", "assigned_to", "auth-ui"),
    ],
    constraints=[
        "No developer should exceed 1.0 load",
        "Blocked tasks cannot start until dependencies complete",
    ],
    history=["Sprint started 3 days ago"],
)

# Simulate: what if we assign auth-api to Bob?
next_state = simulate_action(project_state, "Assign auth-api to Bob")
# The model should predict: Bob's load increases, auth-api moves to
# in-progress, and once complete, auth-ui becomes unblocked for Alice.

Limitations and Mitigations

LLM-based world models are imperfect — they can miss edge cases, violate physical laws, or drift from reality over multiple simulation steps. Mitigate this by (1) grounding simulations with real data at every opportunity, (2) limiting look-ahead depth to 2-3 steps, and (3) re-syncing the world model with actual state after each real action.
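The third mitigation can be sketched in a few lines. This is a minimal re-sync, assuming each entity can be re-fetched from a real source (the `fetch` callable here is a hypothetical stand-in for a database or API query): observed values overwrite simulated ones, and any mismatch is logged as drift.

```python
from typing import Any, Callable

def resync(predicted: dict[str, dict[str, Any]],
           fetch: Callable[[str], dict[str, Any]]) -> tuple[dict, list[str]]:
    """Replace simulated entity fields with freshly observed values.
    Returns the grounded state and a log of fields that had drifted."""
    grounded: dict[str, dict[str, Any]] = {}
    drift: list[str] = []
    for name, sim_props in predicted.items():
        real_props = fetch(name)  # query the DB / API / sensor
        for key, sim_val in sim_props.items():
            if key in real_props and real_props[key] != sim_val:
                drift.append(f"{name}.{key}: simulated {sim_val!r}, "
                             f"observed {real_props[key]!r}")
        grounded[name] = {**sim_props, **real_props}  # observation wins
    return grounded, drift
```

The drift log is worth keeping: if a particular field drifts on every re-sync, that is a signal the simulator systematically mispredicts it.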

FAQ

How accurate are LLM-based world models?

For common-sense reasoning and business logic, LLMs are surprisingly effective world simulators. They struggle with precise numerical computations and novel physical scenarios. Always validate critical simulations against real-world checks.

How do you prevent state drift in long simulations?

Re-ground the world model after every real action by querying actual data sources (databases, APIs, sensors). Treat the simulated state as a hypothesis that gets corrected by observation. Never let the agent act on a state that is more than 2-3 simulation steps removed from reality.

Is this the same as Monte Carlo Tree Search (MCTS)?

Conceptually similar. MCTS uses random rollouts to evaluate positions; world model agents use LLM-based simulation. The key difference is that LLMs can bring vast world knowledge to the simulation, while MCTS relies on domain-specific value functions. Some hybrid approaches use both.


#WorldModels #StatePrediction #LookAheadPlanning #AgentSimulation #AgenticAI #PythonAI #AIPlanning #ReinforcementLearning


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
