Agent World Models: Internal Simulations for Planning and Prediction
Explore how AI agents use internal world models to simulate future states, predict action consequences, and perform look-ahead planning — enabling smarter decisions without costly real-world trial and error.
What Is a World Model?
In reinforcement learning, an agent can either learn by trial and error in the real environment (model-free) or build an internal model of how the environment works and simulate outcomes before acting (model-based). A world model is that internal simulation — a representation of how the world changes in response to actions.
For LLM-based agents, a world model is not a neural network predicting pixel-level frames. Instead, it is a structured representation of the current state plus reasoning about how that state would change given different actions. The LLM uses its world knowledge to "imagine" what would happen, then picks the best path.
State Representation
The first requirement is a clean representation of the world state that the agent can reason about:
from pydantic import BaseModel
from typing import Any


class WorldState(BaseModel):
    """Structured representation of the current state."""

    entities: dict[str, dict[str, Any]]
    relationships: list[tuple[str, str, str]]  # (subject, relation, object)
    constraints: list[str]
    history: list[str]  # past actions taken

    def describe(self) -> str:
        """Convert state to natural language for LLM reasoning."""
        lines = ["Current State:"]
        for name, props in self.entities.items():
            props_str = ", ".join(f"{k}={v}" for k, v in props.items())
            lines.append(f"  {name}: {props_str}")
        for s, r, o in self.relationships:
            lines.append(f"  {s} --{r}--> {o}")
        for c in self.constraints:
            lines.append(f"  Constraint: {c}")
        return "\n".join(lines)
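To make the serialization concrete, the snippet below builds a tiny state and prints what the LLM would actually receive. It uses a plain dataclass stand-in with the same fields and `describe()` logic, so the sketch runs without pydantic installed; with the `WorldState` class above, the output is identical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TinyState:
    """Dataclass stand-in mirroring WorldState's fields and describe()."""

    entities: dict[str, dict[str, Any]] = field(default_factory=dict)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def describe(self) -> str:
        lines = ["Current State:"]
        for name, props in self.entities.items():
            props_str = ", ".join(f"{k}={v}" for k, v in props.items())
            lines.append(f"  {name}: {props_str}")
        for s, r, o in self.relationships:
            lines.append(f"  {s} --{r}--> {o}")
        for c in self.constraints:
            lines.append(f"  Constraint: {c}")
        return "\n".join(lines)


state = TinyState(
    entities={"door": {"locked": True}},
    relationships=[("key", "opens", "door")],
    constraints=["The door cannot open while locked"],
)
print(state.describe())
# Current State:
#   door: locked=True
#   key --opens--> door
#   Constraint: The door cannot open while locked
```

Keeping the rendering this plain matters: the simulator prompt in the next section quotes this text back to the model, so every entity, edge, and constraint must survive the round trip.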
Simulating Action Consequences
The core of a world model is the transition function: given the current state and a proposed action, predict the next state.
from openai import OpenAI
import json

client = OpenAI()


def simulate_action(state: WorldState, action: str) -> WorldState:
    """Predict the world state after taking an action."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a world simulator.
Given a current state and a proposed action, predict the resulting state.
Consider:
- Direct effects of the action
- Side effects and cascading consequences
- Constraint violations (flag them)
- What remains unchanged
Return the new state as JSON with the same schema."""},
            {"role": "user", "content": (
                f"{state.describe()}\n\n"
                f"Proposed action: {action}\n\n"
                "Predict the resulting state."
            )},
        ],
        response_format={"type": "json_object"},
    )
    new_state_data = json.loads(response.choices[0].message.content)
    # Pydantic validation raises if the model's JSON deviates from the schema,
    # which is exactly the failure we want surfaced rather than silently absorbed.
    return WorldState(**new_state_data)
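Because `simulate_action` calls a live model, planning code built on top of it is awkward to unit-test. A common trick is to swap in a deterministic, rule-based transition during tests. The stub below is a hypothetical stand-in (not part of any API) that operates on plain dicts and hand-codes a single assignment rule; the 0.5 load increase is an assumed cost:

```python
import copy
from typing import Any


def simulate_action_stub(state: dict[str, Any], action: str) -> dict[str, Any]:
    """Deterministic stand-in for simulate_action, for offline tests.

    Hand-codes one rule: "Assign <task> to <person>" marks the task
    in-progress and adds 0.5 to the assignee's load (an assumed cost).
    Everything else is copied through unchanged, mirroring the "what
    remains unchanged" instruction given to the real simulator.
    """
    new_state = copy.deepcopy(state)
    parts = action.split()
    if len(parts) == 4 and parts[0] == "Assign" and parts[2] == "to":
        task, person = parts[1], parts[3].lower()
        if task in new_state["entities"]:
            new_state["entities"][task]["status"] = "in-progress"
        if person in new_state["entities"]:
            load = new_state["entities"][person].get("load", 0.0)
            new_state["entities"][person]["load"] = round(load + 0.5, 2)
    new_state["history"] = new_state.get("history", []) + [action]
    return new_state
```

In tests, planning functions take this stub in place of the real simulator, so look-ahead logic can be exercised without spending tokens or waiting on network calls.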
Look-Ahead Planning with Tree Search
With a simulation function, the agent can explore multiple future paths before committing to an action:
from dataclasses import dataclass


@dataclass
class SimulationNode:
    """Node type for materializing the search tree (useful for logging and inspection)."""

    state: WorldState
    action: str | None
    score: float
    children: list["SimulationNode"]
    depth: int


def look_ahead(
    state: WorldState,
    possible_actions: list[str],
    goal: str,
    depth: int = 2,
) -> str | None:
    """Simulate multiple action paths and choose the best."""
    best_action = None
    best_score = float("-inf")
    for action in possible_actions:
        # Simulate this action
        next_state = simulate_action(state, action)
        # Score: how close does this get us to the goal?
        score = evaluate_state(next_state, goal)
        if depth > 1:
            # Recurse: look further ahead
            future_actions = generate_actions(next_state, goal)
            future_best = look_ahead(
                next_state, future_actions, goal, depth - 1
            )
            if future_best is not None:
                # Blend the immediate score with future potential
                future_state = simulate_action(next_state, future_best)
                score = 0.4 * score + 0.6 * evaluate_state(future_state, goal)
        if score > best_score:
            best_score = score
            best_action = action
    return best_action  # None if no actions were provided
def evaluate_state(state: WorldState, goal: str) -> float:
    """Score how well a state satisfies the goal (0.0 to 1.0)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Rate how close this state is to achieving the goal. "
                "Return a single float between 0.0 and 1.0, with no other text."
            )},
            {"role": "user", "content": (
                f"{state.describe()}\nGoal: {goal}"
            )},
        ],
    )
    try:
        return float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # model returned extra text; treat as unscorable
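The search mechanics themselves can be verified without any API calls by injecting toy transition and scoring functions in place of `simulate_action` and `evaluate_state`. The sketch below reproduces the same control flow over a number-line world (the state is an integer, actions add to it), where the best first move can be checked by hand:

```python
def look_ahead_pure(state, actions, simulate, evaluate, depth=2):
    """Same control flow as look_ahead, with injected model functions."""
    best_action, best_score = None, float("-inf")
    for action in actions:
        next_state = simulate(state, action)
        score = evaluate(next_state)
        if depth > 1:
            future_best = look_ahead_pure(
                next_state, actions, simulate, evaluate, depth - 1
            )
            future_state = simulate(next_state, future_best)
            score = 0.4 * score + 0.6 * evaluate(future_state)
        if score > best_score:
            best_score, best_action = score, action
    return best_action


# Toy world: the state is an integer and the goal is to reach 10.
simulate = lambda s, a: s + a
evaluate = lambda s: 1.0 - abs(10 - s) / 10  # 1.0 exactly at the goal

best = look_ahead_pure(0, [+3, -1, +5], simulate, evaluate, depth=2)
# best == 5: two steps of +5 land exactly on the goal, so +5 wins
# the 0.4/0.6 blend despite +3 and +5 tying at depth 1.
```

Separating the search from the model this way also makes the depth-2 cost visible: each level multiplies the number of simulator calls by the branching factor, which is why the limitations section below caps depth at 2-3.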
Practical Example: Project Management Agent
Consider an agent managing a software project. Its world model tracks developers, tasks, dependencies, and deadlines. Before assigning a task, it simulates the consequences:
project_state = WorldState(
    entities={
        "alice": {"role": "frontend", "current_task": "auth-ui", "load": 0.8},
        "bob": {"role": "backend", "current_task": None, "load": 0.2},
        "auth-api": {"type": "task", "status": "unassigned", "priority": "high"},
    },
    relationships=[
        ("auth-ui", "depends_on", "auth-api"),
        ("alice", "assigned_to", "auth-ui"),
    ],
    constraints=[
        "No developer should exceed 1.0 load",
        "Blocked tasks cannot start until dependencies complete",
    ],
    history=["Sprint started 3 days ago"],
)

# Simulate: what if we assign auth-api to Bob?
next_state = simulate_action(project_state, "Assign auth-api to Bob")

# The model should predict: Bob's load increases, auth-api moves to
# in-progress, and once complete, auth-ui becomes unblocked for Alice.
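The correctness criterion here can be written down explicitly. The dict below captures the prediction just described (the exact load increase of 0.5 is an illustrative assumption), and a small helper turns it into a regression check against whatever the simulator returns:

```python
expected = {
    "bob": {"role": "backend", "current_task": "auth-api", "load": 0.7},
    "auth-api": {"type": "task", "status": "in-progress", "priority": "high"},
    # alice and auth-ui stay unchanged until auth-api completes
}


def matches_expectation(predicted: dict, expected: dict) -> bool:
    """Check that every expected entity property appears in the prediction."""
    return all(
        predicted.get(name, {}).get(prop) == value
        for name, props in expected.items()
        for prop, value in props.items()
    )
```

Run against `next_state.entities`, this gives a pass/fail signal per simulation, which is far easier to track over time than eyeballing free-form model output.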
Limitations and Mitigations
LLM-based world models are imperfect — they can miss edge cases, violate physical laws, or drift from reality over multiple simulation steps. Mitigate this by (1) grounding simulations with real data at every opportunity, (2) limiting look-ahead depth to 2-3 steps, and (3) re-syncing the world model with actual state after each real action.
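Point (3), re-syncing, can be a straightforward merge in which observed values always override simulated ones. A minimal sketch, assuming entity properties come back from your real data sources as a dict; simulated-only properties are kept but flagged so later reasoning can discount them:

```python
from typing import Any


def reground(
    simulated: dict[str, dict[str, Any]],
    observed: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Correct a simulated entities dict against real observations.

    Observed values always win; simulated properties with no observed
    counterpart survive but are listed under "_unverified" so the agent
    can treat them as hypotheses rather than facts.
    """
    grounded: dict[str, dict[str, Any]] = {}
    for name in simulated.keys() | observed.keys():
        sim, obs = simulated.get(name, {}), observed.get(name, {})
        merged = dict(sim)
        merged.update(obs)  # reality overrides the simulation
        unverified = [k for k in sim if k not in obs]
        if unverified:
            merged["_unverified"] = unverified
        grounded[name] = merged
    return grounded
```

Calling this after every real action keeps the "hypothesis corrected by observation" discipline described in the FAQ below: the simulation never drifts more than one step from ground truth.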
FAQ
How accurate are LLM-based world models?
For common-sense reasoning and business logic, LLMs are surprisingly effective world simulators. They struggle with precise numerical computations and novel physical scenarios. Always validate critical simulations against real-world checks.
How do you prevent state drift in long simulations?
Re-ground the world model after every real action by querying actual data sources (databases, APIs, sensors). Treat the simulated state as a hypothesis that gets corrected by observation. Never let the agent act on a state that is more than 2-3 simulation steps removed from reality.
Is this the same as Monte Carlo Tree Search (MCTS)?
Conceptually similar. MCTS uses random rollouts to evaluate positions; world model agents use LLM-based simulation. The key difference is that LLMs can bring vast world knowledge to the simulation, while MCTS relies on domain-specific value functions. Some hybrid approaches use both.
#WorldModels #StatePrediction #LookAheadPlanning #AgentSimulation #AgenticAI #PythonAI #AIPlanning #ReinforcementLearning
CallSphere Team
Expert insights on AI voice agents and customer communication automation.