Skip to content
Back to Blog
Agentic AI10 min read

Building AI Agents That Browse the Web: Approaches and Pitfalls

Technical guide to web-browsing AI agents with Claude -- tool-based vs Computer Use, Playwright integration, rate limiting, and common pitfalls to avoid.

Two Approaches

Tool-based: Give Claude fetch_url and search_web tools. Fast, cost-effective, works for static sites. Computer Use: Claude visually operates a real browser. Full JS support, handles logins, but slower and more expensive. For most research tasks, tool-based wins.

Tool-Based Implementation

import anthropic, httpx
from bs4 import BeautifulSoup

client = anthropic.Anthropic()

def fetch_page(url: str) -> dict:
    resp = httpx.get(url, headers={"User-Agent": "ResearchBot/1.0"}, timeout=15, follow_redirects=True)
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "nav"]):
        tag.decompose()
    main = soup.find("main") or soup.find("article") or soup.body
    return {"url": url, "content": (main.get_text(strip=True) if main else "")[:8000]}

tools = [{"name": "fetch_page", "description": "Fetch webpage content.",
          "input_schema": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}}]

def web_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        resp = client.messages.create(model="claude-sonnet-4-6", max_tokens=4096, tools=tools, messages=messages)
        if resp.stop_reason == "end_turn":
            return resp.content[0].text
        messages.append({"role": "assistant", "content": resp.content})
        results = [{"type": "tool_result", "tool_use_id": b.id, "content": str(fetch_page(**b.input))}
                   for b in resp.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})

Common Pitfalls

  • Rate limiting: add 1-3 second delays between requests
  • Token overflow: truncate pages to 8000 chars maximum
  • Infinite loops: track visited URLs and enforce a max step count
  • Hallucinated URLs: validate before fetching, handle 404s gracefully
  • JS-only content: use Playwright headless browser for dynamic sites

Always respect robots.txt and check for official APIs before scraping.

Share this article
N

NYC News

Expert insights on AI voice agents and customer communication automation.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.