
Building a Claude Web Scraper: Extracting Data Using Vision Instead of Selectors

Learn how to use Claude Computer Use for visual data extraction — reading HTML tables, parsing charts, extracting structured data from complex layouts, and converting visual information to JSON without any CSS selectors.

Why Vision-Based Scraping?

Traditional web scraping with BeautifulSoup or Scrapy relies on parsing HTML and navigating the DOM tree. This works well for simple, well-structured pages. But the modern web is full of content that isn't exposed in the DOM in any straightforward way: data rendered in canvas elements, charts built with D3 or Chart.js, information embedded in images, PDF viewers rendered in the browser, and content loaded dynamically by JavaScript frameworks.

Claude's vision capability lets you skip all of that complexity. Instead of parsing HTML, you take a screenshot and ask Claude to read what it sees. The data extraction happens at the visual level, making it resilient to DOM changes, anti-scraping measures, and complex rendering pipelines.

Basic Visual Extraction

The simplest form of visual scraping sends a screenshot to Claude with structured output instructions:

import anthropic
import json

client = anthropic.Anthropic()

def extract_table_data(screenshot_b64: str, description: str) -> list[dict]:
    """Extract tabular data from a screenshot using Claude vision."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Extract all data from the table visible in this screenshot.

Context: {description}

Return the data as a JSON array of objects where each object represents
a row and the keys are the column headers. Use exact values as shown.
Return ONLY valid JSON, no other text.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)

This function handles any visible table — HTML tables, tables rendered inside canvas, tables in embedded PDFs, even tables in images. Claude reads the visual content and returns structured JSON.
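In practice, models sometimes wrap their JSON reply in a markdown fence despite the "ONLY valid JSON" instruction. A small defensive parser (a sketch, not part of the Anthropic SDK) keeps `json.loads` from choking on that:

```python
import json

def parse_json_reply(text: str):
    """Parse a model reply as JSON, tolerating a markdown code fence."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (which may carry a language tag)...
        text = text.split("\n", 1)[1] if "\n" in text else text[3:]
        # ...and the trailing closing fence.
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```

Swapping this in for the bare `json.loads(response.content[0].text)` call makes the extractor tolerant of both reply styles.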

Extracting Data from Charts

Charts are a prime use case for vision-based scraping because their data is rendered as pixels rather than as accessible DOM elements. Claude can read bar charts, line charts, pie charts, and more:

def extract_chart_data(screenshot_b64: str, chart_type: str) -> dict:
    """Extract data points from a chart in a screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Analyze this {chart_type} chart and extract all data points.

For each data series, provide:
- series_name: the label of the series
- data_points: array of {{label, value}} objects

Also extract:
- chart_title: the title of the chart
- x_axis_label: the x-axis label
- y_axis_label: the y-axis label

Return as JSON. Estimate numeric values from the chart's axis scale
as precisely as possible.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)
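Because chart values are estimates, it pays to sanity-check the structure of the reply before using it. The validator below assumes the reply nests its series under a `series` key, which is one plausible shape given the prompt above rather than a guaranteed one:

```python
def validate_chart_data(chart: dict) -> list[str]:
    """Return a list of human-readable problems with an extracted chart dict."""
    problems = []
    # Top-level fields the extraction prompt asks for.
    for key in ("chart_title", "x_axis_label", "y_axis_label"):
        if not chart.get(key):
            problems.append(f"missing {key}")
    # Each series needs a name and numeric data points.
    for series in chart.get("series", []):
        name = series.get("series_name")
        if not name:
            problems.append("series without series_name")
        for point in series.get("data_points", []):
            if not isinstance(point.get("value"), (int, float)):
                problems.append(f"non-numeric value in series {name}")
    return problems
```

An empty return list means the reply at least has the expected shape; anything else is worth a re-extraction.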

Full-Page Scraping with Scrolling

Real-world scraping often requires scrolling through a page to capture all content. Here is a complete scraper that captures a long page by scrolling through it one viewport at a time:

from playwright.async_api import async_playwright
import asyncio
import base64

class VisualScraper:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.all_data = []

    async def scrape_full_page(self, url: str, extraction_prompt: str) -> list:
        self.all_data = []  # reset state so repeated calls don't accumulate stale rows
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page(viewport={"width": 1280, "height": 800})
            await page.goto(url, wait_until="networkidle")

            prev_screenshot = None
            scroll_count = 0
            max_scrolls = 20

            while scroll_count < max_scrolls:
                screenshot = await page.screenshot()
                screenshot_b64 = base64.standard_b64encode(screenshot).decode()

                # Check if page content has changed after scroll
                if screenshot_b64 == prev_screenshot:
                    break  # Reached bottom of page

                prev_screenshot = screenshot_b64

                # Extract data from current viewport
                data = await self._extract(screenshot_b64, extraction_prompt)
                self.all_data.extend(data)

                # Scroll down
                await page.mouse.wheel(0, 600)
                await asyncio.sleep(1)
                scroll_count += 1

            await browser.close()

        return self._deduplicate(self.all_data)

    async def _extract(self, screenshot_b64: str, prompt: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": prompt + "\nReturn as JSON array."},
                ],
            }],
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return []

    def _deduplicate(self, items: list) -> list:
        seen = set()
        unique = []
        for item in items:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(item)
        return unique

Handling Complex Layouts

Some pages have data spread across cards, tiles, or non-tabular layouts. Claude handles these naturally:

extraction_prompt = """Extract all product listings visible on this page.
For each product, return:
- name: product name
- price: price as shown (include currency symbol)
- rating: star rating if visible
- review_count: number of reviews if shown
- availability: in stock or out of stock
- image_description: brief description of the product image

If any field is not visible for a product, use null."""

scraper = VisualScraper()
products = asyncio.run(
    scraper.scrape_full_page(
        "https://example.com/products",
        extraction_prompt
    )
)

The key advantage here is that Claude understands layout semantics. It knows that a price displayed below a product name belongs to that product, even if the HTML structure groups them in unexpected ways.
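One consequence of extracting prices "as shown" is that they come back as display strings like `$1,299.99`. A hypothetical normalization helper for downstream use, assuming Western number formatting (comma thousands separator, dot decimal):

```python
import re

def parse_price(price):
    """Extract a numeric amount from a displayed price string, or None."""
    if not price:
        return None
    # First run of digits, optionally with comma groups and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))
```

Keeping the raw string alongside the parsed number preserves the currency symbol the prompt asked for.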

Accuracy Considerations

Vision-based extraction is not pixel-perfect for numeric values read from charts. Claude estimates values from the axis scale and the visual position of each point; for bar charts, expect estimates within roughly 2-5% of the actual value. For numeric extraction from tables, accuracy is typically above 99%, since Claude reads the actual rendered text rather than estimating.

Always validate extracted data against known reference points when possible. For critical applications, extract the same data multiple times and compare results, flagging any discrepancies for human review.
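The double-extraction check can be sketched as a pure comparison function, where `key` is whatever field uniquely identifies a row (e.g. `name` for the product example above):

```python
def flag_discrepancies(run_a: list[dict], run_b: list[dict], key: str) -> list:
    """Compare two extraction runs of the same page, keyed on `key`.

    Returns the key values whose rows differ between runs (or appear in
    only one run), sorted, so they can be routed to human review.
    """
    rows_a = {row[key]: row for row in run_a}
    rows_b = {row[key]: row for row in run_b}
    return sorted(k for k in rows_a.keys() | rows_b.keys()
                  if rows_a.get(k) != rows_b.get(k))
```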

FAQ

How does vision-based scraping handle anti-bot protection?

Since Claude extracts data from screenshots rather than by making additional HTTP requests, the extraction step itself is invisible to server-side anti-bot systems. The browser session that renders the page still needs to avoid detection; only the parsing happens entirely on the client side through image analysis.

Can Claude extract data from screenshots of mobile layouts?

Yes. Set your browser viewport to a mobile resolution (e.g., 375x812 for iPhone) and Claude will interpret the mobile layout correctly. It understands responsive design patterns like hamburger menus, stacked cards, and collapsible sections.

What is the cost of scraping a 20-page website?

With one screenshot per viewport and an average of 3-5 scrolls per page, that is roughly 60-100 API calls. At Claude Sonnet pricing with image inputs, expect approximately $1-3 for the full scrape. This is significantly more expensive than HTML parsing, so reserve vision-based scraping for pages where traditional methods fail.
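To ballpark this yourself, the estimator below assumes Anthropic's approximate image-token formula of (width × height) / 750 and Sonnet-class pricing of $3/$15 per million input/output tokens (check current pricing before relying on it):

```python
def estimate_scrape_cost(pages, scrolls_per_page, width=1280, height=800,
                         prompt_tokens=150, output_tokens=1000,
                         input_price=3.0, output_price=15.0):
    """Rough USD cost estimate for a vision scrape.

    Prices are per million tokens; image tokens use the approximate
    formula (width * height) / 750 from Anthropic's vision docs.
    """
    calls = pages * scrolls_per_page
    per_call = ((width * height / 750 + prompt_tokens) * input_price
                + output_tokens * output_price) / 1_000_000
    return calls, round(calls * per_call, 2)
```

For 20 pages at 4 scrolls each, this lands in the low single digits of dollars, consistent with the estimate above.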


#ClaudeWebScraper #VisionAI #DataExtraction #WebScraping #StructuredOutput #AIDataParsing #ComputerUse
