Skip to content
Learn Agentic AI14 min read0 views

AI Agent for Spreadsheet Automation: Formulas, Formatting, and Analysis on Demand

Build an AI agent that reads and writes spreadsheets, generates formulas from natural language descriptions, cleans messy data, creates pivot tables, and applies conditional formatting — turning plain English requests into Excel operations.

The Spreadsheet Automation Opportunity

Spreadsheets remain the most widely used data tool in business. Yet most users only scratch the surface — they know SUM and VLOOKUP but struggle with INDEX-MATCH, dynamic arrays, pivot tables, and conditional formatting rules. An AI spreadsheet agent accepts natural language instructions and translates them into precise spreadsheet operations.

Instead of Googling "Excel formula to find second highest value in column B where column A equals 'West'", the user simply tells the agent what they need. The agent reads the spreadsheet, understands its structure, and applies the right formula, formatting, or analysis.

Reading Spreadsheet Structure

The first tool inspects a workbook and reports its structure:

import openpyxl
from openpyxl.utils import get_column_letter
from agents import Agent, Runner, function_tool

_workbooks: dict[str, openpyxl.Workbook] = {}
_paths: dict[str, str] = {}

@function_tool
def open_spreadsheet(file_path: str, name: str = "default") -> str:
    """Open an Excel file and return its structure: sheet names,
    dimensions, column headers, and sample data."""
    try:
        wb = openpyxl.load_workbook(file_path, data_only=True)
    except Exception as e:
        return f"Error opening file: {e}"

    _workbooks[name] = wb
    _paths[name] = file_path

    parts = [f"Workbook: {file_path}", f"Sheets: {wb.sheetnames}"]

    for sheet_name in wb.sheetnames:
        ws = wb[sheet_name]
        parts.append(f"\nSheet '{sheet_name}': {ws.max_row} rows x {ws.max_column} cols")

        # Headers (first row)
        headers = []
        for col in range(1, ws.max_column + 1):
            val = ws.cell(row=1, column=col).value
            headers.append(f"{get_column_letter(col)}: {val}")
        parts.append(f"  Headers: {', '.join(headers)}")

        # Sample data (rows 2-4)
        parts.append("  Sample data:")
        for row in range(2, min(5, ws.max_row + 1)):
            vals = []
            for col in range(1, ws.max_column + 1):
                vals.append(str(ws.cell(row=row, column=col).value))
            parts.append(f"    Row {row}: {' | '.join(vals)}")

    return "\n".join(parts)

Formula Generation Tool

This tool lets the agent write formulas to specific cells:

@function_tool
def set_formula(
    name: str, sheet: str, cell: str, formula: str
) -> str:
    """Write a formula to a specific cell in the spreadsheet.
    Example: set_formula('default', 'Sheet1', 'D2', '=B2*C2')"""
    if name not in _workbooks:
        return f"No workbook '{name}' open."

    wb = _workbooks[name]
    if sheet not in wb.sheetnames:
        return f"Sheet '{sheet}' not found. Available: {wb.sheetnames}"

    ws = wb[sheet]
    try:
        ws[cell] = formula
        return f"Set {cell} = {formula}"
    except Exception as e:
        return f"Error setting formula: {e}"

@function_tool
def set_formulas_range(
    name: str, sheet: str, start_cell: str, formula: str, count: int
) -> str:
    """Apply a formula down a column for multiple rows.
    The formula should use relative references that will auto-adjust."""
    if name not in _workbooks:
        return f"No workbook '{name}' open."

    ws = _workbooks[name][sheet]

    import re
    match = re.match(r"([A-Z]+)(\d+)", start_cell)
    if not match:
        return "Invalid cell reference."

    col_letter = match.group(1)
    start_row = int(match.group(2))

    for i in range(count):
        row = start_row + i
        # Adjust row references in formula
        adjusted = re.sub(
            r"([A-Z]+)(\d+)",
            lambda m: f"{m.group(1)}{int(m.group(2)) + i}",
            formula,
        )
        ws[f"{col_letter}{row}"] = adjusted

    return f"Applied formula to {col_letter}{start_row}:{col_letter}{start_row + count - 1}"

Data Cleaning Tool

Messy spreadsheets need cleaning before analysis. This tool handles common data quality issues:

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

@function_tool
def clean_column(
    name: str, sheet: str, column: str, operation: str
) -> str:
    """Clean data in a spreadsheet column.
    Operations: trim_whitespace, remove_duplicates, fill_blanks_above,
    to_number, to_date, standardize_case."""
    if name not in _workbooks:
        return f"No workbook '{name}' open."

    ws = _workbooks[name][sheet]
    col_idx = openpyxl.utils.column_index_from_string(column)

    changed = 0
    for row in range(2, ws.max_row + 1):
        cell = ws.cell(row=row, column=col_idx)
        original = cell.value

        if operation == "trim_whitespace" and isinstance(original, str):
            cell.value = original.strip()
        elif operation == "standardize_case" and isinstance(original, str):
            cell.value = original.title()
        elif operation == "to_number" and isinstance(original, str):
            cleaned = original.replace(",", "").replace("$", "").strip()
            try:
                cell.value = float(cleaned)
            except ValueError:
                continue
        elif operation == "fill_blanks_above" and original is None:
            above = ws.cell(row=row - 1, column=col_idx).value
            if above is not None:
                cell.value = above
        else:
            continue

        if cell.value != original:
            changed += 1

    return f"Cleaned column {column}: {changed} cells modified."

Pivot Table Generation Tool

Pivot tables are the most requested spreadsheet feature — and one of the hardest for non-experts. This tool creates them programmatically:

import pandas as pd

@function_tool
def create_pivot(
    name: str,
    sheet: str,
    rows: str,
    values: str,
    agg_func: str = "sum",
    output_sheet: str = "Pivot",
) -> str:
    """Create a pivot table from spreadsheet data.
    rows: column name for grouping. values: column name to aggregate.
    agg_func: sum, mean, count, min, max."""
    if name not in _workbooks:
        return f"No workbook '{name}' open."

    wb = _workbooks[name]
    ws = wb[sheet]

    # Convert sheet to DataFrame
    data = ws.values
    headers = next(data)
    df = pd.DataFrame(data, columns=headers)

    if rows not in df.columns or values not in df.columns:
        return f"Column not found. Available: {list(df.columns)}"

    try:
        pivot = df.pivot_table(index=rows, values=values, aggfunc=agg_func)
        pivot = pivot.sort_values(values, ascending=False)
    except Exception as e:
        return f"Pivot error: {e}"

    # Write pivot to new sheet
    if output_sheet in wb.sheetnames:
        del wb[output_sheet]
    pivot_ws = wb.create_sheet(output_sheet)

    pivot_ws.cell(row=1, column=1, value=rows)
    pivot_ws.cell(row=1, column=2, value=f"{agg_func}({values})")

    for i, (idx, row) in enumerate(pivot.iterrows(), start=2):
        pivot_ws.cell(row=i, column=1, value=idx)
        pivot_ws.cell(row=i, column=2, value=round(row[values], 2))

    return f"Pivot table created in sheet '{output_sheet}': {len(pivot)} rows."

Save Tool

After all modifications, save the workbook:

@function_tool
def save_spreadsheet(name: str, output_path: str = "") -> str:
    """Save the modified spreadsheet. If output_path is empty,
    overwrites the original file."""
    if name not in _workbooks:
        return f"No workbook '{name}' open."

    path = output_path or _paths.get(name, "output.xlsx")
    try:
        _workbooks[name].save(path)
        return f"Saved to {path}"
    except Exception as e:
        return f"Save error: {e}"

Assembling the Spreadsheet Agent

spreadsheet_agent = Agent(
    name="Spreadsheet Agent",
    instructions="""You are a spreadsheet automation agent. When given a task:
1. Call open_spreadsheet to understand the file structure and data.
2. For formula requests, use set_formula or set_formulas_range.
   Write standard Excel formulas (SUM, VLOOKUP, INDEX/MATCH, IF, etc).
3. For data cleaning, use clean_column with the appropriate operation.
4. For analysis requests, use create_pivot to summarize data.
5. Always call save_spreadsheet when modifications are complete.
6. Explain what you did in plain language so the user can verify.

Formula rules:
- Use absolute references ($A$1) for fixed cells.
- Use relative references (A1) for formulas that will be copied down.
- Prefer INDEX/MATCH over VLOOKUP for flexibility.
- Use IFERROR to handle potential errors gracefully.""",
    tools=[
        open_spreadsheet, set_formula, set_formulas_range,
        clean_column, create_pivot, save_spreadsheet,
    ],
)

Example: Complete Workflow

result = Runner.run_sync(
    spreadsheet_agent,
    "Open sales_report.xlsx. Clean the Revenue column by removing dollar "
    "signs and converting to numbers. Add a Profit column that subtracts "
    "Cost from Revenue. Create a pivot table showing total profit by Region. "
    "Save as sales_report_updated.xlsx.",
)
print(result.final_output)

The agent opens the file, inspects its structure (finds columns: Region, Product, Revenue, Cost), cleans the Revenue column (removes $ and commas, converts to float), writes =B2-C2 formulas down the Profit column, creates a pivot table grouped by Region summing Profit, and saves the updated file.

FAQ

Can this agent work with Google Sheets instead of Excel?

Yes. Replace openpyxl with the gspread library and Google Sheets API. The agent tools stay conceptually the same — open, read, write formulas, save — but the underlying API calls change. Google Sheets also supports Apps Script for more advanced automation.

How does the agent handle spreadsheets with multiple header rows or merged cells?

Add logic in open_spreadsheet to detect merged cells (ws.merged_cells.ranges) and multi-row headers. Report these to the agent so it can adjust its formula row references accordingly. Unmerging cells before processing is often the simplest approach.

What is the maximum spreadsheet size this can handle?

OpenPyXL handles files up to several hundred thousand rows efficiently. For very large files (1M+ rows), switch to openpyxl in read-only mode for reading and use xlsxwriter for writing. The agent interface remains the same — only the underlying library changes.


#Spreadsheet #Excel #OpenPyXL #Automation #AIAgents #AgenticAI #LearnAI #AIEngineering

Share this article
C

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.