AI Agent for Spreadsheet Automation: Formulas, Formatting, and Analysis on Demand
Build an AI agent that reads and writes spreadsheets, generates formulas from natural language descriptions, cleans messy data, creates pivot tables, and applies conditional formatting — turning plain English requests into Excel operations.
The Spreadsheet Automation Opportunity
Spreadsheets remain the most widely used data tool in business. Yet most users only scratch the surface — they know SUM and VLOOKUP but struggle with INDEX-MATCH, dynamic arrays, pivot tables, and conditional formatting rules. An AI spreadsheet agent accepts natural language instructions and translates them into precise spreadsheet operations.
Instead of Googling "Excel formula to find second highest value in column B where column A equals 'West'", the user simply tells the agent what they need. The agent reads the spreadsheet, understands its structure, and applies the right formula, formatting, or analysis.
Reading Spreadsheet Structure
The first tool inspects a workbook and reports its structure:
import openpyxl
from openpyxl.utils import get_column_letter
from agents import Agent, Runner, function_tool
_workbooks: dict[str, openpyxl.Workbook] = {}
_paths: dict[str, str] = {}
@function_tool
def open_spreadsheet(file_path: str, name: str = "default") -> str:
"""Open an Excel file and return its structure: sheet names,
dimensions, column headers, and sample data."""
try:
wb = openpyxl.load_workbook(file_path, data_only=True)
except Exception as e:
return f"Error opening file: {e}"
_workbooks[name] = wb
_paths[name] = file_path
parts = [f"Workbook: {file_path}", f"Sheets: {wb.sheetnames}"]
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
parts.append(f"\nSheet '{sheet_name}': {ws.max_row} rows x {ws.max_column} cols")
# Headers (first row)
headers = []
for col in range(1, ws.max_column + 1):
val = ws.cell(row=1, column=col).value
headers.append(f"{get_column_letter(col)}: {val}")
parts.append(f" Headers: {', '.join(headers)}")
# Sample data (rows 2-4)
parts.append(" Sample data:")
for row in range(2, min(5, ws.max_row + 1)):
vals = []
for col in range(1, ws.max_column + 1):
vals.append(str(ws.cell(row=row, column=col).value))
parts.append(f" Row {row}: {' | '.join(vals)}")
return "\n".join(parts)
Formula Generation Tool
This tool lets the agent write formulas to specific cells:
@function_tool
def set_formula(
name: str, sheet: str, cell: str, formula: str
) -> str:
"""Write a formula to a specific cell in the spreadsheet.
Example: set_formula('default', 'Sheet1', 'D2', '=B2*C2')"""
if name not in _workbooks:
return f"No workbook '{name}' open."
wb = _workbooks[name]
if sheet not in wb.sheetnames:
return f"Sheet '{sheet}' not found. Available: {wb.sheetnames}"
ws = wb[sheet]
try:
ws[cell] = formula
return f"Set {cell} = {formula}"
except Exception as e:
return f"Error setting formula: {e}"
@function_tool
def set_formulas_range(
name: str, sheet: str, start_cell: str, formula: str, count: int
) -> str:
"""Apply a formula down a column for multiple rows.
The formula should use relative references that will auto-adjust."""
if name not in _workbooks:
return f"No workbook '{name}' open."
ws = _workbooks[name][sheet]
import re
match = re.match(r"([A-Z]+)(\d+)", start_cell)
if not match:
return "Invalid cell reference."
col_letter = match.group(1)
start_row = int(match.group(2))
for i in range(count):
row = start_row + i
# Adjust row references in formula
adjusted = re.sub(
r"([A-Z]+)(\d+)",
lambda m: f"{m.group(1)}{int(m.group(2)) + i}",
formula,
)
ws[f"{col_letter}{row}"] = adjusted
return f"Applied formula to {col_letter}{start_row}:{col_letter}{start_row + count - 1}"
Data Cleaning Tool
Messy spreadsheets need cleaning before analysis. This tool handles common data quality issues:
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
@function_tool
def clean_column(
name: str, sheet: str, column: str, operation: str
) -> str:
"""Clean data in a spreadsheet column.
Operations: trim_whitespace, remove_duplicates, fill_blanks_above,
to_number, to_date, standardize_case."""
if name not in _workbooks:
return f"No workbook '{name}' open."
ws = _workbooks[name][sheet]
col_idx = openpyxl.utils.column_index_from_string(column)
changed = 0
for row in range(2, ws.max_row + 1):
cell = ws.cell(row=row, column=col_idx)
original = cell.value
if operation == "trim_whitespace" and isinstance(original, str):
cell.value = original.strip()
elif operation == "standardize_case" and isinstance(original, str):
cell.value = original.title()
elif operation == "to_number" and isinstance(original, str):
cleaned = original.replace(",", "").replace("$", "").strip()
try:
cell.value = float(cleaned)
except ValueError:
continue
elif operation == "fill_blanks_above" and original is None:
above = ws.cell(row=row - 1, column=col_idx).value
if above is not None:
cell.value = above
else:
continue
if cell.value != original:
changed += 1
return f"Cleaned column {column}: {changed} cells modified."
Pivot Table Generation Tool
Pivot tables are the most requested spreadsheet feature — and one of the hardest for non-experts. This tool creates them programmatically:
import pandas as pd
@function_tool
def create_pivot(
name: str,
sheet: str,
rows: str,
values: str,
agg_func: str = "sum",
output_sheet: str = "Pivot",
) -> str:
"""Create a pivot table from spreadsheet data.
rows: column name for grouping. values: column name to aggregate.
agg_func: sum, mean, count, min, max."""
if name not in _workbooks:
return f"No workbook '{name}' open."
wb = _workbooks[name]
ws = wb[sheet]
# Convert sheet to DataFrame
data = ws.values
headers = next(data)
df = pd.DataFrame(data, columns=headers)
if rows not in df.columns or values not in df.columns:
return f"Column not found. Available: {list(df.columns)}"
try:
pivot = df.pivot_table(index=rows, values=values, aggfunc=agg_func)
pivot = pivot.sort_values(values, ascending=False)
except Exception as e:
return f"Pivot error: {e}"
# Write pivot to new sheet
if output_sheet in wb.sheetnames:
del wb[output_sheet]
pivot_ws = wb.create_sheet(output_sheet)
pivot_ws.cell(row=1, column=1, value=rows)
pivot_ws.cell(row=1, column=2, value=f"{agg_func}({values})")
for i, (idx, row) in enumerate(pivot.iterrows(), start=2):
pivot_ws.cell(row=i, column=1, value=idx)
pivot_ws.cell(row=i, column=2, value=round(row[values], 2))
return f"Pivot table created in sheet '{output_sheet}': {len(pivot)} rows."
Save Tool
After all modifications, save the workbook:
@function_tool
def save_spreadsheet(name: str, output_path: str = "") -> str:
"""Save the modified spreadsheet. If output_path is empty,
overwrites the original file."""
if name not in _workbooks:
return f"No workbook '{name}' open."
path = output_path or _paths.get(name, "output.xlsx")
try:
_workbooks[name].save(path)
return f"Saved to {path}"
except Exception as e:
return f"Save error: {e}"
Assembling the Spreadsheet Agent
spreadsheet_agent = Agent(
name="Spreadsheet Agent",
instructions="""You are a spreadsheet automation agent. When given a task:
1. Call open_spreadsheet to understand the file structure and data.
2. For formula requests, use set_formula or set_formulas_range.
Write standard Excel formulas (SUM, VLOOKUP, INDEX/MATCH, IF, etc).
3. For data cleaning, use clean_column with the appropriate operation.
4. For analysis requests, use create_pivot to summarize data.
5. Always call save_spreadsheet when modifications are complete.
6. Explain what you did in plain language so the user can verify.
Formula rules:
- Use absolute references ($A$1) for fixed cells.
- Use relative references (A1) for formulas that will be copied down.
- Prefer INDEX/MATCH over VLOOKUP for flexibility.
- Use IFERROR to handle potential errors gracefully.""",
tools=[
open_spreadsheet, set_formula, set_formulas_range,
clean_column, create_pivot, save_spreadsheet,
],
)
Example: Complete Workflow
result = Runner.run_sync(
spreadsheet_agent,
"Open sales_report.xlsx. Clean the Revenue column by removing dollar "
"signs and converting to numbers. Add a Profit column that subtracts "
"Cost from Revenue. Create a pivot table showing total profit by Region. "
"Save as sales_report_updated.xlsx.",
)
print(result.final_output)
The agent opens the file, inspects its structure (finds columns: Region, Product, Revenue, Cost), cleans the Revenue column (removes $ and commas, converts to float), writes =B2-C2 formulas down the Profit column, creates a pivot table grouped by Region summing Profit, and saves the updated file.
FAQ
Can this agent work with Google Sheets instead of Excel?
Yes. Replace openpyxl with the gspread library and Google Sheets API. The agent tools stay conceptually the same — open, read, write formulas, save — but the underlying API calls change. Google Sheets also supports Apps Script for more advanced automation.
How does the agent handle spreadsheets with multiple header rows or merged cells?
Add logic in open_spreadsheet to detect merged cells (ws.merged_cells.ranges) and multi-row headers. Report these to the agent so it can adjust its formula row references accordingly. Unmerging cells before processing is often the simplest approach.
What is the maximum spreadsheet size this can handle?
OpenPyXL handles files up to several hundred thousand rows efficiently. For very large files (1M+ rows), switch to openpyxl in read-only mode for reading and use xlsxwriter for writing. The agent interface remains the same — only the underlying library changes.
#Spreadsheet #Excel #OpenPyXL #Automation #AIAgents #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.