
Build a News Aggregation Agent: Source Monitoring, Summarization, and Personalized Feeds

Build an AI news aggregation agent that monitors RSS feeds, summarizes articles, learns user preferences, and generates personalized daily digests — a complete information management system in Python.

Why Build a News Aggregation Agent

Information overload is a daily reality. Between dozens of news sites, blogs, and newsletters, staying informed without drowning in content requires aggressive filtering and summarization. A news aggregation agent automates the entire workflow: it monitors sources, pulls new articles, summarizes them, and generates a personalized digest based on your interests.

This tutorial builds a complete news aggregation system with RSS parsing, article summarization, preference learning, and digest generation.

Project Setup

mkdir news-agent && cd news-agent
python -m venv venv && source venv/bin/activate
pip install openai-agents pydantic
mkdir -p src
touch src/__init__.py src/feed_parser.py src/summarizer.py
touch src/preferences.py src/agent.py

Step 1: Build the Feed Parser

We simulate RSS feed parsing with structured article data. In production, use the feedparser library to pull real RSS feeds.

# src/feed_parser.py
from datetime import datetime, timedelta
from pydantic import BaseModel

class Article(BaseModel):
    id: str
    title: str
    source: str
    url: str
    published: str
    category: str
    content_preview: str  # first 200 chars

MOCK_ARTICLES = [
    Article(id="a001", title="New Breakthrough in Quantum Computing",
            source="TechCrunch", url="https://example.com/quantum",
            published="2026-03-17", category="technology",
            content_preview="Researchers at MIT have demonstrated a 1000-qubit quantum processor that maintains coherence for over 10 milliseconds, a significant leap that could accelerate drug discovery and materials science."),
    Article(id="a002", title="Federal Reserve Holds Interest Rates Steady",
            source="Reuters", url="https://example.com/fed-rates",
            published="2026-03-17", category="finance",
            content_preview="The Federal Reserve announced it will maintain the current interest rate, citing stable inflation and strong employment numbers. Markets responded positively with the S&P 500 rising 0.8 percent."),
    Article(id="a003", title="AI Agents Transform Customer Service Industry",
            source="Wired", url="https://example.com/ai-cs",
            published="2026-03-17", category="technology",
            content_preview="Companies deploying AI agents for customer service report 40 percent faster resolution times and 25 percent cost reduction. The shift from chatbots to autonomous agents marks a new era in support."),
    Article(id="a004", title="Climate Summit Reaches New Emissions Agreement",
            source="BBC News", url="https://example.com/climate",
            published="2026-03-16", category="environment",
            content_preview="World leaders at the 2026 Climate Summit agreed to reduce industrial emissions by 35 percent before 2035. The agreement includes binding commitments from the top 20 emitting nations."),
    Article(id="a005", title="SpaceX Launches Next-Gen Starlink Satellites",
            source="Ars Technica", url="https://example.com/starlink",
            published="2026-03-16", category="space",
            content_preview="SpaceX successfully launched 60 next-generation Starlink satellites with direct-to-cell capabilities. The new constellation aims to provide global cellular connectivity by late 2026."),
    Article(id="a006", title="Python 3.15 Released with Pattern Matching Upgrades",
            source="InfoWorld", url="https://example.com/python315",
            published="2026-03-16", category="technology",
            content_preview="Python 3.15 introduces exhaustiveness checking for match statements, improved type narrowing, and a new concurrent.futures API that simplifies async task management."),
    Article(id="a007", title="Major Healthcare Provider Adopts AI Diagnostics",
            source="STAT News", url="https://example.com/ai-health",
            published="2026-03-15", category="health",
            content_preview="Kaiser Permanente announced full deployment of AI-assisted diagnostic tools across its network, helping radiologists detect early-stage cancers with 15 percent higher accuracy."),
    Article(id="a008", title="Electric Vehicle Sales Surge 45 Percent in Q1",
            source="Bloomberg", url="https://example.com/ev-sales",
            published="2026-03-15", category="automotive",
            content_preview="Global electric vehicle sales grew 45 percent in Q1 2026 compared to the same period last year, driven by new affordable models from Chinese manufacturers entering European markets."),
]

def fetch_articles(
    category: str | None = None, days: int = 7,
) -> list[Article]:
    cutoff = (
        datetime.now() - timedelta(days=days)
    ).strftime("%Y-%m-%d")
    filtered = [
        a for a in MOCK_ARTICLES if a.published >= cutoff
    ]
    if category:
        filtered = [
            a for a in filtered
            if a.category.lower() == category.lower()
        ]
    return filtered

def get_categories() -> list[str]:
    # Sorted so the category list is deterministic across runs.
    return sorted({a.category for a in MOCK_ARTICLES})
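The date filter in fetch_articles relies on a property worth calling out: ISO-8601 date strings ("YYYY-MM-DD") sort lexicographically in the same order as the dates they represent, so plain string comparison works as a chronological filter. A quick demonstration with a fixed "now" (the dates here are illustrative):

```python
from datetime import datetime, timedelta

# ISO-8601 date strings sort lexicographically in chronological order,
# which is why fetch_articles can compare strings directly.
now = datetime(2026, 3, 18)
cutoff = (now - timedelta(days=3)).strftime("%Y-%m-%d")  # "2026-03-15"
published = ["2026-03-17", "2026-03-16", "2026-03-10"]
recent = [d for d in published if d >= cutoff]
print(recent)  # ['2026-03-17', '2026-03-16']
```

This only holds because the format is zero-padded and ordered year-month-day; a format like "3/17/2026" would break it.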

Step 2: Article Summarizer

The summarizer condenses articles into brief summaries. We use extractive summarization (selecting key sentences) as the baseline. The agent's LLM provides abstractive summarization on top.

# src/summarizer.py
from src.feed_parser import Article

def summarize_article(article: Article) -> dict:
    # Baseline summary: reuse the stored preview. The agent's LLM
    # rewrites this abstractively when it presents results.
    return {
        "title": article.title,
        "source": article.source,
        "date": article.published,
        "category": article.category,
        "summary": article.content_preview,
        "url": article.url,
    }

def create_digest(
    articles: list[Article], max_articles: int = 5,
) -> str:
    shown = articles[:max_articles]
    lines = [
        f"=== News Digest ({len(shown)} of {len(articles)} articles) ===\n"
    ]
    for article in shown:
        summary = summarize_article(article)
        lines.append(f"**{summary['title']}**")
        lines.append(
            f"Source: {summary['source']} | "
            f"{summary['date']} | {summary['category']}"
        )
        lines.append(f"{summary['summary']}")
        lines.append(f"Read more: {summary['url']}\n")
    return "\n".join(lines)
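The baseline above simply reuses the stored preview. If you later feed in full article text, a true extractive pass becomes useful. Here is one minimal sketch (the function name, regexes, and the length-based stopword filter are illustrative choices, not part of the project): score each sentence by summed word frequency and keep the top ones in their original order.

```python
import re
from collections import Counter

def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences, preserving original order.

    Sentences are scored by summed word frequency; words of three
    letters or fewer are skipped as a crude stopword filter.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)
    scored = sorted(
        ((sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())), i, s)
         for i, s in enumerate(sentences)),
        reverse=True,
    )
    # Re-sort the winners by position so the summary reads naturally.
    top = sorted(scored[:max_sentences], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)

text = ("Python is great for scripting. "
        "Python powers many agents. Cats nap often.")
print(extractive_summary(text, 2))
```

Frequency scoring is the simplest extractive method; libraries like sumy or a TextRank implementation give better results if you outgrow it.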

Step 3: Preference Engine

The preference engine tracks which categories the user reads most and uses that to rank future articles.


# src/preferences.py

class UserPreferences:
    def __init__(self):
        self.category_scores: dict[str, float] = {}
        self.read_articles: set[str] = set()
        self.blocked_sources: set[str] = set()

    def record_read(self, category: str, article_id: str):
        self.category_scores[category] = (
            self.category_scores.get(category, 0) + 1.0
        )
        self.read_articles.add(article_id)

    def block_source(self, source: str):
        self.blocked_sources.add(source.lower())

    def get_top_categories(self, n: int = 3) -> list[str]:
        sorted_cats = sorted(
            self.category_scores.items(),
            key=lambda x: x[1], reverse=True,
        )
        return [cat for cat, _ in sorted_cats[:n]]

    def score_article(self, article) -> float:
        if article.source.lower() in self.blocked_sources:
            return -1.0
        if article.id in self.read_articles:
            return -1.0
        return self.category_scores.get(article.category, 0.5)

    def get_profile(self) -> str:
        if not self.category_scores:
            return "No preferences recorded yet."
        top = self.get_top_categories()
        return (
            f"Top interests: {', '.join(top)}\n"
            f"Articles read: {len(self.read_articles)}\n"
            f"Blocked sources: {', '.join(self.blocked_sources) or 'none'}"
        )

preferences = UserPreferences()
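To see the ranking behavior concretely, here is a condensed, standalone copy of the class driving a few scores (in the real project you would import UserPreferences from src.preferences instead; the article IDs below are made up):

```python
# Condensed copy of UserPreferences from Step 3 so this snippet
# runs on its own.
class UserPreferences:
    def __init__(self):
        self.category_scores: dict[str, float] = {}
        self.read_articles: set[str] = set()
        self.blocked_sources: set[str] = set()

    def record_read(self, category: str, article_id: str):
        self.category_scores[category] = (
            self.category_scores.get(category, 0) + 1.0
        )
        self.read_articles.add(article_id)

    def score(self, category: str, source: str, article_id: str) -> float:
        if source.lower() in self.blocked_sources:
            return -1.0
        if article_id in self.read_articles:
            return -1.0
        return self.category_scores.get(category, 0.5)

prefs = UserPreferences()
prefs.record_read("technology", "a001")   # two tech reads...
prefs.record_read("technology", "a003")
prefs.blocked_sources.add("bloomberg")

# Unseen tech articles now outrank the 0.5 default for fresh
# categories, already-read and blocked items sink below everything.
print(prefs.score("technology", "Wired", "a010"))      # 2.0
print(prefs.score("finance", "Reuters", "a002"))       # 0.5
print(prefs.score("automotive", "Bloomberg", "a008"))  # -1.0
print(prefs.score("technology", "Wired", "a001"))      # -1.0 (read)
```

Note the design choice: blocked and already-read articles get a sentinel -1.0 rather than being filtered out, so one sort call handles ranking and exclusion together.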

Step 4: Build the Agent

# src/agent.py
import asyncio
from agents import Agent, Runner, function_tool
from src.feed_parser import fetch_articles, get_categories
from src.summarizer import create_digest
from src.preferences import preferences

@function_tool
def get_news(category: str = "", days: int = 7) -> str:
    """Fetch recent news articles, optionally filtered by category."""
    cat = category if category else None
    articles = fetch_articles(cat, days)
    if not articles:
        return "No articles found."
    # Score and sort by preference
    scored = sorted(
        articles,
        key=lambda a: preferences.score_article(a),
        reverse=True,
    )
    return create_digest(scored)

@function_tool
def get_available_categories() -> str:
    """List available news categories."""
    return ", ".join(get_categories())

@function_tool
def mark_as_read(article_id: str, category: str) -> str:
    """Record that the user read an article."""
    preferences.record_read(category, article_id)
    return f"Recorded: {article_id} in {category}"

@function_tool
def block_news_source(source: str) -> str:
    """Block a news source from appearing in feeds."""
    preferences.block_source(source)
    return f"Blocked source: {source}"

@function_tool
def view_preferences() -> str:
    """View user reading preferences."""
    return preferences.get_profile()

news_agent = Agent(
    name="News Aggregator",
    instructions="""You are a personalized news aggregation agent.
Fetch and summarize news for the user based on their interests.
Track their reading habits to improve recommendations over time.
Present articles clearly with source attribution.
If the user mentions a topic, search for that category first.""",
    tools=[
        get_news, get_available_categories,
        mark_as_read, block_news_source, view_preferences,
    ],
)

async def main():
    result = await Runner.run(
        news_agent,
        "Show me the latest tech news and any major headlines "
        "from today. Skip anything from Bloomberg.",
    )
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

The agent blocks Bloomberg, fetches technology articles and today's headlines, then presents a curated digest with summaries.

FAQ

How do I connect this to real RSS feeds?

Install feedparser (pip install feedparser) and replace the MOCK_ARTICLES list with a function that parses real RSS URLs. Call feedparser.parse(url) for each feed, extract title, link, published date, and summary fields, and convert them into Article models. The rest of the pipeline — summarization, preference scoring, and digest generation — works unchanged.
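A sketch of that adapter might look like the following. It uses a dataclass stand-in for the pydantic Article model so the snippet runs on its own, and it assumes the common feedparser entry keys (title, link, published, summary) -- real feeds frequently omit some of them, hence the fallbacks:

```python
from dataclasses import dataclass

@dataclass
class Article:  # stand-in for the pydantic model in src/feed_parser.py
    id: str
    title: str
    source: str
    url: str
    published: str
    category: str
    content_preview: str

def entries_to_articles(
    entries: list[dict], source: str, category: str,
) -> list[Article]:
    """Convert feedparser-style entry dicts into Article records.

    feedparser.parse(url).entries yields dict-like records; every
    lookup here has a fallback because feeds often omit fields.
    """
    return [
        Article(
            id=f"{source}-{i}",
            title=e.get("title", "Untitled"),
            source=source,
            url=e.get("link", ""),
            published=e.get("published", ""),
            category=category,
            content_preview=e.get("summary", "")[:200],
        )
        for i, e in enumerate(entries)
    ]
```

In the real pipeline you would call feedparser.parse(url) per feed and pass its .entries straight into this function.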

Can the agent generate email digests automatically?

Yes. Add a send_digest_email tool that formats the digest as HTML and sends it via SMTP or an email API like SendGrid. Schedule the agent to run daily using cron, and it will generate a personalized digest based on accumulated preferences and deliver it to your inbox.
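A minimal sketch of that tool using only the standard library is below. The SMTP host, sender address, and credentials are placeholders you would replace with your provider's settings; message construction is separated from sending so it can be tested without a network:

```python
import smtplib
from email.mime.text import MIMEText

def build_digest_message(
    digest: str, sender: str, recipient: str,
) -> MIMEText:
    """Wrap the digest text in a plain-text email message."""
    msg = MIMEText(digest, "plain")
    msg["Subject"] = "Your Daily News Digest"
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_digest_email(digest: str, recipient: str) -> str:
    msg = build_digest_message(digest, "agent@example.com", recipient)
    # Placeholder SMTP settings -- swap in your provider's host,
    # port, and credentials (or use an API client like SendGrid's).
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("agent@example.com", "app-password")
        server.send_message(msg)
    return f"Digest sent to {recipient}"
```

Decorate send_digest_email with @function_tool and add it to the agent's tool list, then trigger the whole run from a daily cron entry.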

How does the preference learning improve over time?

Every time you read an article or ask about a specific topic, the agent calls mark_as_read, incrementing that category's score. Articles in higher-scored categories float to the top of future digests. Over weeks of use, the system naturally prioritizes topics you engage with most and de-prioritizes ones you ignore.


#NewsAggregation #AIAgent #Python #RSS #Summarization #AgenticAI #LearnAI #AIEngineering
