Coreference Resolution: Helping Agents Understand Pronouns and References
Learn how coreference resolution enables AI agents to track pronouns and references across conversation turns, with practical implementations using spaCy's coreferee extension and LLM-based approaches.
The Pronoun Problem in Agent Conversations
Consider this conversation with an AI agent:
- User: "I need to reschedule my appointment with Dr. Martinez."
- Agent: "When would you like to meet?"
- User: "Can she do Thursday afternoon?"
Who is "she"? A human immediately knows it refers to Dr. Martinez. But an agent processing messages independently sees only the word "she" with no link to the doctor mentioned two turns earlier. Coreference resolution is the NLP task that connects pronouns, definite descriptions, and other referring expressions to their antecedents.
Without coreference resolution, agents misinterpret follow-up questions, lose track of entities across turns, and produce confused responses. It is one of the most underappreciated capabilities in conversational AI.
Understanding Coreference Chains
A coreference chain is a set of mentions in a text that all refer to the same real-world entity. In the sentence "Alice said she would bring her laptop," the chain is: [Alice, she, her] — all three refer to the same person.
Types of referring expressions that coreference systems must handle:
- Pronouns: he, she, it, they, this, that
- Definite descriptions: "the manager," "the previous order"
- Proper nouns repeated: "Dr. Martinez" ... "Martinez"
- Demonstratives: "this issue," "those items"
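A chain can be modeled as a small data structure. The sketch below (plain Python; the `Mention` class is illustrative, not from any library) represents the chain for "Alice said she would bring her laptop":

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str   # surface form, e.g. "she"
    start: int  # character offset in the source text

text = "Alice said she would bring her laptop"
chain = [
    Mention("Alice", text.index("Alice")),
    Mention("she", text.index("she")),
    Mention("her", text.index("her")),
]

# Every mention in a chain denotes the same entity, so downstream code
# can substitute the most specific mention for any of the others.
canonical = chain[0].text  # "Alice"
```

This is the core contract a resolver provides: given any mention, return the canonical entity it refers to.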
Coreference Resolution with spaCy and coreferee
The coreferee library adds coreference resolution to spaCy pipelines.
```python
import spacy
import coreferee  # noqa: F401 -- import registers the "coreferee" pipe factory

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe("coreferee")

def resolve_coreferences(text: str) -> dict:
    """Identify coreference chains in text."""
    doc = nlp(text)
    chains = []
    if doc._.coref_chains:
        for chain in doc._.coref_chains:
            mentions = []
            for mention in chain.mentions:
                span_tokens = [doc[i] for i in mention.token_indexes]
                mention_text = " ".join(t.text for t in span_tokens)
                mentions.append({
                    "text": mention_text,
                    "start": span_tokens[0].idx,  # character offset of the mention
                })
            chains.append(mentions)
    return {"text": text, "chains": chains}

text = "Sarah called the restaurant. She asked if they had a table for two."
result = resolve_coreferences(text)
# Roughly: chains == [['Sarah', 'She'], ['restaurant', 'they']]
# (coreferee reports mention head tokens, so "the restaurant" surfaces as "restaurant")
```
LLM-Based Coreference Resolution
For production agents, LLMs provide the most flexible coreference resolution, especially across conversation turns.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def resolve_with_llm(conversation: list[dict]) -> str:
    """Resolve pronouns in the latest message using conversation context."""
    messages = [
        {
            "role": "system",
            "content": (
                "Rewrite the last user message by replacing all pronouns "
                "and references with the specific entities they refer to. "
                "Use the conversation history for context. "
                "Return ONLY the rewritten message, nothing else."
            ),
        }
    ]
    messages.extend(conversation)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

conversation = [
    {"role": "user", "content": "I need help with my order from TechCorp."},
    {"role": "assistant", "content": "I can help with that. What is the issue?"},
    {"role": "user", "content": "They shipped it to the wrong address."},
]
resolved = resolve_with_llm(conversation)
# e.g. "TechCorp shipped my order to the wrong address."
```
Building a Context Tracker for Multi-Turn Agents
A robust agent maintains an entity registry that tracks all mentioned entities and resolves references against them.
```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    name: str
    entity_type: str
    aliases: list[str] = field(default_factory=list)
    last_mentioned_turn: int = 0

class ConversationContextTracker:
    def __init__(self):
        self.entities: list[Entity] = []
        self.turn_count = 0

    def register_entity(self, name: str, entity_type: str):
        """Add a new entity to the registry, or refresh an existing one."""
        for entity in self.entities:
            if entity.name.lower() == name.lower():
                entity.last_mentioned_turn = self.turn_count
                return
        self.entities.append(Entity(
            name=name,
            entity_type=entity_type,
            last_mentioned_turn=self.turn_count,
        ))

    def resolve_pronoun(self, pronoun: str) -> Optional[str]:
        """Resolve a pronoun to the most recently mentioned matching entity."""
        # The types here must match the labels used at registration time;
        # spaCy's NER emits labels like PERSON, ORG, and PRODUCT.
        pronoun_map = {
            "he": "PERSON", "him": "PERSON", "his": "PERSON",
            "she": "PERSON", "her": "PERSON", "hers": "PERSON",
            "it": "PRODUCT", "its": "PRODUCT",
            "they": "ORG", "them": "ORG", "their": "ORG",
        }
        target_type = pronoun_map.get(pronoun.lower())
        if not target_type:
            return None
        candidates = [
            e for e in self.entities
            if e.entity_type == target_type
        ]
        if not candidates:
            return None
        # Return the most recently mentioned entity of the matching type
        return max(candidates, key=lambda e: e.last_mentioned_turn).name

    def advance_turn(self):
        self.turn_count += 1
```
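The recency heuristic at the heart of the tracker is easy to demonstrate in isolation. This condensed stand-in (dicts instead of the tracker class, so the snippet runs on its own) walks the intro conversation through it:

```python
from typing import Optional

PRONOUN_TYPES = {"she": "PERSON", "he": "PERSON", "they": "ORG"}

registry: list[dict] = []  # each entry: {"name", "type", "turn"}

def register(name: str, etype: str, turn: int) -> None:
    registry.append({"name": name, "type": etype, "turn": turn})

def resolve(pronoun: str) -> Optional[str]:
    etype = PRONOUN_TYPES.get(pronoun.lower())
    matches = [e for e in registry if e["type"] == etype]
    return max(matches, key=lambda e: e["turn"])["name"] if matches else None

# Turn 1: "I need to reschedule my appointment with Dr. Martinez."
register("Dr. Martinez", "PERSON", turn=1)
# Turn 3: "Can she do Thursday afternoon?"
print(resolve("she"))  # Dr. Martinez
```

If a second person were registered on a later turn, "she" would resolve to that person instead; that is exactly the behavior you want for "most recent referent wins."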
Integrating Coreference into Agent Preprocessing
The complete pattern preprocesses each user message before the agent's main reasoning loop.
```python
class CoreferencePreprocessor:
    def __init__(self, nlp_pipeline, context_tracker):
        self.nlp = nlp_pipeline
        self.tracker = context_tracker

    def process_message(self, message: str) -> str:
        """Resolve references and update the entity registry."""
        self.tracker.advance_turn()
        doc = self.nlp(message)
        # Register new entities found via NER
        for ent in doc.ents:
            self.tracker.register_entity(ent.text, ent.label_)
        # Replace pronouns with resolved entities, iterating in reverse
        # so earlier character offsets stay valid after each substitution
        resolved = message
        for token in reversed(list(doc)):
            if token.pos_ == "PRON":
                entity_name = self.tracker.resolve_pronoun(token.text)
                if entity_name:
                    resolved = (
                        resolved[:token.idx]
                        + entity_name
                        + resolved[token.idx + len(token.text):]
                    )
        return resolved
```
By resolving coreferences before the agent processes the message, you ensure that tool calls, database queries, and API requests use explicit entity names rather than ambiguous pronouns.
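The payoff shows up at the tool boundary. A stdlib-only sketch of that effect, with a toy regex resolver and a hypothetical `search_orders` tool standing in for real integrations:

```python
import re

def preprocess(message: str, last_org: str) -> str:
    """Toy resolver: swap third-person plural pronouns for the tracked org."""
    return re.sub(r"\b(they|them)\b", last_org, message, flags=re.IGNORECASE)

def search_orders(query: str) -> str:
    # A real tool call would hit a database; here we just echo the query.
    return f"searching orders for: {query}"

raw = "They shipped my order late."
resolved = preprocess(raw, "TechCorp")
# The tool receives an explicit entity name instead of a pronoun:
print(search_orders(resolved))  # searching orders for: TechCorp shipped my order late.
```

A bare string replace like this is far too blunt for production (it ignores grammatical role and case), but it makes the architectural point: resolution happens once, upstream, and every downstream consumer benefits.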
FAQ
How accurate are current coreference resolution systems?
State-of-the-art neural coreference models achieve around 80-85% F1 score on benchmark datasets like OntoNotes. LLM-based resolution using GPT-4 class models tends to perform better in conversational contexts, reaching 90%+ accuracy for common pronoun patterns. However, complex cases involving nested references, cataphora (forward references), or ambiguous gender still challenge all systems.
How do I handle coreference across very long conversations?
Maintain a sliding window of the most recently mentioned entities rather than tracking the entire conversation history. Entities mentioned more recently are more likely to be referents. A window of 5 to 10 turns covers the vast majority of coreference patterns in natural conversation. For entities that persist across the entire conversation (like the user's name), promote them to a "pinned" status in your tracker.
Should I resolve coreferences before or after intent classification?
Resolve coreferences before intent classification. If a user says "Cancel it," the intent classifier needs to know what "it" refers to in order to route correctly. Is it a subscription cancellation, an order cancellation, or an appointment cancellation? Resolving the pronoun first produces "Cancel the subscription," which the intent classifier can handle accurately.
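The ordering argument can be made concrete with a toy keyword classifier (a stand-in for a real intent model) and a toy resolution step:

```python
def classify_intent(message: str) -> str:
    """Toy keyword classifier -- a stand-in for a real intent model."""
    if "subscription" in message.lower():
        return "cancel_subscription"
    if "order" in message.lower():
        return "cancel_order"
    return "unknown"

def resolve_then_classify(message: str, antecedent: str) -> str:
    resolved = message.replace("it", antecedent)  # toy resolution step
    return classify_intent(resolved)

# Without resolution, "Cancel it" is unroutable:
print(classify_intent("Cancel it"))                            # unknown
# With resolution first, the intent is recoverable:
print(resolve_then_classify("Cancel it", "the subscription"))  # cancel_subscription
```

The same ordering holds for slot filling and entity extraction: any component that keys off entity words should see the resolved text.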