Coreference Resolution: Helping Agents Understand Pronouns and References
Learn how coreference resolution enables AI agents to track pronouns and references across conversation turns, with practical implementations using spaCy's coreferee extension and LLM-based approaches.
The Pronoun Problem in Agent Conversations
Consider this conversation with an AI agent:
- User: "I need to reschedule my appointment with Dr. Martinez."
- Agent: "When would you like to meet?"
- User: "Can she do Thursday afternoon?"
Who is "she"? A human immediately knows it refers to Dr. Martinez. But an agent processing messages independently sees only the word "she" with no link to the doctor mentioned two turns earlier. Coreference resolution is the NLP task that connects pronouns, definite descriptions, and other referring expressions to their antecedents.
Without coreference resolution, agents misinterpret follow-up questions, lose track of entities across turns, and produce confused responses. It is one of the most underappreciated capabilities in conversational AI.
Understanding Coreference Chains
A coreference chain is a set of mentions in a text that all refer to the same real-world entity. In the sentence "Alice said she would bring her laptop," the chain is: [Alice, she, her] — all three refer to the same person.
Types of referring expressions that coreference systems must handle:
- Pronouns: he, she, it, they, this, that
- Definite descriptions: "the manager," "the previous order"
- Proper nouns repeated: "Dr. Martinez" ... "Martinez"
- Demonstratives: "this issue," "those items"
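A chain can be modeled as a small data structure. The sketch below (plain Python; the `Mention` class is illustrative, not from any library) represents the chain for "Alice said she would bring her laptop":

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str   # surface form, e.g. "she"
    start: int  # character offset in the source text

text = "Alice said she would bring her laptop"
chain = [
    Mention("Alice", text.index("Alice")),
    Mention("she", text.index("she")),
    Mention("her", text.index("her")),
]

# Every mention in a chain denotes the same entity, so downstream code
# can substitute the most specific mention for any of the others.
canonical = chain[0].text  # "Alice"
```

This is the core contract a resolver provides: given any mention, return the canonical entity it refers to.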
Coreference Resolution with spaCy and coreferee
The coreferee library adds coreference resolution to spaCy pipelines.
```python
import spacy
import coreferee  # noqa: F401 -- import registers the "coreferee" pipe factory

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe("coreferee")

def resolve_coreferences(text: str) -> dict:
    """Identify coreference chains in text."""
    doc = nlp(text)
    chains = []
    if doc._.coref_chains:
        for chain in doc._.coref_chains:
            mentions = []
            for mention in chain.mentions:
                span_tokens = [doc[i] for i in mention.token_indexes]
                mention_text = " ".join(t.text for t in span_tokens)
                mentions.append({
                    "text": mention_text,
                    "start": span_tokens[0].idx,  # character offset of the mention
                })
            chains.append(mentions)
    return {"text": text, "chains": chains}

text = "Sarah called the restaurant. She asked if they had a table for two."
result = resolve_coreferences(text)
# Roughly: chains == [['Sarah', 'She'], ['restaurant', 'they']]
# (coreferee reports mention head tokens, so "the restaurant" surfaces as "restaurant")
```
LLM-Based Coreference Resolution
For production agents, LLMs provide the most flexible coreference resolution, especially across conversation turns.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def resolve_with_llm(conversation: list[dict]) -> str:
    """Resolve pronouns in the latest message using conversation context."""
    messages = [
        {
            "role": "system",
            "content": (
                "Rewrite the last user message by replacing all pronouns "
                "and references with the specific entities they refer to. "
                "Use the conversation history for context. "
                "Return ONLY the rewritten message, nothing else."
            ),
        }
    ]
    messages.extend(conversation)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

conversation = [
    {"role": "user", "content": "I need help with my order from TechCorp."},
    {"role": "assistant", "content": "I can help with that. What is the issue?"},
    {"role": "user", "content": "They shipped it to the wrong address."},
]
resolved = resolve_with_llm(conversation)
# e.g. "TechCorp shipped my order to the wrong address."
```
Building a Context Tracker for Multi-Turn Agents
A robust agent maintains an entity registry that tracks all mentioned entities and resolves references against them.
```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    name: str
    entity_type: str
    aliases: list[str] = field(default_factory=list)
    last_mentioned_turn: int = 0

class ConversationContextTracker:
    def __init__(self):
        self.entities: list[Entity] = []
        self.turn_count = 0

    def register_entity(self, name: str, entity_type: str):
        """Add a new entity to the registry, or refresh an existing one."""
        for entity in self.entities:
            if entity.name.lower() == name.lower():
                entity.last_mentioned_turn = self.turn_count
                return
        self.entities.append(Entity(
            name=name,
            entity_type=entity_type,
            last_mentioned_turn=self.turn_count,
        ))

    def resolve_pronoun(self, pronoun: str) -> Optional[str]:
        """Resolve a pronoun to the most recently mentioned matching entity."""
        # The types here must match the labels used at registration time;
        # spaCy's NER emits labels like PERSON, ORG, and PRODUCT.
        pronoun_map = {
            "he": "PERSON", "him": "PERSON", "his": "PERSON",
            "she": "PERSON", "her": "PERSON", "hers": "PERSON",
            "it": "PRODUCT", "its": "PRODUCT",
            "they": "ORG", "them": "ORG", "their": "ORG",
        }
        target_type = pronoun_map.get(pronoun.lower())
        if not target_type:
            return None
        candidates = [
            e for e in self.entities
            if e.entity_type == target_type
        ]
        if not candidates:
            return None
        # Return the most recently mentioned entity of the matching type
        return max(candidates, key=lambda e: e.last_mentioned_turn).name

    def advance_turn(self):
        self.turn_count += 1
```
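The recency heuristic at the heart of the tracker is easy to demonstrate in isolation. This condensed stand-in (dicts instead of the tracker class, so the snippet runs on its own) walks the intro conversation through it:

```python
from typing import Optional

PRONOUN_TYPES = {"she": "PERSON", "he": "PERSON", "they": "ORG"}

registry: list[dict] = []  # each entry: {"name", "type", "turn"}

def register(name: str, etype: str, turn: int) -> None:
    registry.append({"name": name, "type": etype, "turn": turn})

def resolve(pronoun: str) -> Optional[str]:
    etype = PRONOUN_TYPES.get(pronoun.lower())
    matches = [e for e in registry if e["type"] == etype]
    return max(matches, key=lambda e: e["turn"])["name"] if matches else None

# Turn 1: "I need to reschedule my appointment with Dr. Martinez."
register("Dr. Martinez", "PERSON", turn=1)
# Turn 3: "Can she do Thursday afternoon?"
print(resolve("she"))  # Dr. Martinez
```

If a second person were registered on a later turn, "she" would resolve to that person instead; that is exactly the behavior you want for "most recent referent wins."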
Integrating Coreference into Agent Preprocessing
The complete pattern preprocesses each user message before the agent's main reasoning loop.
```python
class CoreferencePreprocessor:
    def __init__(self, nlp_pipeline, context_tracker):
        self.nlp = nlp_pipeline
        self.tracker = context_tracker

    def process_message(self, message: str) -> str:
        """Resolve references and update the entity registry."""
        self.tracker.advance_turn()
        doc = self.nlp(message)
        # Register new entities found via NER
        for ent in doc.ents:
            self.tracker.register_entity(ent.text, ent.label_)
        # Replace pronouns with resolved entities, iterating in reverse
        # so earlier character offsets stay valid after each substitution
        resolved = message
        for token in reversed(list(doc)):
            if token.pos_ == "PRON":
                entity_name = self.tracker.resolve_pronoun(token.text)
                if entity_name:
                    resolved = (
                        resolved[:token.idx]
                        + entity_name
                        + resolved[token.idx + len(token.text):]
                    )
        return resolved
```
By resolving coreferences before the agent processes the message, you ensure that tool calls, database queries, and API requests use explicit entity names rather than ambiguous pronouns.
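The payoff shows up at the tool boundary. A stdlib-only sketch of that effect, with a toy regex resolver and a hypothetical `search_orders` tool standing in for real integrations:

```python
import re

def preprocess(message: str, last_org: str) -> str:
    """Toy resolver: swap third-person plural pronouns for the tracked org."""
    return re.sub(r"\b(they|them)\b", last_org, message, flags=re.IGNORECASE)

def search_orders(query: str) -> str:
    # A real tool call would hit a database; here we just echo the query.
    return f"searching orders for: {query}"

raw = "They shipped my order late."
resolved = preprocess(raw, "TechCorp")
# The tool receives an explicit entity name instead of a pronoun:
print(search_orders(resolved))  # searching orders for: TechCorp shipped my order late.
```

A bare string replace like this is far too blunt for production (it ignores grammatical role and case), but it makes the architectural point: resolution happens once, upstream, and every downstream consumer benefits.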
FAQ
How accurate are current coreference resolution systems?
State-of-the-art neural coreference models achieve around 80-85% F1 score on benchmark datasets like OntoNotes. LLM-based resolution using GPT-4 class models tends to perform better in conversational contexts, reaching 90%+ accuracy for common pronoun patterns. However, complex cases involving nested references, cataphora (forward references), or ambiguous gender still challenge all systems.
How do I handle coreference across very long conversations?
Maintain a sliding window of the most recently mentioned entities rather than tracking the entire conversation history. Entities mentioned more recently are more likely to be referents. A window of 5 to 10 turns covers the vast majority of coreference patterns in natural conversation. For entities that persist across the entire conversation (like the user's name), promote them to a "pinned" status in your tracker.
Should I resolve coreferences before or after intent classification?
Resolve coreferences before intent classification. If a user says "Cancel it," the intent classifier needs to know what "it" refers to in order to route correctly. Is it a subscription cancellation, an order cancellation, or an appointment cancellation? Resolving the pronoun first produces "Cancel the subscription," which the intent classifier can handle accurately.
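The ordering argument can be made concrete with a toy keyword classifier (a stand-in for a real intent model) and a toy resolution step:

```python
def classify_intent(message: str) -> str:
    """Toy keyword classifier -- a stand-in for a real intent model."""
    if "subscription" in message.lower():
        return "cancel_subscription"
    if "order" in message.lower():
        return "cancel_order"
    return "unknown"

def resolve_then_classify(message: str, antecedent: str) -> str:
    resolved = message.replace("it", antecedent)  # toy resolution step
    return classify_intent(resolved)

# Without resolution, "Cancel it" is unroutable:
print(classify_intent("Cancel it"))                            # unknown
# With resolution first, the intent is recoverable:
print(resolve_then_classify("Cancel it", "the subscription"))  # cancel_subscription
```

The same ordering holds for slot filling and entity extraction: any component that keys off entity words should see the resolved text.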