Building a Translation Memory for AI Agents: Consistent Terminology Across Interactions
Implement translation memory systems with term glossaries, translation caching, and consistency enforcement to maintain uniform terminology across all AI agent interactions.
The Terminology Consistency Problem
When an AI agent translates "escalation" as "escalación" in one response and "derivación" in the next, users lose trust. Inconsistent terminology makes the agent feel unreliable and creates confusion, especially in domain-specific contexts like healthcare, legal, or financial services, where precise terms carry regulatory weight.
Translation memory (TM) solves this by storing approved translations of terms and phrases, then enforcing their reuse across all agent interactions. This is a standard practice in the professional translation industry, and it applies directly to AI agents.
Term Glossary Data Model
The foundation of translation memory is a structured glossary that maps source terms to approved translations per language.
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime


@dataclass
class GlossaryEntry:
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str]  # lang_code -> approved translation
    domain: str  # e.g., "medical", "legal", "general"
    context_note: str = ""
    do_not_translate: bool = False  # Brand names, product names
    created_at: str = ""
    updated_at: str = ""


@dataclass
class Glossary:
    entries: List[GlossaryEntry] = field(default_factory=list)
    _index: Dict[str, Dict[str, GlossaryEntry]] = field(default_factory=dict)

    def add_entry(self, entry: GlossaryEntry) -> None:
        self.entries.append(entry)
        # Index by source language and lowercase term
        lang_index = self._index.setdefault(entry.source_lang, {})
        lang_index[entry.source_term.lower()] = entry

    def lookup(self, term: str, source_lang: str = "en") -> Optional[GlossaryEntry]:
        lang_index = self._index.get(source_lang, {})
        return lang_index.get(term.lower())

    def get_translation(
        self, term: str, target_lang: str, source_lang: str = "en"
    ) -> Optional[str]:
        entry = self.lookup(term, source_lang)
        if not entry:
            return None
        if entry.do_not_translate:
            return entry.source_term  # Return as-is
        return entry.translations.get(target_lang)
```
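To see the lookup semantics in isolation, here is a condensed, dict-based stand-in for the glossary above (the terms and translations are illustrative): lookups are case-insensitive, do-not-translate entries pass through unchanged, and unknown terms return None.

```python
from typing import Dict, Optional

# Condensed stand-in for the Glossary class: lowercase source term ->
# per-language translations plus a do-not-translate flag.
glossary: Dict[str, dict] = {
    "escalation": {"translations": {"es": "escalación"}, "do_not_translate": False},
    "callsphere": {"translations": {}, "do_not_translate": True},
}


def get_translation(term: str, target_lang: str) -> Optional[str]:
    entry = glossary.get(term.lower())  # case-insensitive lookup
    if not entry:
        return None
    if entry["do_not_translate"]:
        return term  # brand/product names pass through unchanged
    return entry["translations"].get(target_lang)


print(get_translation("Escalation", "es"))  # escalación
print(get_translation("CallSphere", "es"))  # CallSphere
print(get_translation("refund", "es"))      # None
```

The real class adds a per-language index so lookups stay O(1) as the glossary grows, but the contract is the same.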
Translation Cache with Fuzzy Matching
Beyond exact term matches, cache full phrase translations and use fuzzy matching to find similar previously translated segments.
```python
from difflib import SequenceMatcher


@dataclass
class TranslationSegment:
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    match_score: float  # 1.0 for exact, lower for fuzzy
    domain: str
    last_used: str
    use_count: int = 0


class TranslationMemoryStore:
    def __init__(self, fuzzy_threshold: float = 0.75):
        self.segments: List[TranslationSegment] = []
        self.fuzzy_threshold = fuzzy_threshold
        self._exact_index: Dict[str, TranslationSegment] = {}

    def add_segment(self, segment: TranslationSegment) -> None:
        key = f"{segment.source_lang}:{segment.target_lang}:{segment.source_text.lower()}"
        self._exact_index[key] = segment
        self.segments.append(segment)

    def find_match(
        self, source: str, source_lang: str, target_lang: str
    ) -> Optional[TranslationSegment]:
        # Try an exact match first
        key = f"{source_lang}:{target_lang}:{source.lower()}"
        exact = self._exact_index.get(key)
        if exact:
            exact.use_count += 1
            return exact
        # Fall back to a linear fuzzy scan over stored segments
        best_match: Optional[TranslationSegment] = None
        best_score = 0.0
        for seg in self.segments:
            if seg.source_lang != source_lang or seg.target_lang != target_lang:
                continue
            score = SequenceMatcher(None, source.lower(), seg.source_text.lower()).ratio()
            if score > best_score and score >= self.fuzzy_threshold:
                best_score = score
                best_match = seg
        if best_match:
            # Return a copy with the adjusted fuzzy score
            return TranslationSegment(
                source_text=best_match.source_text,
                source_lang=best_match.source_lang,
                target_text=best_match.target_text,
                target_lang=best_match.target_lang,
                match_score=best_score,
                domain=best_match.domain,
                last_used=best_match.last_used,
                use_count=best_match.use_count,
            )
        return None
```
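The fuzzy_threshold of 0.75 is doing real work here: it has to admit near-duplicates (a reworded or re-punctuated version of a stored segment) while rejecting unrelated text. A quick check with the same difflib ratio, using made-up example strings, shows how the two cases separate:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Same case-insensitive ratio used by find_match above."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


stored = "How do I reset my password?"

# A near-duplicate (missing punctuation) scores close to 1.0...
close = similarity("How do I reset my password", stored)

# ...while an unrelated question scores well below a 0.75 threshold.
far = similarity("What are your opening hours?", stored)

print(f"near-duplicate: {close:.2f}, unrelated: {far:.2f}")
```

If fuzzy matches are accepted too eagerly, the agent reuses a translation for a sentence that only superficially resembles the original, so it is worth tuning the threshold against real traffic rather than trusting the default.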
Consistency Enforcement in Agent Responses
Before sending a response, scan it for terms that have glossary entries and verify they use the approved translation.
```python
import re


class ConsistencyEnforcer:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary

    def check_response(self, response: str, target_lang: str) -> dict:
        """Check response for terminology consistency violations."""
        violations = []
        for entry in self.glossary.entries:
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # Flag source terms that appear untranslated; word boundaries
            # avoid false positives on substrings of longer words
            pattern = re.compile(rf"\b{re.escape(entry.source_term)}\b", re.IGNORECASE)
            if pattern.search(response) and not entry.do_not_translate:
                violations.append({
                    "term": entry.source_term,
                    "expected": approved,
                    "issue": "source term used instead of translation",
                })
        return {
            "consistent": len(violations) == 0,
            "violations": violations,
            "total_checked": len(self.glossary.entries),
        }

    def enforce(self, response: str, target_lang: str) -> str:
        """Replace inconsistent terminology with approved translations."""
        result = response
        for entry in self.glossary.entries:
            if entry.do_not_translate:
                continue
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # Case-insensitive, whole-word replacement of source terms
            pattern = re.compile(rf"\b{re.escape(entry.source_term)}\b", re.IGNORECASE)
            result = pattern.sub(approved, result)
        return result
```
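The replacement step can be exercised on its own. This standalone sketch uses \b word boundaries around the escaped term so that terms embedded in longer words are left alone, which matters whenever a glossary term is a substring of another word:

```python
import re


def enforce_term(text: str, source_term: str, approved: str) -> str:
    """Whole-word, case-insensitive replacement of one glossary term."""
    pattern = re.compile(rf"\b{re.escape(source_term)}\b", re.IGNORECASE)
    return pattern.sub(approved, text)


# Whole word: replaced.
print(enforce_term("Your escalation was received.", "escalation", "escalación"))

# Substring of a longer word: left untouched.
print(enforce_term("Multiple escalations occurred.", "escalation", "escalación"))
```

Note that blind substitution ignores inflection and grammatical agreement in the target language, so enforce() is best treated as a last-resort repair; the prompt-level glossary injection in the next section prevents most violations before they happen.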
Glossary-Augmented Translation Prompts
When using an LLM for translation, inject the glossary into the prompt to guide consistent term usage.
```python
class GlossaryAugmentedTranslator:
    def __init__(self, client, glossary: Glossary):
        self.client = client
        self.glossary = glossary

    def _build_glossary_context(self, text: str, target_lang: str) -> str:
        """Extract relevant glossary entries for the text being translated."""
        relevant = []
        for entry in self.glossary.entries:
            if entry.source_term.lower() not in text.lower():
                continue
            trans = entry.translations.get(target_lang)
            note = f" ({entry.context_note})" if entry.context_note else ""
            # Do-not-translate entries are listed even without a stored
            # translation for the target language
            if entry.do_not_translate:
                relevant.append(f"- '{entry.source_term}' -> DO NOT TRANSLATE (keep as-is)")
            elif trans:
                relevant.append(f"- '{entry.source_term}' -> '{trans}'{note}")
        if not relevant:
            return ""
        return "MANDATORY GLOSSARY (use these exact translations):\n" + "\n".join(relevant)

    async def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        glossary_ctx = self._build_glossary_context(text, target_lang)
        system_msg = f"Translate from {source_lang} to {target_lang}."
        if glossary_ctx:
            system_msg += f"\n\n{glossary_ctx}"
        system_msg += "\nPreserve formatting and code blocks."
        resp = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": text},
            ],
            temperature=0.1,
        )
        return resp.choices[0].message.content or ""
```
Glossary Updates and Versioning
Glossaries evolve as products change. Maintain version history to understand when and why terms were updated.
```python
@dataclass
class GlossaryChange:
    term_id: str
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    changed_at: str
    reason: str


class VersionedGlossary:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary
        self.changelog: List[GlossaryChange] = []

    def update_translation(
        self, term_id: str, target_lang: str, new_translation: str,
        changed_by: str, reason: str
    ) -> None:
        entry = None
        for e in self.glossary.entries:
            if e.term_id == term_id:
                entry = e
                break
        if not entry:
            raise ValueError(f"Term {term_id} not found")
        old_value = entry.translations.get(target_lang, "")
        self.changelog.append(GlossaryChange(
            term_id=term_id,
            field_changed=f"translations.{target_lang}",
            old_value=old_value,
            new_value=new_translation,
            changed_by=changed_by,
            changed_at=datetime.utcnow().isoformat(),
            reason=reason,
        ))
        entry.translations[target_lang] = new_translation
        entry.updated_at = datetime.utcnow().isoformat()
```
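The essential invariant is that the changelog entry is recorded before the mutation, so the old value is never lost. This minimal standalone sketch of the same pattern (with illustrative data) demonstrates that an update leaves both the new value and a recoverable history behind:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Change:
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    reason: str
    changed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


translations: Dict[str, str] = {"es": "escalamiento"}
changelog: List[Change] = []


def update_translation(lang: str, new_value: str, changed_by: str, reason: str) -> None:
    # Record the change first, then mutate; the old value survives in the log.
    changelog.append(Change(
        field_changed=f"translations.{lang}",
        old_value=translations.get(lang, ""),
        new_value=new_value,
        changed_by=changed_by,
        reason=reason,
    ))
    translations[lang] = new_value


update_translation("es", "escalación", "maria", "Align with style guide")
print(translations["es"], len(changelog))
```

With the full history in hand, rollback is just replaying a change in reverse, and audits can answer "who changed this term and why" without a separate system.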
FAQ
How large should my glossary be before it impacts translation quality?
Start with 50-100 high-impact domain terms. Glossaries up to 500 entries work well when injected into LLM translation prompts. Beyond that, filter to only include entries relevant to the specific text being translated (as shown in the _build_glossary_context method) to avoid overwhelming the model's context window.
Should I store the translation memory in a database or in files?
For small-to-medium agents (under 10,000 segments), JSON files versioned in Git work well and keep the translation memory auditable. For larger systems, use a database (PostgreSQL with trigram indexes for fuzzy matching) and expose the TM through an internal API. The key requirement is that translators and developers can both access and update it.
How do I handle terms that have multiple valid translations depending on context?
Add context tags to glossary entries. For example, "account" in a banking context translates differently than "account" in a user authentication context. The consistency enforcer should match on both the term and the context tag. When context is ambiguous, flag the term for human review rather than auto-replacing.
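One hypothetical way to key such a glossary is on a (term, context) pair, with an ambiguous context deliberately returning nothing so the term can be routed to human review instead of auto-replaced:

```python
from typing import Dict, Optional, Tuple

# Hypothetical context-tagged glossary: (term, context) -> Spanish translation.
CONTEXT_GLOSSARY: Dict[Tuple[str, str], str] = {
    ("account", "banking"): "cuenta bancaria",
    ("account", "auth"): "cuenta de usuario",
}


def lookup_with_context(term: str, context: Optional[str]) -> Optional[str]:
    if context is None:
        # Ambiguous context: return nothing and flag for human review
        # rather than guessing between candidate translations.
        return None
    return CONTEXT_GLOSSARY.get((term.lower(), context))


print(lookup_with_context("Account", "banking"))
```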
#TranslationMemory #TerminologyManagement #Consistency #AIAgents #Localization #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.