Building a Translation Memory for AI Agents: Consistent Terminology Across Interactions
Implement translation memory systems with term glossaries, translation caching, and consistency enforcement to maintain uniform terminology across all AI agent interactions.
The Terminology Consistency Problem
When an AI agent translates "escalation" as "escalación" in one response and "derivación" in the next, users lose trust. Inconsistent terminology makes the agent feel unreliable and creates confusion, especially in domain-specific contexts like healthcare, legal, or financial services, where precise terms carry regulatory weight.
Translation memory (TM) solves this by storing approved translations of terms and phrases, then enforcing their reuse across all agent interactions. This is a standard practice in the professional translation industry, and it applies directly to AI agents.
Term Glossary Data Model
The foundation of translation memory is a structured glossary that maps source terms to approved translations per language.
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime


@dataclass
class GlossaryEntry:
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str]  # lang_code -> approved translation
    domain: str  # e.g., "medical", "legal", "general"
    context_note: str = ""
    do_not_translate: bool = False  # Brand names, product names
    created_at: str = ""
    updated_at: str = ""


@dataclass
class Glossary:
    entries: List[GlossaryEntry] = field(default_factory=list)
    _index: Dict[str, Dict[str, GlossaryEntry]] = field(default_factory=dict)

    def add_entry(self, entry: GlossaryEntry) -> None:
        self.entries.append(entry)
        # Index by source language and lowercase term
        lang_index = self._index.setdefault(entry.source_lang, {})
        lang_index[entry.source_term.lower()] = entry

    def lookup(self, term: str, source_lang: str = "en") -> Optional[GlossaryEntry]:
        lang_index = self._index.get(source_lang, {})
        return lang_index.get(term.lower())

    def get_translation(
        self, term: str, target_lang: str, source_lang: str = "en"
    ) -> Optional[str]:
        entry = self.lookup(term, source_lang)
        if not entry:
            return None
        if entry.do_not_translate:
            return entry.source_term  # Return as-is
        return entry.translations.get(target_lang)
```
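To see the lookup semantics in isolation, here is a condensed, dict-based stand-in for the glossary above (the terms and translations are illustrative): lookups are case-insensitive, do-not-translate entries pass through unchanged, and unknown terms return None.

```python
from typing import Dict, Optional

# Condensed stand-in for the Glossary class: lowercase source term ->
# per-language translations plus a do-not-translate flag.
glossary: Dict[str, dict] = {
    "escalation": {"translations": {"es": "escalación"}, "do_not_translate": False},
    "callsphere": {"translations": {}, "do_not_translate": True},
}


def get_translation(term: str, target_lang: str) -> Optional[str]:
    entry = glossary.get(term.lower())  # case-insensitive lookup
    if not entry:
        return None
    if entry["do_not_translate"]:
        return term  # brand/product names pass through unchanged
    return entry["translations"].get(target_lang)


print(get_translation("Escalation", "es"))  # escalación
print(get_translation("CallSphere", "es"))  # CallSphere
print(get_translation("refund", "es"))      # None
```

The real class adds a per-language index so lookups stay O(1) as the glossary grows, but the contract is the same.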
Translation Cache with Fuzzy Matching
Beyond exact term matches, cache full phrase translations and use fuzzy matching to find similar previously translated segments.
```python
from difflib import SequenceMatcher


@dataclass
class TranslationSegment:
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    match_score: float  # 1.0 for exact, lower for fuzzy
    domain: str
    last_used: str
    use_count: int = 0


class TranslationMemoryStore:
    def __init__(self, fuzzy_threshold: float = 0.75):
        self.segments: List[TranslationSegment] = []
        self.fuzzy_threshold = fuzzy_threshold
        self._exact_index: Dict[str, TranslationSegment] = {}

    def add_segment(self, segment: TranslationSegment) -> None:
        key = f"{segment.source_lang}:{segment.target_lang}:{segment.source_text.lower()}"
        self._exact_index[key] = segment
        self.segments.append(segment)

    def find_match(
        self, source: str, source_lang: str, target_lang: str
    ) -> Optional[TranslationSegment]:
        # Try an exact match first
        key = f"{source_lang}:{target_lang}:{source.lower()}"
        exact = self._exact_index.get(key)
        if exact:
            exact.use_count += 1
            return exact
        # Fall back to a linear fuzzy scan over stored segments
        best_match: Optional[TranslationSegment] = None
        best_score = 0.0
        for seg in self.segments:
            if seg.source_lang != source_lang or seg.target_lang != target_lang:
                continue
            score = SequenceMatcher(None, source.lower(), seg.source_text.lower()).ratio()
            if score > best_score and score >= self.fuzzy_threshold:
                best_score = score
                best_match = seg
        if best_match:
            # Return a copy with the adjusted fuzzy score
            return TranslationSegment(
                source_text=best_match.source_text,
                source_lang=best_match.source_lang,
                target_text=best_match.target_text,
                target_lang=best_match.target_lang,
                match_score=best_score,
                domain=best_match.domain,
                last_used=best_match.last_used,
                use_count=best_match.use_count,
            )
        return None
```
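The fuzzy_threshold of 0.75 is doing real work here: it has to admit near-duplicates (a reworded or re-punctuated version of a stored segment) while rejecting unrelated text. A quick check with the same difflib ratio, using made-up example strings, shows how the two cases separate:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Same case-insensitive ratio used by find_match above."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


stored = "How do I reset my password?"

# A near-duplicate (missing punctuation) scores close to 1.0...
close = similarity("How do I reset my password", stored)

# ...while an unrelated question scores well below a 0.75 threshold.
far = similarity("What are your opening hours?", stored)

print(f"near-duplicate: {close:.2f}, unrelated: {far:.2f}")
```

If fuzzy matches are accepted too eagerly, the agent reuses a translation for a sentence that only superficially resembles the original, so it is worth tuning the threshold against real traffic rather than trusting the default.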
Consistency Enforcement in Agent Responses
Before sending a response, scan it for terms that have glossary entries and verify they use the approved translation.
```python
import re


class ConsistencyEnforcer:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary

    def check_response(self, response: str, target_lang: str) -> dict:
        """Check response for terminology consistency violations."""
        violations = []
        for entry in self.glossary.entries:
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # Flag source terms that appear untranslated; word boundaries
            # avoid false positives on substrings of longer words
            pattern = re.compile(rf"\b{re.escape(entry.source_term)}\b", re.IGNORECASE)
            if pattern.search(response) and not entry.do_not_translate:
                violations.append({
                    "term": entry.source_term,
                    "expected": approved,
                    "issue": "source term used instead of translation",
                })
        return {
            "consistent": len(violations) == 0,
            "violations": violations,
            "total_checked": len(self.glossary.entries),
        }

    def enforce(self, response: str, target_lang: str) -> str:
        """Replace inconsistent terminology with approved translations."""
        result = response
        for entry in self.glossary.entries:
            if entry.do_not_translate:
                continue
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # Case-insensitive, whole-word replacement of source terms
            pattern = re.compile(rf"\b{re.escape(entry.source_term)}\b", re.IGNORECASE)
            result = pattern.sub(approved, result)
        return result
```
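The replacement step can be exercised on its own. This standalone sketch uses \b word boundaries around the escaped term so that terms embedded in longer words are left alone, which matters whenever a glossary term is a substring of another word:

```python
import re


def enforce_term(text: str, source_term: str, approved: str) -> str:
    """Whole-word, case-insensitive replacement of one glossary term."""
    pattern = re.compile(rf"\b{re.escape(source_term)}\b", re.IGNORECASE)
    return pattern.sub(approved, text)


# Whole word: replaced.
print(enforce_term("Your escalation was received.", "escalation", "escalación"))

# Substring of a longer word: left untouched.
print(enforce_term("Multiple escalations occurred.", "escalation", "escalación"))
```

Note that blind substitution ignores inflection and grammatical agreement in the target language, so enforce() is best treated as a last-resort repair; the prompt-level glossary injection in the next section prevents most violations before they happen.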
Glossary-Augmented Translation Prompts
When using an LLM for translation, inject the glossary into the prompt to guide consistent term usage.
```python
class GlossaryAugmentedTranslator:
    def __init__(self, client, glossary: Glossary):
        self.client = client
        self.glossary = glossary

    def _build_glossary_context(self, text: str, target_lang: str) -> str:
        """Extract relevant glossary entries for the text being translated."""
        relevant = []
        for entry in self.glossary.entries:
            if entry.source_term.lower() not in text.lower():
                continue
            trans = entry.translations.get(target_lang)
            note = f" ({entry.context_note})" if entry.context_note else ""
            # Do-not-translate entries are listed even without a stored
            # translation for the target language
            if entry.do_not_translate:
                relevant.append(f"- '{entry.source_term}' -> DO NOT TRANSLATE (keep as-is)")
            elif trans:
                relevant.append(f"- '{entry.source_term}' -> '{trans}'{note}")
        if not relevant:
            return ""
        return "MANDATORY GLOSSARY (use these exact translations):\n" + "\n".join(relevant)

    async def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        glossary_ctx = self._build_glossary_context(text, target_lang)
        system_msg = f"Translate from {source_lang} to {target_lang}."
        if glossary_ctx:
            system_msg += f"\n\n{glossary_ctx}"
        system_msg += "\nPreserve formatting and code blocks."
        resp = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": text},
            ],
            temperature=0.1,
        )
        return resp.choices[0].message.content or ""
```
Glossary Updates and Versioning
Glossaries evolve as products change. Maintain version history to understand when and why terms were updated.
```python
@dataclass
class GlossaryChange:
    term_id: str
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    changed_at: str
    reason: str


class VersionedGlossary:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary
        self.changelog: List[GlossaryChange] = []

    def update_translation(
        self, term_id: str, target_lang: str, new_translation: str,
        changed_by: str, reason: str
    ) -> None:
        entry = None
        for e in self.glossary.entries:
            if e.term_id == term_id:
                entry = e
                break
        if not entry:
            raise ValueError(f"Term {term_id} not found")
        old_value = entry.translations.get(target_lang, "")
        self.changelog.append(GlossaryChange(
            term_id=term_id,
            field_changed=f"translations.{target_lang}",
            old_value=old_value,
            new_value=new_translation,
            changed_by=changed_by,
            changed_at=datetime.utcnow().isoformat(),
            reason=reason,
        ))
        entry.translations[target_lang] = new_translation
        entry.updated_at = datetime.utcnow().isoformat()
```
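The essential invariant is that the changelog entry is recorded before the mutation, so the old value is never lost. This minimal standalone sketch of the same pattern (with illustrative data) demonstrates that an update leaves both the new value and a recoverable history behind:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Change:
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    reason: str
    changed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


translations: Dict[str, str] = {"es": "escalamiento"}
changelog: List[Change] = []


def update_translation(lang: str, new_value: str, changed_by: str, reason: str) -> None:
    # Record the change first, then mutate; the old value survives in the log.
    changelog.append(Change(
        field_changed=f"translations.{lang}",
        old_value=translations.get(lang, ""),
        new_value=new_value,
        changed_by=changed_by,
        reason=reason,
    ))
    translations[lang] = new_value


update_translation("es", "escalación", "maria", "Align with style guide")
print(translations["es"], len(changelog))
```

With the full history in hand, rollback is just replaying a change in reverse, and audits can answer "who changed this term and why" without a separate system.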
FAQ
How large should my glossary be before it impacts translation quality?
Start with 50-100 high-impact domain terms. Glossaries up to 500 entries work well when injected into LLM translation prompts. Beyond that, filter to only include entries relevant to the specific text being translated (as shown in the _build_glossary_context method) to avoid overwhelming the model's context window.
Should I store the translation memory in a database or in files?
For small-to-medium agents (under 10,000 segments), JSON files versioned in Git work well and keep the translation memory auditable. For larger systems, use a database (PostgreSQL with trigram indexes for fuzzy matching) and expose the TM through an internal API. The key requirement is that translators and developers can both access and update it.
How do I handle terms that have multiple valid translations depending on context?
Add context tags to glossary entries. For example, "account" in a banking context translates differently than "account" in a user authentication context. The consistency enforcer should match on both the term and the context tag. When context is ambiguous, flag the term for human review rather than auto-replacing.
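One hypothetical way to key such a glossary is on a (term, context) pair, with an ambiguous context deliberately returning nothing so the term can be routed to human review instead of auto-replaced:

```python
from typing import Dict, Optional, Tuple

# Hypothetical context-tagged glossary: (term, context) -> Spanish translation.
CONTEXT_GLOSSARY: Dict[Tuple[str, str], str] = {
    ("account", "banking"): "cuenta bancaria",
    ("account", "auth"): "cuenta de usuario",
}


def lookup_with_context(term: str, context: Optional[str]) -> Optional[str]:
    if context is None:
        # Ambiguous context: return nothing and flag for human review
        # rather than guessing between candidate translations.
        return None
    return CONTEXT_GLOSSARY.get((term.lower(), context))


print(lookup_with_context("Account", "banking"))
```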
#TranslationMemory #TerminologyManagement #Consistency #AIAgents #Localization #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.