Build a Language Translation Agent: Multi-Language Support with Context Awareness

Create an AI translation agent that translates between multiple languages while preserving context, manages terminology databases for domain-specific vocabulary, and performs quality checks on translations.

Why Build a Translation Agent

Machine translation has improved dramatically, but raw translation APIs still struggle with context, domain terminology, and nuance. A translation agent wraps translation capabilities with context management, terminology databases, and quality checking. It remembers the subject matter of your conversation, applies domain-specific vocabulary correctly, and flags potential issues before delivering the final translation.

This tutorial builds a multi-language translation agent with mock translation, a terminology database, context tracking, and quality validation.

Project Setup

mkdir translation-agent && cd translation-agent
python -m venv venv && source venv/bin/activate
pip install openai-agents pydantic
mkdir -p src
touch src/__init__.py src/translator.py src/terminology.py
touch src/quality.py src/agent.py

Step 1: Build the Translation Engine

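We simulate translation with a word-level dictionary lookup. The mock covers only English to Spanish and English to French; any other language pair returns the input words untranslated with a confidence of 0. In production, replace this with calls to a service such as Google Cloud Translation, DeepL, or Amazon Translate.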

# src/translator.py
from pydantic import BaseModel

class TranslationResult(BaseModel):
    source_lang: str
    target_lang: str
    original: str
    translated: str
    confidence: float

SUPPORTED_LANGUAGES = [
    "english", "spanish", "french", "german",
    "japanese", "portuguese", "italian",
]

# Simple word-level mock translations for demonstration
MOCK_TRANSLATIONS: dict[str, dict[str, str]] = {
    "english->spanish": {
        "hello": "hola", "world": "mundo", "how": "cómo",
        "are": "estás", "you": "tú", "good": "bueno",
        "morning": "mañana", "thank": "gracias", "please": "por favor",
        "the": "el", "is": "es", "and": "y",
        "software": "software", "database": "base de datos",
        "server": "servidor", "network": "red",
        "meeting": "reunión", "report": "informe",
    },
    "english->french": {
        "hello": "bonjour", "world": "monde", "how": "comment",
        "are": "allez", "you": "vous", "good": "bon",
        "morning": "matin", "thank": "merci", "please": "s'il vous plaît",
        "the": "le", "is": "est", "and": "et",
        "software": "logiciel", "database": "base de données",
        "server": "serveur", "network": "réseau",
        "meeting": "réunion", "report": "rapport",
    },
}

class TranslationContext:
    """Tracks conversation context for better translations."""
    def __init__(self):
        self.domain: str = "general"
        self.previous_translations: list[TranslationResult] = []
        self.source_lang: str = "english"
        self.target_lang: str = "spanish"

    def set_context(self, domain: str, source: str, target: str):
        self.domain = domain
        self.source_lang = source.lower()
        self.target_lang = target.lower()

    def add_translation(self, result: TranslationResult):
        self.previous_translations.append(result)
        if len(self.previous_translations) > 20:
            self.previous_translations.pop(0)

context = TranslationContext()

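def translate_text(
    text: str,
    source_lang: str = "",
    target_lang: str = "",
) -> TranslationResult:
    """Translate word by word using the mock dictionary."""
    # Fall back to the session context when a language is not given
    src = source_lang.lower() or context.source_lang
    tgt = target_lang.lower() or context.target_lang
    pair_key = f"{src}->{tgt}"

    # Unknown language pairs get an empty map, so words pass through as-is
    word_map = MOCK_TRANSLATIONS.get(pair_key, {})
    # Whitespace split only: "bug." (with the period) will not match "bug"
    words = text.lower().split()
    translated_words = [word_map.get(w, w) for w in words]
    translated = " ".join(translated_words)

    # Confidence is the share of words found in the dictionary
    known = sum(1 for w in words if w in word_map)
    confidence = known / len(words) if words else 0.0

    result = TranslationResult(
        source_lang=src,
        target_lang=tgt,
        original=text,
        translated=translated,
        confidence=round(confidence, 2),
    )
    # Record the result so the session keeps its most recent translations
    context.add_translation(result)
    return result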
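
The whitespace split above leaves punctuation attached to words, so "good." misses the dictionary entry for "good". A minimal sketch of one way around this, assuming a regex tokenizer is acceptable for the mock; `translate_tokens` and `WORD_MAP` are hypothetical names for illustration, not part of the tutorial's modules:

```python
import re

# Hypothetical mini dictionary mirroring part of the english->spanish mock
WORD_MAP = {"the": "el", "server": "servidor", "is": "es", "good": "bueno"}


def translate_tokens(text: str, word_map: dict[str, str]) -> tuple[str, float]:
    """Word-level lookup that keeps punctuation out of the dictionary keys."""
    # \w+ grabs each word; [^\w\s] grabs each punctuation mark as its own token
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    words = [t for t in tokens if re.fullmatch(r"\w+", t)]

    translated = " ".join(word_map.get(t, t) for t in tokens)
    # Re-attach closing punctuation that the join separated with a space
    translated = re.sub(r"\s+([.,;:!?])", r"\1", translated)

    # Confidence counts only real words, not punctuation tokens
    known = sum(1 for w in words if w in word_map)
    confidence = round(known / len(words), 2) if words else 0.0
    return translated, confidence


print(translate_tokens("The server is good.", WORD_MAP))
# prints ('el servidor es bueno.', 1.0)
```

This is still a toy: contractions such as "don't" are split at the apostrophe, and word order is never adjusted.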

Step 2: Terminology Database

Domain-specific terms need consistent translations. A terminology database ensures that "server" always translates to "servidor" in an IT context, not "camarero" (waiter).

# src/terminology.py
from pydantic import BaseModel

class TermEntry(BaseModel):
    term: str
    translations: dict[str, str]  # lang -> translation
    domain: str
    notes: str = ""

class TerminologyDB:
    def __init__(self):
        self.entries: dict[str, TermEntry] = {}
        self._load_defaults()

    def _load_defaults(self):
        defaults = [
            TermEntry(
                term="server",
                translations={
                    "spanish": "servidor",
                    "french": "serveur",
                },
                domain="technology",
                notes="Computing context, not restaurant",
            ),
            TermEntry(
                term="bug",
                translations={
                    "spanish": "error",
                    "french": "bogue",
                },
                domain="technology",
                notes="Software defect, not insect",
            ),
            TermEntry(
                term="cloud",
                translations={
                    "spanish": "nube",
                    "french": "nuage",
                },
                domain="technology",
                notes="Cloud computing context",
            ),
            TermEntry(
                term="sprint",
                translations={
                    "spanish": "sprint",
                    "french": "sprint",
                },
                domain="technology",
                notes="Agile methodology term, keep as-is",
            ),
        ]
        for entry in defaults:
            self.entries[entry.term.lower()] = entry

    def lookup(self, term: str, target_lang: str) -> str | None:
        entry = self.entries.get(term.lower())
        if entry:
            return entry.translations.get(target_lang.lower())
        return None

    def add_term(
        self, term: str, translations: dict[str, str],
        domain: str, notes: str = "",
    ) -> str:
        self.entries[term.lower()] = TermEntry(
            term=term, translations=translations,
            domain=domain, notes=notes,
        )
        return f"Added term '{term}' to terminology database"

    def list_terms(self, domain: str = "") -> str:
        entries = list(self.entries.values())
        if domain:
            entries = [e for e in entries if e.domain == domain]
        if not entries:
            return "No terms found."
        lines = []
        for e in entries:
            trans = ", ".join(
                f"{lang}: {word}"
                for lang, word in e.translations.items()
            )
            lines.append(f"  {e.term} [{e.domain}]: {trans}")
            if e.notes:
                lines.append(f"    Note: {e.notes}")
        return "\n".join(lines)

term_db = TerminologyDB()
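
The database on its own only answers lookups; the translator has to consult it for the overrides to take effect. A rough sketch of that wiring, assuming terminology entries should win over the general dictionary whenever the domain matches. `TERMS`, `WORD_MAP`, and `translate_with_terms` are hypothetical stand-ins so the example runs without the tutorial's modules:

```python
# Hypothetical stand-ins for term_db entries and the general word map
TERMS = {
    "bug": {"domain": "technology", "translations": {"spanish": "error"}},
    "cloud": {"domain": "technology", "translations": {"spanish": "nube"}},
}
WORD_MAP = {"the": "el", "is": "es", "bug": "bicho"}  # "bicho" = insect sense


def translate_with_terms(text: str, target_lang: str, domain: str) -> str:
    """Prefer a terminology entry when its domain matches, else fall back."""
    out = []
    for word in text.lower().split():
        entry = TERMS.get(word)
        if (
            entry
            and entry["domain"] == domain
            and target_lang in entry["translations"]
        ):
            # Domain term: use the pinned translation
            out.append(entry["translations"][target_lang])
        else:
            # General vocabulary, or an unknown word passed through unchanged
            out.append(WORD_MAP.get(word, word))
    return " ".join(out)


print(translate_with_terms("the bug", "spanish", "technology"))  # el error
print(translate_with_terms("the bug", "spanish", "general"))     # el bicho
```

Checking the domain before applying an entry is what keeps a technology glossary from leaking into general text.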

Step 3: Quality Checker

# src/quality.py
from src.translator import TranslationResult

def check_quality(result: TranslationResult) -> dict:
    issues = []
    if result.confidence < 0.3:
        issues.append(
            "Low confidence: many words were not found in "
            "translation dictionary. Consider manual review."
        )
    if result.original.lower() == result.translated.lower():
        issues.append(
            "Translation identical to source. The text may "
            "already be in the target language or untranslatable."
        )
    if len(result.translated.split()) < len(result.original.split()) * 0.5:
        issues.append(
            "Translation significantly shorter than source. "
            "Some content may be lost."
        )
    return {
        "confidence": result.confidence,
        "issues": issues if issues else ["No issues detected."],
        "recommendation": (
            "Manual review recommended"
            if issues else "Translation looks good"
        ),
    }
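
To see how the 0.3 confidence threshold behaves, it helps to work through the sample sentence used later in this tutorial. The sketch below recomputes the same word-share confidence on its own; `MOCK_KEYS` copies the keys of the english->spanish mock dictionary, and `confidence` is a hypothetical helper written for this illustration:

```python
# Keys of the english->spanish mock dictionary from src/translator.py
MOCK_KEYS = {
    "hello", "world", "how", "are", "you", "good", "morning", "thank",
    "please", "the", "is", "and", "software", "database", "server",
    "network", "meeting", "report",
}
LOW_CONFIDENCE = 0.3  # same threshold check_quality uses


def confidence(text: str) -> float:
    """Share of whitespace-separated words found in the mock dictionary."""
    words = text.lower().split()
    known = sum(1 for w in words if w in MOCK_KEYS)
    return round(known / len(words), 2) if words else 0.0


sentence = (
    "The server has a critical bug in the cloud deployment pipeline."
)
score = confidence(sentence)
print(score)                   # 0.27 (3 known words out of 11)
print(score < LOW_CONFIDENCE)  # True, so a manual review is recommended
```

Only "the" (twice) and "server" are in the dictionary, so the sentence lands just below the threshold and the checker flags it.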

Step 4: Assemble the Agent

# src/agent.py
import asyncio
import json
from agents import Agent, Runner, function_tool
from src.translator import translate_text, context, SUPPORTED_LANGUAGES
from src.terminology import term_db
from src.quality import check_quality

@function_tool
def translate(
    text: str, source_lang: str = "", target_lang: str = "",
) -> str:
    """Translate text between languages."""
    result = translate_text(text, source_lang, target_lang)
    quality = check_quality(result)
    return json.dumps({
        "original": result.original,
        "translated": result.translated,
        "confidence": result.confidence,
        "quality": quality,
    }, indent=2)

@function_tool
def set_translation_context(
    domain: str, source_lang: str, target_lang: str,
) -> str:
    """Set the translation context for the session."""
    context.set_context(domain, source_lang, target_lang)
    return f"Context set: {domain} domain, {source_lang} -> {target_lang}"

@function_tool
def lookup_term(term: str, target_lang: str = "") -> str:
    """Look up domain-specific terminology."""
    tgt = target_lang or context.target_lang
    result = term_db.lookup(term, tgt)
    if result:
        return f"'{term}' -> '{result}' in {tgt}"
    return f"Term '{term}' not found in terminology database"

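@function_tool
def add_terminology(
    term: str, translations_json: str,
    domain: str, notes: str = "",
) -> str:
    """Add a term. translations_json is a JSON object of language -> translation."""
    try:
        translations = json.loads(translations_json)
    except json.JSONDecodeError as exc:
        # Return the error to the model instead of crashing the tool call
        return f"Invalid translations_json: {exc}"
    if not isinstance(translations, dict):
        return "translations_json must be a JSON object"
    # Lowercase language keys so TerminologyDB.lookup can find them
    translations = {k.lower(): str(v) for k, v in translations.items()}
    return term_db.add_term(term, translations, domain, notes)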

@function_tool
def list_supported_languages() -> str:
    """List supported languages."""
    return ", ".join(SUPPORTED_LANGUAGES)

translation_agent = Agent(
    name="Translation Agent",
    instructions="""You are a professional translation agent.
Translate text while preserving context and using correct
domain terminology. Always check quality after translating.
Use the terminology database for technical or specialized terms.
If confidence is low, warn the user and suggest alternatives.""",
    tools=[
        translate, set_translation_context,
        lookup_term, add_terminology,
        list_supported_languages,
    ],
)

async def main():
    result = await Runner.run(
        translation_agent,
        "Set context to technology domain, English to Spanish. "
        "Then translate: 'The server has a critical bug in "
        "the cloud deployment pipeline.'",
    )
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

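Run it from the project root with python -m src.agent (the src. imports assume that layout) and with OPENAI_API_KEY set in your environment. For this prompt, the agent is expected to set the technology domain context, look up "server," "bug," and "cloud" in the terminology database to get the correct technical translations, translate the full sentence, and report the quality check that the translate tool runs. The model chooses the tool calls, so the exact order can vary between runs. If the sentence reaches the mock translator verbatim, only "the" and "server" are found in the dictionary, which gives a confidence of 3/11 ≈ 0.27, so expect a low-confidence warning.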

FAQ

How do I replace the mock translator with a real translation API?

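Use an official client for the Google Cloud Translation or DeepL API, or the unofficial googletrans library for quick experiments. Replace the body of translate_text with an API call that sends the text, source language, and target language. Keep the TranslationResult model as the return type so the quality checker and context tracker continue to work without changes. Translation APIs often do not return a confidence score, so decide how to populate the confidence field, since check_quality depends on it.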
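
One way to keep that swap small is to hide the backend behind a plain callable. The sketch below is an assumption about how you might structure it, not the API of any real translation client: `Backend`, `make_translate_text`, and `fake_backend` are hypothetical names, and a dataclass stands in for the pydantic TranslationResult so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:  # stand-in for the pydantic model in src/translator.py
    source_lang: str
    target_lang: str
    original: str
    translated: str
    confidence: float


# A backend is any callable: (text, source_lang, target_lang) -> translated text
Backend = Callable[[str, str, str], str]


def make_translate_text(backend: Backend, default_confidence: float = 1.0):
    """Wrap a backend so callers still receive a TranslationResult."""
    def translate_text(
        text: str, source_lang: str, target_lang: str
    ) -> TranslationResult:
        translated = backend(text, source_lang, target_lang)
        return TranslationResult(
            source_lang=source_lang,
            target_lang=target_lang,
            original=text,
            translated=translated,
            # Assumption: the backend reports no score, so use a placeholder
            confidence=default_confidence,
        )
    return translate_text


def fake_backend(text: str, source_lang: str, target_lang: str) -> str:
    """Placeholder; a real backend would call your translation client here."""
    return f"[{target_lang}] {text}"


translate_text = make_translate_text(fake_backend)
print(translate_text("hello", "english", "spanish").translated)
# prints [spanish] hello
```

With this shape, moving to a real service means writing one function that matches `Backend`; the quality checker keeps receiving the same result type.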

How does context awareness improve translation quality?

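Context tracking lets the agent carry the domain and recent translations across a series of related sentences, which helps prevent inconsistencies such as translating "server" as "servidor" in one sentence and "camarero" in the next. In this tutorial's code, TranslationContext only stores the domain, the language pair, and the last 20 results; translate_text does not read the domain or the history yet. Consistency comes from the agent consulting the terminology database, which pins one translation per term and target language.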
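
To make the stored history actually influence later output, one option is a small translation memory that reuses an earlier result when the same source text comes up again. This is a sketch under that assumption; `TranslationMemory` is a hypothetical class, not something defined in the tutorial's modules:

```python
from collections import deque
from typing import Optional


class TranslationMemory:
    """Remembers recent (source, translation) pairs for consistent reuse."""

    def __init__(self, maxlen: int = 20):
        # A bounded deque drops the oldest pair automatically when full
        self._history = deque(maxlen=maxlen)

    def remember(self, original: str, translated: str) -> None:
        self._history.append((original.lower(), translated))

    def recall(self, original: str) -> Optional[str]:
        # Search newest first so the latest choice wins
        key = original.lower()
        for source, translated in reversed(self._history):
            if source == key:
                return translated
        return None


memory = TranslationMemory(maxlen=2)
memory.remember("Server", "servidor")
print(memory.recall("server"))  # servidor
```

A caller would check `recall` before translating and call `remember` afterwards. The `deque(maxlen=...)` gives the same bounded history as the manual `pop(0)` in TranslationContext.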

Can this handle document-level translation?

Yes, with some extra work. Split the document into paragraphs, translate each one sequentially while maintaining the same context object, and reassemble the output. The context tracker keeps the domain and the 20 most recent translations, so later paragraphs can reuse earlier terminology choices, provided you pass that history to the translation backend or consult it before translating. The mock translator in this tutorial does not do this on its own.
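
The split, translate, and reassemble loop can be sketched as follows. It assumes paragraphs are separated by blank lines and that the translation function accepts the history of earlier results; `translate_document` and `shout` are hypothetical names used only for this illustration:

```python
import re
from typing import Callable


def translate_document(
    document: str,
    translate: Callable[[str, list[str]], str],
) -> str:
    """Translate paragraph by paragraph, passing earlier output as history."""
    # Split on blank lines, tolerating stray whitespace between paragraphs
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    history: list[str] = []
    for paragraph in paragraphs:
        # The callable sees every earlier translation in order
        history.append(translate(paragraph, history))
    return "\n\n".join(history)


def shout(paragraph: str, history: list[str]) -> str:
    """Toy translator: uppercases and tags how much history it was given."""
    return f"{len(history)}:{paragraph.upper()}"


print(translate_document("First para.\n\nSecond para.", shout))
```

The toy `shout` function shows only that the history grows with each paragraph; a real backend would use it to keep terminology consistent.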


CallSphere Team
