Build a Language Translation Agent: Multi-Language Support with Context Awareness
Create an AI translation agent that translates between multiple languages while preserving context, manages terminology databases for domain-specific vocabulary, and performs quality checks on translations.
Why Build a Translation Agent
Machine translation has improved dramatically, but raw translation APIs still struggle with context, domain terminology, and nuance. A translation agent wraps translation capabilities with context management, terminology databases, and quality checking. It remembers the subject matter of your conversation, applies domain-specific vocabulary correctly, and flags potential issues before delivering the final translation.
This tutorial builds a multi-language translation agent with mock translation, a terminology database, context tracking, and quality validation.
Project Setup
mkdir translation-agent && cd translation-agent
python -m venv venv && source venv/bin/activate
pip install openai-agents pydantic
mkdir -p src
touch src/__init__.py src/translator.py src/terminology.py
touch src/quality.py src/agent.py
Step 1: Build the Translation Engine
We simulate translation with a dictionary-based approach. In production, replace this with calls to Google Translate, DeepL, or AWS Translate APIs.
# src/translator.py
from pydantic import BaseModel

class TranslationResult(BaseModel):
    source_lang: str
    target_lang: str
    original: str
    translated: str
    confidence: float

SUPPORTED_LANGUAGES = [
    "english", "spanish", "french", "german",
    "japanese", "portuguese", "italian",
]

# Simple word-level mock translations for demonstration
MOCK_TRANSLATIONS: dict[str, dict[str, str]] = {
    "english->spanish": {
        "hello": "hola", "world": "mundo", "how": "como",
        "are": "estas", "you": "tu", "good": "bueno",
        "morning": "manana", "thank": "gracias", "please": "por favor",
        "the": "el", "is": "es", "and": "y",
        "software": "software", "database": "base de datos",
        "server": "servidor", "network": "red",
        "meeting": "reunion", "report": "informe",
    },
    "english->french": {
        "hello": "bonjour", "world": "monde", "how": "comment",
        "are": "allez", "you": "vous", "good": "bon",
        "morning": "matin", "thank": "merci", "please": "s'il vous plait",
        "the": "le", "is": "est", "and": "et",
        "software": "logiciel", "database": "base de donnees",
        "server": "serveur", "network": "reseau",
        "meeting": "reunion", "report": "rapport",
    },
}

class TranslationContext:
    """Tracks conversation context for better translations."""

    def __init__(self):
        self.domain: str = "general"
        self.previous_translations: list[TranslationResult] = []
        self.source_lang: str = "english"
        self.target_lang: str = "spanish"

    def set_context(self, domain: str, source: str, target: str):
        self.domain = domain
        self.source_lang = source.lower()
        self.target_lang = target.lower()

    def add_translation(self, result: TranslationResult):
        self.previous_translations.append(result)
        if len(self.previous_translations) > 20:
            self.previous_translations.pop(0)

context = TranslationContext()

def translate_text(
    text: str,
    source_lang: str = "",
    target_lang: str = "",
) -> TranslationResult:
    src = source_lang.lower() or context.source_lang
    tgt = target_lang.lower() or context.target_lang
    pair_key = f"{src}->{tgt}"
    word_map = MOCK_TRANSLATIONS.get(pair_key, {})
    words = text.lower().split()
    translated_words = [word_map.get(w, w) for w in words]
    translated = " ".join(translated_words)
    known = sum(1 for w in words if w in word_map)
    confidence = known / len(words) if words else 0.0
    result = TranslationResult(
        source_lang=src,
        target_lang=tgt,
        original=text,
        translated=translated,
        confidence=round(confidence, 2),
    )
    context.add_translation(result)
    return result
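The confidence score is simply the fraction of input words found in the word map. A minimal standalone sketch of that heuristic (inlined here with a two-word map rather than importing from src/translator.py):

```python
# Standalone sketch of the word-coverage confidence heuristic above.
word_map = {"hello": "hola", "world": "mundo"}

text = "hello brave world"
words = text.lower().split()
translated = " ".join(word_map.get(w, w) for w in words)
known = sum(1 for w in words if w in word_map)
confidence = round(known / len(words), 2) if words else 0.0

print(translated)   # hola brave mundo
print(confidence)   # 0.67
```

Two of three words are known, so confidence is 0.67; unknown words pass through untranslated, which is also why punctuation attached to a word ("pipeline.") lowers the score in this mock.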
Step 2: Terminology Database
Domain-specific terms need consistent translations. A terminology database ensures "server" always translates to "servidor" in an IT context rather than "camarero" (waiter).
# src/terminology.py
from pydantic import BaseModel

class TermEntry(BaseModel):
    term: str
    translations: dict[str, str]  # lang -> translation
    domain: str
    notes: str = ""

class TerminologyDB:
    def __init__(self):
        self.entries: dict[str, TermEntry] = {}
        self._load_defaults()

    def _load_defaults(self):
        defaults = [
            TermEntry(
                term="server",
                translations={
                    "spanish": "servidor",
                    "french": "serveur",
                },
                domain="technology",
                notes="Computing context, not restaurant",
            ),
            TermEntry(
                term="bug",
                translations={
                    "spanish": "error",
                    "french": "bogue",
                },
                domain="technology",
                notes="Software defect, not insect",
            ),
            TermEntry(
                term="cloud",
                translations={
                    "spanish": "nube",
                    "french": "nuage",
                },
                domain="technology",
                notes="Cloud computing context",
            ),
            TermEntry(
                term="sprint",
                translations={
                    "spanish": "sprint",
                    "french": "sprint",
                },
                domain="technology",
                notes="Agile methodology term, keep as-is",
            ),
        ]
        for entry in defaults:
            self.entries[entry.term.lower()] = entry

    def lookup(self, term: str, target_lang: str) -> str | None:
        entry = self.entries.get(term.lower())
        if entry:
            return entry.translations.get(target_lang.lower())
        return None

    def add_term(
        self, term: str, translations: dict[str, str],
        domain: str, notes: str = "",
    ) -> str:
        self.entries[term.lower()] = TermEntry(
            term=term, translations=translations,
            domain=domain, notes=notes,
        )
        return f"Added term '{term}' to terminology database"

    def list_terms(self, domain: str = "") -> str:
        entries = list(self.entries.values())
        if domain:
            entries = [e for e in entries if e.domain == domain]
        if not entries:
            return "No terms found."
        lines = []
        for e in entries:
            trans = ", ".join(
                f"{lang}: {word}"
                for lang, word in e.translations.items()
            )
            lines.append(f"  {e.term} [{e.domain}]: {trans}")
            if e.notes:
                lines.append(f"    Note: {e.notes}")
        return "\n".join(lines)

term_db = TerminologyDB()
Step 3: Quality Checker
# src/quality.py
from src.translator import TranslationResult

def check_quality(result: TranslationResult) -> dict:
    issues = []
    if result.confidence < 0.3:
        issues.append(
            "Low confidence: many words were not found in "
            "translation dictionary. Consider manual review."
        )
    if result.original.lower() == result.translated.lower():
        issues.append(
            "Translation identical to source. The text may "
            "already be in the target language or untranslatable."
        )
    if len(result.translated.split()) < len(result.original.split()) * 0.5:
        issues.append(
            "Translation significantly shorter than source. "
            "Some content may be lost."
        )
    return {
        "confidence": result.confidence,
        "issues": issues if issues else ["No issues detected."],
        "recommendation": (
            "Manual review recommended"
            if issues else "Translation looks good"
        ),
    }
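The checker is three independent heuristics over one result, so multiple issues can fire at once. To see the thresholds in action, here is a dependency-free rerun of the same rules on plain values (the real check_quality takes a TranslationResult):

```python
def quality_issues(confidence: float, original: str, translated: str) -> list[str]:
    # Same three heuristics as check_quality, with shortened messages.
    issues = []
    if confidence < 0.3:
        issues.append("low confidence")
    if original.lower() == translated.lower():
        issues.append("identical to source")
    if len(translated.split()) < len(original.split()) * 0.5:
        issues.append("much shorter than source")
    return issues

print(quality_issues(0.9, "the server is down", "el servidor es down"))
# []
print(quality_issues(0.1, "one two three four", "uno"))
# ['low confidence', 'much shorter than source']
```

The second call trips two rules: confidence 0.1 is below the 0.3 floor, and a one-word output is under half the four-word source.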
Step 4: Assemble the Agent
# src/agent.py
import asyncio
import json

from agents import Agent, Runner, function_tool

from src.translator import translate_text, context, SUPPORTED_LANGUAGES
from src.terminology import term_db
from src.quality import check_quality

@function_tool
def translate(
    text: str, source_lang: str = "", target_lang: str = "",
) -> str:
    """Translate text between languages."""
    result = translate_text(text, source_lang, target_lang)
    quality = check_quality(result)
    return json.dumps({
        "original": result.original,
        "translated": result.translated,
        "confidence": result.confidence,
        "quality": quality,
    }, indent=2)

@function_tool
def set_translation_context(
    domain: str, source_lang: str, target_lang: str,
) -> str:
    """Set the translation context for the session."""
    context.set_context(domain, source_lang, target_lang)
    return f"Context set: {domain} domain, {source_lang} -> {target_lang}"

@function_tool
def lookup_term(term: str, target_lang: str = "") -> str:
    """Look up domain-specific terminology."""
    tgt = target_lang or context.target_lang
    result = term_db.lookup(term, tgt)
    if result:
        return f"'{term}' -> '{result}' in {tgt}"
    return f"Term '{term}' not found in terminology database"

@function_tool
def add_terminology(
    term: str, translations_json: str,
    domain: str, notes: str = "",
) -> str:
    """Add a term to the terminology database."""
    translations = json.loads(translations_json)
    return term_db.add_term(term, translations, domain, notes)

@function_tool
def list_supported_languages() -> str:
    """List supported languages."""
    return ", ".join(SUPPORTED_LANGUAGES)

translation_agent = Agent(
    name="Translation Agent",
    instructions="""You are a professional translation agent.
    Translate text while preserving context and using correct
    domain terminology. Always check quality after translating.
    Use the terminology database for technical or specialized terms.
    If confidence is low, warn the user and suggest alternatives.""",
    tools=[
        translate, set_translation_context,
        lookup_term, add_terminology,
        list_supported_languages,
    ],
)

async def main():
    result = await Runner.run(
        translation_agent,
        "Set context to technology domain, English to Spanish. "
        "Then translate: 'The server has a critical bug in "
        "the cloud deployment pipeline.'",
    )
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
The agent sets the technology domain context, looks up "server," "bug," and "cloud" in the terminology database to get the correct technical translations, translates the full sentence, and runs a quality check.
FAQ
How do I replace the mock translator with a real translation API?
Use the official Google Cloud Translation or DeepL API (the unofficial googletrans library also works for quick prototyping, but it relies on the public web endpoint and breaks periodically). Replace the translate_text function body with an API call that sends the text, source language, and target language. Keep the TranslationResult model as the return type so the quality checker and context tracker continue to work without changes.
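One low-risk way to make the swap is to put the backend behind a single callable, so the mock and a real API share one signature. A sketch, with the real DeepL call shown only as a hypothetical comment (verify the client API against DeepL's documentation before relying on it):

```python
from typing import Callable

# Any callable (text, source, target) -> translated string can be a backend.
Backend = Callable[[str, str, str], str]

def mock_backend(text: str, source: str, target: str) -> str:
    # Tiny stand-in for the dictionary translator from src/translator.py.
    word_map = {"hello": "hola", "world": "mundo"}
    return " ".join(word_map.get(w, w) for w in text.lower().split())

# A production backend would wrap a real API, e.g. (hypothetical sketch):
#   import deepl
#   client = deepl.Translator(api_key)
#   def deepl_backend(text, source, target):
#       return client.translate_text(text, target_lang=target).text

def translate_with(backend: Backend, text: str, source: str, target: str) -> dict:
    return {"original": text, "translated": backend(text, source, target)}

print(translate_with(mock_backend, "hello world", "en", "es")["translated"])
# hola mundo
```

Because only the backend changes, the quality checker and context tracker never see the difference.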
How does context awareness improve translation quality?
Context tracking ensures that when translating a series of related sentences, the agent remembers the domain and previous translations. This prevents inconsistencies like translating "server" as "servidor" in one sentence and "camarero" in the next. The terminology database enforces consistent vocabulary within a domain.
Can this handle document-level translation?
Yes. Split the document into paragraphs, translate each one sequentially while maintaining the context object, and reassemble the output. The context tracker accumulates domain signals across paragraphs, so translations improve as the agent processes more of the document and builds a stronger understanding of the subject matter.
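The paragraph loop takes only a few lines. In this sketch, translate_paragraph is a stub standing in for translate_text from src/translator.py, which would also update the shared context object as a side effect:

```python
def translate_paragraph(text: str) -> str:
    # Stub; the real project would call translate_text() here.
    word_map = {"hello": "hola", "world": "mundo", "good": "bueno"}
    return " ".join(word_map.get(w, w) for w in text.lower().split())

document = "hello world\n\ngood morning world"

# Split on blank lines, translate each paragraph in order, reassemble.
paragraphs = document.split("\n\n")
translated_doc = "\n\n".join(translate_paragraph(p) for p in paragraphs)
print(translated_doc)
# hola mundo
#
# bueno morning mundo
```

Translating sequentially (rather than in parallel) matters here: the shared context accumulates domain signals paragraph by paragraph.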
#Translation #NLP #AIAgent #Python #MultiLanguage #AgenticAI #LearnAI #AIEngineering
CallSphere Team