Prompt Migration: Adapting Prompts When Switching Between LLM Providers
A practical guide to migrating prompts across LLM providers. Covers provider-specific differences, migration checklists, abstraction layers, and testing strategies to ensure consistent behavior after switching.
Why Prompt Migration is Harder Than It Looks
Switching from OpenAI to Anthropic or from Claude to Gemini seems like it should be straightforward — just point to a different API. In practice, every provider has different strengths, quirks in how they follow instructions, varying system prompt conventions, and different optimal prompting patterns.
A prompt that works perfectly with GPT-4o might produce verbose, off-topic responses from Claude if you copy it verbatim. Migration is not a find-and-replace operation. It is a systematic adaptation process.
Understanding Provider Differences
Before migrating, map out the key differences between your source and target providers.
from dataclasses import dataclass, field
@dataclass
class ProviderProfile:
    """Static description of one LLM provider's prompting characteristics."""

    name: str                       # human-readable provider/model label
    system_prompt_support: str      # "full", "limited", "none"
    max_system_prompt_tokens: int   # upper bound for system prompt size
    optimal_instruction_style: str  # e.g. "directive", "structured"
    strengths: list[str]            # what this provider does well
    quirks: list[str]               # behaviors to account for when migrating
    message_format: str             # "openai", "anthropic", "google"
# Reference profiles for three major providers.  Consulted during
# migration planning to surface the format, limit, and instruction-style
# differences between the source and target provider.
PROVIDER_PROFILES = {
    "openai": ProviderProfile(
        name="OpenAI (GPT-4o)",
        system_prompt_support="full",
        max_system_prompt_tokens=16000,
        optimal_instruction_style="directive",
        strengths=[
            "Follows structured output formats well",
            "Strong at multi-step reasoning",
        ],
        quirks=[
            "May add unsolicited caveats",
            "Tends toward verbose responses by default",
        ],
        message_format="openai",
    ),
    "anthropic": ProviderProfile(
        name="Anthropic (Claude)",
        system_prompt_support="full",
        max_system_prompt_tokens=32000,
        optimal_instruction_style="conversational_directive",
        strengths=[
            "Excellent at following nuanced instructions",
            "Strong long-context performance",
        ],
        quirks=[
            "Prefers explicit permission over implicit",
            "Benefits from examples in prompts",
        ],
        message_format="anthropic",
    ),
    "google": ProviderProfile(
        name="Google (Gemini)",
        system_prompt_support="full",
        max_system_prompt_tokens=8000,
        optimal_instruction_style="structured",
        strengths=[
            "Strong at multi-modal tasks",
            "Good at grounded factual responses",
        ],
        quirks=[
            "System instruction handling differs from chat",
            "May need more explicit formatting guidance",
        ],
        message_format="google",
    ),
}
The Migration Checklist
Systematize the migration process to avoid missing critical adaptations.
@dataclass
class MigrationTask:
    """A single item on a prompt-migration checklist."""

    description: str         # what must be done for this task
    completed: bool = False  # set to True once the task is verified
    notes: str = ""          # optional free-form notes shown in the report
class PromptMigrationChecklist:
    """Structured checklist for migrating a prompt between providers.

    Holds a fixed sequence of migration tasks and renders a plain-text
    progress report over them.
    """

    def __init__(
        self, source: str, target: str, prompt_name: str
    ):
        self.source = source
        self.target = target
        self.prompt_name = prompt_name
        self.tasks = self._build_checklist()

    def _build_checklist(self) -> list[MigrationTask]:
        # Descriptions are kept as plain data; each one is concrete
        # enough for a reviewer to verify rather than tick a vague box.
        descriptions = [
            "Audit source prompt: document all behaviors, "
            "edge cases, and output format requirements",
            "Map message format differences "
            f"({self.source} -> {self.target})",
            "Adapt instruction style to target provider's "
            "optimal pattern",
            "Adjust token limits and context window usage",
            "Convert provider-specific features (tool format, "
            "structured output schema, etc.)",
            "Run benchmark suite against target provider",
            "Compare outputs side-by-side for 20+ test cases",
            "Validate error handling and edge case behavior",
            "Update monitoring and alerting for new provider",
            "Run shadow traffic before full cutover",
        ]
        return [MigrationTask(text) for text in descriptions]

    def report(self) -> str:
        """Render a human-readable progress report for this migration."""
        done_count = sum(1 for task in self.tasks if task.completed)
        out = [
            f"Migration: {self.prompt_name} "
            f"({self.source} -> {self.target})",
            f"Progress: {done_count}/{len(self.tasks)}",
            "",
        ]
        for number, task in enumerate(self.tasks, 1):
            mark = "x" if task.completed else " "
            out.append(f"[{mark}] {number}. {task.description}")
            if task.notes:
                out.append(f" Note: {task.notes}")
        return "\n".join(out)
Provider Abstraction Layer
Build an abstraction that isolates your application from provider-specific details.
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
from abc import ABC, abstractmethod
@dataclass
class LLMResponse:
    """Normalized completion result shared by all provider adapters."""

    text: str           # generated completion text
    input_tokens: int   # prompt-side token count reported by the provider
    output_tokens: int  # completion-side token count reported by the provider
    model: str          # model identifier the request was made with
    latency_ms: float   # wall-clock request latency in milliseconds
class LLMProvider(ABC):
    """Abstract base for LLM providers.

    Concrete adapters translate this uniform interface into each
    vendor's native API and return a normalized LLMResponse, keeping
    application code provider-agnostic.
    """

    @abstractmethod
    async def complete(
        self, system_prompt: str, messages: list[dict],
        temperature: float = 0.7, max_tokens: int = 1024,
    ) -> LLMResponse:
        """Run one completion and return a normalized response."""
        pass
class OpenAIProvider(LLMProvider):
    """LLMProvider adapter backed by OpenAI's chat completions API."""

    def __init__(self, model: str = "gpt-4o"):
        # Imported lazily so the module loads without the openai package.
        from openai import AsyncOpenAI
        self.client = AsyncOpenAI()
        self.model = model

    async def complete(
        self, system_prompt, messages, temperature=0.7,
        max_tokens=1024
    ) -> LLMResponse:
        import time
        started = time.monotonic()
        # OpenAI expects the system prompt as the first chat message.
        chat_messages = [{"role": "system", "content": system_prompt}, *messages]
        result = await self.client.chat.completions.create(
            model=self.model,
            messages=chat_messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        elapsed_ms = (time.monotonic() - started) * 1000
        first_choice = result.choices[0]
        return LLMResponse(
            text=first_choice.message.content,
            input_tokens=result.usage.prompt_tokens,
            output_tokens=result.usage.completion_tokens,
            model=self.model,
            latency_ms=elapsed_ms,
        )
class AnthropicProvider(LLMProvider):
    """LLMProvider adapter backed by Anthropic's Messages API."""

    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        # Imported lazily so the module loads without the anthropic package.
        from anthropic import AsyncAnthropic
        self.client = AsyncAnthropic()
        self.model = model

    async def complete(
        self, system_prompt, messages, temperature=0.7,
        max_tokens=1024
    ) -> LLMResponse:
        import time
        started = time.monotonic()
        # Anthropic takes the system prompt as a dedicated `system`
        # argument rather than as a message in the list.
        result = await self.client.messages.create(
            model=self.model,
            system=system_prompt,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        elapsed_ms = (time.monotonic() - started) * 1000
        # The first content block carries the text completion.
        return LLMResponse(
            text=result.content[0].text,
            input_tokens=result.usage.input_tokens,
            output_tokens=result.usage.output_tokens,
            model=self.model,
            latency_ms=elapsed_ms,
        )
Prompt Adaptation Patterns
Some prompts need structural changes, not just re-wording.
class PromptAdapter:
    """Adapt prompts between provider-specific conventions.

    These transforms are heuristics: they cover common phrasing and
    formatting differences between OpenAI- and Anthropic-style prompts,
    not a full rewrite of the prompt's content.
    """

    def adapt_for_anthropic(self, openai_prompt: str) -> str:
        """Adapt an OpenAI-style prompt for Claude.

        Softens hard prohibitions and appends a JSON format example when
        the prompt asks for JSON output.
        """
        adapted = openai_prompt
        # Claude responds better to explicit permissions
        adapted = adapted.replace(
            "You must not", "Please avoid"
        )
        # Claude benefits from explicit output format examples.
        # Bug fix: the needle must be lowercase — the previous check
        # ('"respond in JSON" in adapted.lower()') could never match,
        # because the lowered haystack never contains uppercase "JSON".
        if "respond in json" in adapted.lower():
            adapted += (
                "\n\nHere is an example of the expected format:"
                "\n{\n \"key\": \"value\"\n}"
            )
        return adapted

    def adapt_for_openai(self, anthropic_prompt: str) -> str:
        """Adapt an Anthropic-style prompt for GPT-4o.

        Hardens soft phrasing and strips Anthropic-specific XML
        thinking/scratchpad tags.
        """
        # Local import mirrors the file's style of scoped imports.
        import re

        adapted = anthropic_prompt
        # GPT-4o handles direct instructions well
        adapted = adapted.replace(
            "Please avoid", "Do not"
        )
        # Remove Anthropic-specific XML tag patterns; the backreference
        # \1 guarantees the closing tag matches the opening one.
        adapted = re.sub(
            r'<(thinking|scratchpad)>.*?</\1>',
            '', adapted, flags=re.DOTALL
        )
        return adapted
Shadow Traffic Testing
Before cutting over, run both providers in parallel and compare.
import asyncio
class ShadowRunner:
    """Run the same request against source and target providers in parallel.

    Used during migration to compare a candidate provider ("target")
    against the incumbent ("source") on live-shaped traffic before
    cutting over.
    """

    def __init__(
        self, source: "LLMProvider", target: "LLMProvider"
    ):
        self.source = source
        self.target = target

    async def compare(
        self, system_prompt: str, messages: list[dict],
        source_prompt: "str | None" = None,
        target_prompt: "str | None" = None,
    ) -> dict:
        """Run both providers concurrently and compare their outputs.

        Args:
            system_prompt: Default system prompt used for both providers.
            messages: Conversation messages in the shared format.
            source_prompt: Optional provider-specific override for source.
            target_prompt: Optional provider-specific override for target.
                (Fix: these were annotated `str = None`, an implicit
                Optional; now explicit, quoted to avoid an eager
                dependency on 3.10 union syntax.)

        Returns:
            Dict with "source" and "target" entries, each holding the
            response text, output token count, and latency in ms.
        """
        # Fall back to the shared prompt when no override is supplied.
        s_prompt = source_prompt or system_prompt
        t_prompt = target_prompt or system_prompt
        # gather() runs both provider calls concurrently.
        source_resp, target_resp = await asyncio.gather(
            self.source.complete(s_prompt, messages),
            self.target.complete(t_prompt, messages),
        )
        return {
            "source": {
                "text": source_resp.text,
                "tokens": source_resp.output_tokens,
                "latency_ms": source_resp.latency_ms,
            },
            "target": {
                "text": target_resp.text,
                "tokens": target_resp.output_tokens,
                "latency_ms": target_resp.latency_ms,
            },
        }
FAQ
How long does a typical prompt migration take?
For a single agent with a well-defined benchmark suite, expect 2-3 days of adaptation and testing. For a multi-agent system with complex interactions, budget 1-2 weeks. The migration itself is quick — the testing and tuning is what takes time.
Can I use the same prompt for all providers?
For simple prompts, a generic version may work across providers. For production agents with specific behavioral requirements, you almost always need provider-specific tuning. The abstraction layer lets your application code stay generic while the prompts themselves are adapted per provider.
What is the biggest risk during provider migration?
Subtle behavioral differences that existing tests do not catch. A model might follow formatting instructions perfectly but interpret ambiguous edge cases differently. Run your benchmark suite and also have humans review 50-100 real conversation samples from the new provider before full cutover.
#LLMMigration #ProviderAbstraction #PromptEngineering #AIArchitecture #MultiModel #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.