7 AI Coding Interview Questions From Anthropic, Meta & OpenAI (2026 Edition)
Real AI coding interview questions from Anthropic, Meta, and OpenAI in 2026. Includes implementing attention from scratch, Anthropic's progressive coding screens, Meta's AI-assisted round, and vector search — with solution approaches.
AI Coding Interviews in 2026: Not Your Father's LeetCode
The coding bar for AI roles has shifted dramatically. Anthropic doesn't ask LeetCode at all — they test progressive system building. Meta now has an AI-assisted coding round where you work with real AI tools. OpenAI's coding questions focus on practical ML implementation.
Here are 7 real coding questions from these companies, with the approaches that pass.
Important: Anthropic strictly prohibits AI assistance during live interviews. Meta explicitly provides AI tools. Know the rules before your interview.
The Task
Implement scaled dot-product multi-head attention using only basic PyTorch tensor operations. No nn.MultiheadAttention.
Solution Approach
```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        # Projection matrices
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, mask: torch.Tensor | None = None):
        batch_size, seq_len, _ = x.shape
        # Project and reshape: (B, N, d) -> (B, h, N, d_k)
        Q = self.W_q(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        # Apply mask if provided (e.g. a causal lower-triangular mask)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn_weights = torch.softmax(scores, dim=-1)
        # Apply attention to values
        context = torch.matmul(attn_weights, V)  # (B, h, N, d_k)
        # Reshape back: (B, h, N, d_k) -> (B, N, d)
        context = context.transpose(1, 2).contiguous().view(batch_size, seq_len, self.d_model)
        return self.W_o(context)
```
What They Evaluate
| Criteria | What They Look For |
|---|---|
| Correctness | Proper scaling by sqrt(d_k), correct reshape/transpose operations |
| Mask handling | Causal mask for autoregressive, padding mask for variable-length |
| Memory layout | Using .contiguous() before .view() after transpose |
| Edge cases | What happens with seq_len=1? With d_model not divisible by n_heads? |
Common Follow-Up Questions
- "Add GQA support" — Modify so n_kv_heads < n_heads, with Q heads grouped to share KV heads
- "Add KV cache for inference" — Accept and return cached K,V tensors
- "Make it memory efficient" — Discuss Flash Attention algorithm (tiling + online softmax)
- "Add RoPE" — Apply rotation to Q,K before computing attention scores
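Before tackling the follow-ups, it helps to sanity-check the base implementation. Here is a condensed functional version of the same math (weights as plain matrices rather than `nn.Linear`; names like `mha_forward` are illustrative, not from the interview):

```python
import math
import torch

def mha_forward(x, Wq, Wk, Wv, Wo, n_heads, mask=None):
    # Condensed functional mirror of the module above.
    B, N, d = x.shape
    dk = d // n_heads
    def split(t):  # (B, N, d) -> (B, h, N, dk)
        return t.view(B, N, n_heads, dk).transpose(1, 2)
    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(dk)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    ctx = torch.softmax(scores, dim=-1) @ V
    return ctx.transpose(1, 2).contiguous().view(B, N, d) @ Wo

torch.manual_seed(0)
d, h = 64, 8
Ws = [torch.randn(d, d) / math.sqrt(d) for _ in range(4)]
x = torch.randn(2, 10, d)
# Causal mask: lower-triangular, so position i attends only to j <= i
causal = torch.tril(torch.ones(10, 10))
out = mha_forward(x, *Ws, n_heads=h, mask=causal)
print(out.shape)  # torch.Size([2, 10, 64])
```

A useful self-test in an interview: with the causal mask, perturbing the last token must not change the output at position 0.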
The Format
Anthropic's coding interviews use progressive rounds — you start with a simple implementation and the interviewer adds complexity every 15-20 minutes. The question below is reconstructed from candidate reports.
Round 1 — Basic Operations (15 min)
```python
class InMemoryDB:
    """Implement SET, GET, DELETE operations."""

    def __init__(self):
        self.store = {}

    def set(self, key: str, value: str) -> None:
        self.store[key] = value

    def get(self, key: str) -> str | None:
        return self.store.get(key)

    def delete(self, key: str) -> bool:
        if key in self.store:
            del self.store[key]
            return True
        return False
```
Round 2 — Filtered Scans (15 min)
"Now add a SCAN operation that filters by a prefix and returns matching key-value pairs."
```python
def scan(self, prefix: str) -> list[tuple[str, str]]:
    return [(k, v) for k, v in self.store.items() if k.startswith(prefix)]
```
The interviewer pushes: "This is O(n) over all keys. How would you make prefix scan efficient?"
Better approach: Use a trie or sorted dict (SortedDict from sortedcontainers) for O(log n + k) prefix scans where k is the number of matches.
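One way to get that behavior with only the standard library is to keep keys in a sorted list and binary-search for the prefix with `bisect` (a sketch; `PrefixIndex` is an illustrative name, not from the interview):

```python
import bisect

class PrefixIndex:
    """Sorted key list + dict: prefix scan via binary search."""
    def __init__(self):
        self.keys: list[str] = []   # kept sorted at all times
        self.store: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        if key not in self.store:
            # insort locates in O(log n) but shifts in O(n); a trie or
            # SortedDict avoids that write cost
            bisect.insort(self.keys, key)
        self.store[key] = value

    def scan(self, prefix: str) -> list[tuple[str, str]]:
        # All keys sharing a prefix form a contiguous run in sorted order,
        # so we jump to the start in O(log n) and walk k matches.
        lo = bisect.bisect_left(self.keys, prefix)
        out = []
        for k in self.keys[lo:]:
            if not k.startswith(prefix):
                break
            out.append((k, self.store[k]))
        return out

db = PrefixIndex()
for k in ["user:1", "user:2", "cart:9", "user:10"]:
    db.set(k, "v")
print([k for k, _ in db.scan("user:")])  # ['user:1', 'user:10', 'user:2']
```

Note the lexicographic order ("user:10" before "user:2") — worth calling out to the interviewer.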
Round 3 — TTL Support (15 min)
"Add TTL (time-to-live) support. Keys should expire after a specified duration."
```python
import time

class InMemoryDB:
    def __init__(self):
        self.store = {}  # key -> value
        self.ttls = {}   # key -> expiry_timestamp

    def set(self, key: str, value: str, ttl: float | None = None) -> None:
        self.store[key] = value
        if ttl is not None:
            self.ttls[key] = time.time() + ttl
        elif key in self.ttls:
            del self.ttls[key]  # Remove TTL if re-set without one

    def get(self, key: str) -> str | None:
        # Lazy expiry: check on read rather than running a background timer
        if key in self.ttls and time.time() > self.ttls[key]:
            self.delete(key)  # delete() should also drop the key from self.ttls
            return None
        return self.store.get(key)

    def _lazy_cleanup(self):
        """Remove all expired keys; call periodically or on demand."""
        now = time.time()
        expired = [k for k, exp in self.ttls.items() if now > exp]
        for k in expired:
            self.delete(k)
```
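A quick demonstration of the lazy-expiry behavior, using a condensed standalone version of the store above (the `TTLStore` name is mine; timings are deliberately short for the demo):

```python
import time

class TTLStore:
    # Condensed version of the TTL logic above, for demonstration only.
    def __init__(self):
        self.store, self.ttls = {}, {}

    def set(self, key, value, ttl=None):
        self.store[key] = value
        if ttl is not None:
            self.ttls[key] = time.time() + ttl
        else:
            self.ttls.pop(key, None)

    def get(self, key):
        # Expired keys are removed on read -- no background thread needed
        if key in self.ttls and time.time() > self.ttls[key]:
            self.store.pop(key, None)
            self.ttls.pop(key, None)
            return None
        return self.store.get(key)

db = TTLStore()
db.set("session", "abc", ttl=0.05)   # expires in 50 ms
print(db.get("session"))             # 'abc'
time.sleep(0.06)
print(db.get("session"))             # None -- lazily expired on read
```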
Round 4 — Persistence (15 min)
"Add save/load to compress the database to a file and restore it."
```python
import gzip
import json

# Methods added to InMemoryDB
def save(self, filepath: str) -> None:
    data = {"store": self.store, "ttls": self.ttls}
    with gzip.open(filepath, 'wt') as f:
        json.dump(data, f)

def load(self, filepath: str) -> None:
    with gzip.open(filepath, 'rt') as f:
        data = json.load(f)
    self.store = data["store"]
    self.ttls = {k: float(v) for k, v in data["ttls"].items()}
```
What Anthropic Is Really Evaluating
- Code quality under pressure: Clean, readable code even as complexity grows
- Modular design: Can you extend your initial design without rewriting everything?
- Edge case awareness: What happens when you GET a key that's expired? What about concurrent TTL cleanup?
- Communication: Do you talk through your approach before coding? Do you ask clarifying questions?
- Progressive thinking: Do you anticipate where this is going and design for extensibility?
The Task
Build a banking system that handles deposits, withdrawals, and transfers with proper validation. Progressive complexity adds transaction history and balance queries.
Core Implementation
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class TxnType(Enum):
    DEPOSIT = "deposit"
    WITHDRAWAL = "withdrawal"
    TRANSFER = "transfer"


@dataclass
class Transaction:
    txn_type: TxnType
    amount: float
    timestamp: datetime
    from_account: str | None = None
    to_account: str | None = None


class Bank:
    def __init__(self):
        self.accounts: dict[str, float] = {}
        self.history: dict[str, list[Transaction]] = {}

    def create_account(self, account_id: str, initial_balance: float = 0) -> None:
        if account_id in self.accounts:
            raise ValueError(f"Account {account_id} already exists")
        if initial_balance < 0:
            raise ValueError("Initial balance cannot be negative")
        self.accounts[account_id] = initial_balance
        self.history[account_id] = []

    def deposit(self, account_id: str, amount: float) -> float:
        self._validate_account(account_id)
        if amount <= 0:
            raise ValueError("Deposit amount must be positive")
        self.accounts[account_id] += amount
        self.history[account_id].append(
            Transaction(TxnType.DEPOSIT, amount, datetime.now(), to_account=account_id)
        )
        return self.accounts[account_id]

    def withdraw(self, account_id: str, amount: float) -> float:
        self._validate_account(account_id)
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        if self.accounts[account_id] < amount:
            raise ValueError("Insufficient funds")
        self.accounts[account_id] -= amount
        self.history[account_id].append(
            Transaction(TxnType.WITHDRAWAL, amount, datetime.now(), from_account=account_id)
        )
        return self.accounts[account_id]

    def transfer(self, from_id: str, to_id: str, amount: float) -> None:
        self._validate_account(from_id)
        self._validate_account(to_id)
        if from_id == to_id:
            raise ValueError("Cannot transfer to same account")
        self.withdraw(from_id, amount)
        self.deposit(to_id, amount)
        # Record the transfer in both histories (in addition to the
        # WITHDRAWAL/DEPOSIT entries the calls above already logged)
        txn = Transaction(TxnType.TRANSFER, amount, datetime.now(), from_id, to_id)
        self.history[from_id].append(txn)
        self.history[to_id].append(txn)

    def _validate_account(self, account_id: str) -> None:
        if account_id not in self.accounts:
            raise ValueError(f"Account {account_id} not found")
```
Progressive Follow-Ups
- "Add transaction rollback": If deposit in a transfer succeeds but something fails, undo the withdrawal. Implement a simple saga pattern.
- "Add concurrent access": Use locks to handle multiple threads doing transfers simultaneously. Discuss deadlock prevention (always lock accounts in sorted order).
- "Add interest calculation": Compound interest on all accounts, run monthly. Discuss precision issues with floating point.
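The sorted-lock-order idea from the concurrency follow-up can be sketched with `threading` (class and method names are illustrative, not from the interview):

```python
import threading

class ThreadSafeBank:
    """Sketch: per-account locks, always acquired in sorted order."""
    def __init__(self):
        self.balances: dict[str, float] = {}
        self.locks: dict[str, threading.Lock] = {}

    def create(self, acct: str, balance: float = 0.0) -> None:
        self.balances[acct] = balance
        self.locks[acct] = threading.Lock()

    def transfer(self, src: str, dst: str, amount: float) -> None:
        # Deadlock prevention: always lock the lexicographically smaller
        # account first, so A->B and B->A transfers can't wait on each other.
        first, second = sorted([src, dst])
        with self.locks[first], self.locks[second]:
            if self.balances[src] < amount:
                raise ValueError("Insufficient funds")
            self.balances[src] -= amount
            self.balances[dst] += amount

bank = ThreadSafeBank()
bank.create("a", 100)
bank.create("b", 100)
threads = [threading.Thread(target=bank.transfer, args=("a", "b", 1))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(bank.balances)  # {'a': 50, 'b': 150}
```

Without the `sorted()` step, two opposing transfers each holding one lock and waiting on the other is a classic deadlock — exactly what the interviewer is probing for.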
The Format
Anthropic's "Bug Fixing" round (reported March 2026): You're given a Jupyter notebook with ML training/inference code that has multiple bugs. Find and fix them.
Common Bug Patterns to Watch For
1. Shape Mismatches
```python
# BUG: Wrong dimension for softmax
logits = model(x)  # shape: (batch, seq_len, vocab_size)
probs = torch.softmax(logits, dim=1)  # Bug! Should be dim=-1 (or dim=2)
```
2. Device Mismatches
```python
# BUG: Model on GPU, new tensor on CPU
model = model.cuda()
mask = torch.ones(batch_size, seq_len)  # CPU tensor!
output = model(x.cuda(), mask)  # RuntimeError: tensors on different devices
# Fix: mask = mask.cuda(), or better, mask = mask.to(x.device)
```
3. Gradient Bugs
```python
# BUG: Forgetting to zero gradients
for batch in dataloader:
    loss = criterion(model(batch), targets)
    loss.backward()
    optimizer.step()
    # Missing: optimizer.zero_grad() — gradients accumulate across batches!
```
4. Data Leakage
```python
# BUG: Fitting scaler on test data
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(X_all)  # Fits on ALL data, including test
X_train, X_test = X_all_scaled[:800], X_all_scaled[800:]
# Fix: fit on train only, then transform test with the fitted scaler
```
5. Off-By-One in Tokenization
```python
# BUG: Not accounting for special tokens
max_length = 512
tokens = tokenizer(text, max_length=max_length, truncation=True)
# Actual content budget = 510 tokens (2 slots taken by [CLS] and [SEP])
```
How to Approach This Round
- Read the full notebook first — understand the intended logic before looking for bugs
- Check shapes at each step — most bugs are shape/dimension errors
- Trace the data flow — input → preprocessing → model → loss → backward → update
- Look for silent bugs — code that runs but produces wrong results (wrong dim for softmax, missing gradient zeroing) is harder to catch than crashes
- Test incrementally — fix one bug, run the cell, check the output, move to the next
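The data-leakage pattern above is worth writing out by hand at least once. A sketch of the correct split-then-scale order in plain numpy (scikit-learn's `StandardScaler` does the same mean/std bookkeeping under the hood):

```python
import numpy as np

rng = np.random.default_rng(0)
X_all = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_train, X_test = X_all[:800], X_all[800:]

# Correct: statistics come from the training split only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma   # transform test with TRAIN stats

# The train split is exactly standardized; the test split is only
# approximately so, which is expected -- its own stats were never used.
print(np.allclose(X_train_s.mean(axis=0), 0, atol=1e-9))  # True
```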
The Task
Build a concurrent task processor that executes independent tasks in parallel, handles failures gracefully, and reports results.
Solution Approach
```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskResult:
    task_id: str
    status: TaskStatus
    result: Any = None
    error: str | None = None


class ConcurrentProcessor:
    def __init__(self, max_concurrency: int = 5, timeout: float = 30.0):
        self.semaphore = asyncio.Semaphore(max_concurrency)
        self.timeout = timeout

    async def _execute_task(self, task_id: str, func: Callable, *args) -> TaskResult:
        async with self.semaphore:
            try:
                result = await asyncio.wait_for(func(*args), timeout=self.timeout)
                return TaskResult(task_id, TaskStatus.COMPLETED, result=result)
            except asyncio.TimeoutError:
                return TaskResult(task_id, TaskStatus.FAILED, error="Timeout")
            except Exception as e:
                return TaskResult(task_id, TaskStatus.FAILED, error=str(e))

    async def process_all(
        self, tasks: list[tuple[str, Callable, tuple]]
    ) -> list[TaskResult]:
        """Execute all tasks concurrently, return all results."""
        coros = [
            self._execute_task(task_id, func, *args)
            for task_id, func, args in tasks
        ]
        return await asyncio.gather(*coros)

    async def process_with_retry(
        self, task_id: str, func: Callable, args: tuple,
        max_retries: int = 3, backoff: float = 1.0,
    ) -> TaskResult:
        """Execute with exponential backoff retry."""
        for attempt in range(max_retries):
            result = await self._execute_task(task_id, func, *args)
            if result.status == TaskStatus.COMPLETED:
                return result
            if attempt < max_retries - 1:
                await asyncio.sleep(backoff * (2 ** attempt))
        return result  # Return the last failed result
```
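The core pattern — bounded concurrency via a semaphore, per-task timeout via `wait_for`, errors captured rather than raised — can be exercised end to end. A condensed standalone demo (function names are illustrative):

```python
import asyncio

async def run_bounded(tasks, max_concurrency=2, timeout=0.5):
    # Semaphore bounds concurrency; wait_for enforces a per-task deadline;
    # failures are returned as data so gather never blows up mid-batch.
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(name, coro_fn):
        async with sem:
            try:
                return name, "ok", await asyncio.wait_for(coro_fn(), timeout)
            except asyncio.TimeoutError:
                return name, "timeout", None
            except Exception as e:
                return name, "error", str(e)

    return await asyncio.gather(*(run_one(n, f) for n, f in tasks))

async def fast():
    return 42

async def slow():
    await asyncio.sleep(5)  # will exceed the 0.5 s timeout

async def boom():
    raise ValueError("nope")

results = asyncio.run(run_bounded([("a", fast), ("b", slow), ("c", boom)]))
print(results)  # [('a', 'ok', 42), ('b', 'timeout', None), ('c', 'error', 'nope')]
```

`gather` preserves input order, so results line up with the submitted tasks even though they finish at different times.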
Follow-Up Questions
- "Add a circuit breaker": After N consecutive failures, stop sending tasks to that function and return a fast failure for a cooldown period.
- "Handle task dependencies": Some tasks depend on others. Build a DAG executor that respects ordering constraints.
- "Add graceful shutdown": On shutdown signal, finish running tasks but don't start new ones. Return pending tasks as cancelled.
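For the circuit-breaker follow-up, one minimal synchronous sketch (thresholds, naming, and the half-open handling are my assumptions, not a known interview rubric):

```python
import time

class CircuitBreaker:
    """Minimal sketch: open after N consecutive failures, fail fast
    during a cooldown window, then allow a single trial call."""
    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None = circuit closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None      # cooldown over: half-open trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0              # any success resets the counter
        return result

br = CircuitBreaker(max_failures=2, cooldown=60)

def flaky():
    raise ValueError("backend down")

for _ in range(2):           # two real failures trip the breaker
    try:
        br.call(flaky)
    except ValueError:
        pass

try:
    br.call(flaky)           # third call never reaches flaky()
except RuntimeError as e:
    print(e)                 # circuit open: failing fast
```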
What Is It?
Meta launched this new interview format in late 2025. You get a real multi-file codebase and real AI tools (GPT-4o mini, Claude Sonnet, Gemini 2.5 Pro, LLaMA 4). You're evaluated on how effectively you use AI to solve programming tasks.
What You're Given
- A multi-file project (typically Python or Java)
- Access to AI chat (like Copilot Chat)
- 60 minutes to complete multiple tasks of increasing complexity
What They Evaluate
| Criteria | Weight | What They Look For |
|---|---|---|
| Problem decomposition | High | How you break tasks into AI-promptable sub-tasks |
| Prompt quality | High | Specific, contextual prompts that give the AI what it needs |
| Verification | High | Do you test AI output? Do you catch AI mistakes? |
| Code understanding | Medium | Can you read and navigate unfamiliar code? |
| Speed & efficiency | Medium | How much you accomplish in 60 minutes |
Strategies That Work
- Read the codebase yourself first — Don't immediately ask AI to explain everything. Understand the structure, then use AI for specific tasks.
- Give AI context — "Here's the function signature, the test that should pass, and the error I'm getting. Fix the implementation." — much better than "write a function."
- Verify AI output — Run the code. Check edge cases. AI will write plausible-looking code with subtle bugs.
- Use AI for boilerplate, think yourself for logic — AI is great for generating test scaffolding, data classes, and configuration. Use your brain for the actual algorithm.
Common Mistakes That Fail Candidates
- Blindly copying AI output without reading it
- Spending too long prompting when you could write it faster yourself
- Not running/testing code after AI generates it
- Over-relying on AI for simple tasks (wastes time waiting for responses)
- Under-utilizing AI for complex boilerplate (reinventing the wheel)
The Task
Implement cosine similarity search over a collection of vectors. Then discuss how to scale it with approximate nearest neighbors.
Exact Search Implementation
```python
import numpy as np


class VectorStore:
    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: list[np.ndarray] = []
        self.metadata: list[dict] = []

    def add(self, vector: np.ndarray, meta: dict | None = None) -> int:
        assert vector.shape == (self.dimension,)
        # Normalize so cosine similarity reduces to a dot product
        norm = np.linalg.norm(vector)
        if norm > 0:
            vector = vector / norm
        self.vectors.append(vector)
        self.metadata.append(meta or {})
        return len(self.vectors) - 1

    def search(self, query: np.ndarray, top_k: int = 5) -> list[tuple[int, float, dict]]:
        if not self.vectors:
            return []
        query_norm = query / np.linalg.norm(query)
        matrix = np.stack(self.vectors)        # (N, d)
        similarities = matrix @ query_norm     # (N,)
        # Top-k via argpartition (O(n)) instead of a full O(n log n) sort
        top_k = min(top_k, len(self.vectors))  # guard: k may exceed N
        top_indices = np.argpartition(similarities, -top_k)[-top_k:]
        top_indices = top_indices[np.argsort(similarities[top_indices])[::-1]]
        return [
            (int(idx), float(similarities[idx]), self.metadata[idx])
            for idx in top_indices
        ]
```
Scaling Discussion: ANN Algorithms
| Algorithm | How It Works | Tradeoff |
|---|---|---|
| HNSW | Hierarchical navigable small world graph — multi-layer graph traversal | Best recall, but high memory (graph overhead) |
| IVF | Inverted file — cluster vectors, search only nearby clusters | Good speed, lower memory, tunable recall |
| PQ | Product quantization — compress vectors to compact codes | Lowest memory, but lower recall |
| IVF-PQ | Combine IVF and PQ | Best memory/speed/recall balance for large scale |
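To make the IVF idea concrete, here is a toy numpy sketch — random vectors stand in for real k-means centroids, and everything about it is illustrative rather than production-grade:

```python
import numpy as np

class TinyIVF:
    """Toy IVF index: bucket vectors by nearest centroid, then search
    only the nprobe closest buckets instead of the whole collection."""
    def __init__(self, vectors: np.ndarray, n_clusters: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        # Crude stand-in for k-means: sample existing vectors as centroids
        idx = rng.choice(len(vectors), n_clusters, replace=False)
        self.centroids = self.vectors[idx]
        assign = (self.vectors @ self.centroids.T).argmax(axis=1)
        self.lists = [np.where(assign == c)[0] for c in range(n_clusters)]

    def search(self, query: np.ndarray, top_k: int = 5, nprobe: int = 2):
        q = query / np.linalg.norm(query)
        # Probe only the nprobe clusters whose centroids are closest
        probe = (self.centroids @ q).argsort()[::-1][:nprobe]
        cand = np.concatenate([self.lists[c] for c in probe])
        sims = self.vectors[cand] @ q
        order = sims.argsort()[::-1][:top_k]
        return [(int(cand[i]), float(sims[i])) for i in order]

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 64))
index = TinyIVF(data, n_clusters=16)
hits = index.search(data[42], top_k=3)
print(hits[0][0])  # 42 -- the query vector is its own nearest neighbor
```

Raising `nprobe` trades latency for recall, which is exactly the tuning knob the discussion below centers on.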
The Discussion They Want
"Exact search is O(n*d) per query — fine for <100K vectors. At millions+ vectors, you need ANN. HNSW is the default choice for most vector databases (Pinecone, Weaviate, Qdrant use it) because it has the best recall at a given latency. The tradeoff is memory — HNSW needs to store the graph structure, roughly 2-4x the raw vector storage. For billion-scale with limited memory, IVF-PQ is better — it compresses vectors to ~32 bytes each (vs. 3072 bytes for a 768-dim FP32 vector). The key parameter to tune is the recall-latency tradeoff: more probes (IVF) or more candidates (HNSW ef_search) = better recall, higher latency."
Frequently Asked Questions
Does Anthropic ask LeetCode?
No. Anthropic's coding interviews focus on progressive system building (like the database question above) and bug fixing. They evaluate code quality, design thinking, and how you handle increasing complexity — not algorithm puzzle solving.
What language should I use?
Python is standard for AI roles. Some companies (Meta, Google) accept C++ or Java. For ML-specific questions (attention implementation), PyTorch is expected. Anthropic's coding round is language-agnostic but most candidates use Python.
How should I prepare for Meta's AI-assisted round?
Practice working with AI coding tools on real projects. The key skill is knowing when to use AI vs. when to code yourself. Practice giving specific, context-rich prompts. And always verify AI output — candidates who blindly accept AI suggestions fail.
How much LeetCode do I still need?
For AI engineering roles specifically: Medium-level proficiency is sufficient. You should be comfortable with arrays, hashmaps, trees, and basic graph algorithms. Hard LeetCode problems are rarely asked for AI roles (except at Google, which still asks traditional coding).
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.