7 AI Coding Interview Questions From Anthropic, Meta & OpenAI (2026 Edition)
Real AI coding interview questions from Anthropic, Meta, and OpenAI in 2026. Includes implementing attention from scratch, Anthropic's progressive coding screens, Meta's AI-assisted round, and vector search — with solution approaches.
AI Coding Interviews in 2026: Not Your Father's LeetCode
The coding bar for AI roles has shifted dramatically. Anthropic doesn't ask LeetCode at all — they test progressive system building. Meta now has an AI-assisted coding round where you work with real AI tools. OpenAI's coding questions focus on practical ML implementation.
Here are 7 real coding questions from these companies, with the approaches that pass.
Important: Anthropic strictly prohibits AI assistance during live interviews. Meta explicitly provides AI tools. Know the rules before your interview.
The Task
Implement scaled dot-product multi-head attention using only basic PyTorch tensor operations. No nn.MultiheadAttention.
Solution Approach
```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        # Projection matrices
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, mask: torch.Tensor | None = None):
        batch_size, seq_len, _ = x.shape
        # Project and reshape: (B, N, d) -> (B, h, N, d_k)
        Q = self.W_q(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(batch_size, seq_len, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        # Apply mask if provided (e.g. a causal lower-triangular mask)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn_weights = torch.softmax(scores, dim=-1)
        # Apply attention to values
        context = torch.matmul(attn_weights, V)  # (B, h, N, d_k)
        # Reshape back: (B, h, N, d_k) -> (B, N, d)
        context = context.transpose(1, 2).contiguous().view(batch_size, seq_len, self.d_model)
        return self.W_o(context)
```
What They Evaluate
| Criteria | What They Look For |
|---|---|
| Correctness | Proper scaling by sqrt(d_k), correct reshape/transpose operations |
| Mask handling | Causal mask for autoregressive, padding mask for variable-length |
| Memory layout | Using .contiguous() before .view() after transpose |
| Edge cases | What happens with seq_len=1? With d_model not divisible by n_heads? |
Common Follow-Up Questions
- "Add GQA support" — Modify so n_kv_heads < n_heads, with Q heads grouped to share KV heads
- "Add KV cache for inference" — Accept and return cached K,V tensors
- "Make it memory efficient" — Discuss Flash Attention algorithm (tiling + online softmax)
- "Add RoPE" — Apply rotation to Q,K before computing attention scores
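Before tackling the follow-ups, it helps to sanity-check the base implementation. Here is a condensed functional version of the same math (weights as plain matrices rather than `nn.Linear`; names like `mha_forward` are illustrative, not from the interview):

```python
import math
import torch

def mha_forward(x, Wq, Wk, Wv, Wo, n_heads, mask=None):
    # Condensed functional mirror of the module above.
    B, N, d = x.shape
    dk = d // n_heads
    def split(t):  # (B, N, d) -> (B, h, N, dk)
        return t.view(B, N, n_heads, dk).transpose(1, 2)
    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(dk)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    ctx = torch.softmax(scores, dim=-1) @ V
    return ctx.transpose(1, 2).contiguous().view(B, N, d) @ Wo

torch.manual_seed(0)
d, h = 64, 8
Ws = [torch.randn(d, d) / math.sqrt(d) for _ in range(4)]
x = torch.randn(2, 10, d)
# Causal mask: lower-triangular, so position i attends only to j <= i
causal = torch.tril(torch.ones(10, 10))
out = mha_forward(x, *Ws, n_heads=h, mask=causal)
print(out.shape)  # torch.Size([2, 10, 64])
```

A useful self-test in an interview: with the causal mask, perturbing the last token must not change the output at position 0.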
The Format
Anthropic's coding interviews use progressive rounds — you start with a simple implementation and the interviewer adds complexity every 15-20 minutes. The question below is reconstructed from candidate reports.
Round 1 — Basic Operations (15 min)
```python
class InMemoryDB:
    """Implement SET, GET, DELETE operations."""

    def __init__(self):
        self.store = {}

    def set(self, key: str, value: str) -> None:
        self.store[key] = value

    def get(self, key: str) -> str | None:
        return self.store.get(key)

    def delete(self, key: str) -> bool:
        if key in self.store:
            del self.store[key]
            return True
        return False
```
Round 2 — Filtered Scans (15 min)
"Now add a SCAN operation that filters by a prefix and returns matching key-value pairs."
```python
def scan(self, prefix: str) -> list[tuple[str, str]]:
    return [(k, v) for k, v in self.store.items() if k.startswith(prefix)]
```
The interviewer pushes: "This is O(n) over all keys. How would you make prefix scan efficient?"
Better approach: Use a trie or sorted dict (SortedDict from sortedcontainers) for O(log n + k) prefix scans where k is the number of matches.
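One way to get that behavior with only the standard library is to keep keys in a sorted list and binary-search for the prefix with `bisect` (a sketch; `PrefixIndex` is an illustrative name, not from the interview):

```python
import bisect

class PrefixIndex:
    """Sorted key list + dict: prefix scan via binary search."""
    def __init__(self):
        self.keys: list[str] = []   # kept sorted at all times
        self.store: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        if key not in self.store:
            # insort locates in O(log n) but shifts in O(n); a trie or
            # SortedDict avoids that write cost
            bisect.insort(self.keys, key)
        self.store[key] = value

    def scan(self, prefix: str) -> list[tuple[str, str]]:
        # All keys sharing a prefix form a contiguous run in sorted order,
        # so we jump to the start in O(log n) and walk k matches.
        lo = bisect.bisect_left(self.keys, prefix)
        out = []
        for k in self.keys[lo:]:
            if not k.startswith(prefix):
                break
            out.append((k, self.store[k]))
        return out

db = PrefixIndex()
for k in ["user:1", "user:2", "cart:9", "user:10"]:
    db.set(k, "v")
print([k for k, _ in db.scan("user:")])  # ['user:1', 'user:10', 'user:2']
```

Note the lexicographic order ("user:10" before "user:2") — worth calling out to the interviewer.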
Round 3 — TTL Support (15 min)
"Add TTL (time-to-live) support. Keys should expire after a specified duration."
```python
import time

class InMemoryDB:
    def __init__(self):
        self.store = {}  # key -> value
        self.ttls = {}   # key -> expiry_timestamp

    def set(self, key: str, value: str, ttl: float | None = None) -> None:
        self.store[key] = value
        if ttl is not None:
            self.ttls[key] = time.time() + ttl
        elif key in self.ttls:
            del self.ttls[key]  # Remove TTL if re-set without one

    def get(self, key: str) -> str | None:
        # Lazy expiry: check on read rather than running a background timer
        if key in self.ttls and time.time() > self.ttls[key]:
            self.delete(key)  # delete() should also drop the key from self.ttls
            return None
        return self.store.get(key)

    def _lazy_cleanup(self):
        """Remove all expired keys; call periodically or on demand."""
        now = time.time()
        expired = [k for k, exp in self.ttls.items() if now > exp]
        for k in expired:
            self.delete(k)
```
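A quick demonstration of the lazy-expiry behavior, using a condensed standalone version of the store above (the `TTLStore` name is mine; timings are deliberately short for the demo):

```python
import time

class TTLStore:
    # Condensed version of the TTL logic above, for demonstration only.
    def __init__(self):
        self.store, self.ttls = {}, {}

    def set(self, key, value, ttl=None):
        self.store[key] = value
        if ttl is not None:
            self.ttls[key] = time.time() + ttl
        else:
            self.ttls.pop(key, None)

    def get(self, key):
        # Expired keys are removed on read -- no background thread needed
        if key in self.ttls and time.time() > self.ttls[key]:
            self.store.pop(key, None)
            self.ttls.pop(key, None)
            return None
        return self.store.get(key)

db = TTLStore()
db.set("session", "abc", ttl=0.05)   # expires in 50 ms
print(db.get("session"))             # 'abc'
time.sleep(0.06)
print(db.get("session"))             # None -- lazily expired on read
```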
Round 4 — Persistence (15 min)
"Add save/load to compress the database to a file and restore it."
```python
import gzip
import json

# Methods added to InMemoryDB
def save(self, filepath: str) -> None:
    data = {"store": self.store, "ttls": self.ttls}
    with gzip.open(filepath, 'wt') as f:
        json.dump(data, f)

def load(self, filepath: str) -> None:
    with gzip.open(filepath, 'rt') as f:
        data = json.load(f)
    self.store = data["store"]
    self.ttls = {k: float(v) for k, v in data["ttls"].items()}
```
What Anthropic Is Really Evaluating
- Code quality under pressure: Clean, readable code even as complexity grows
- Modular design: Can you extend your initial design without rewriting everything?
- Edge case awareness: What happens when you GET a key that's expired? What about concurrent TTL cleanup?
- Communication: Do you talk through your approach before coding? Do you ask clarifying questions?
- Progressive thinking: Do you anticipate where this is going and design for extensibility?
The Task
Build a banking system that handles deposits, withdrawals, and transfers with proper validation. Progressive complexity adds transaction history and balance queries.
Core Implementation
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class TxnType(Enum):
    DEPOSIT = "deposit"
    WITHDRAWAL = "withdrawal"
    TRANSFER = "transfer"


@dataclass
class Transaction:
    txn_type: TxnType
    amount: float
    timestamp: datetime
    from_account: str | None = None
    to_account: str | None = None


class Bank:
    def __init__(self):
        self.accounts: dict[str, float] = {}
        self.history: dict[str, list[Transaction]] = {}

    def create_account(self, account_id: str, initial_balance: float = 0) -> None:
        if account_id in self.accounts:
            raise ValueError(f"Account {account_id} already exists")
        if initial_balance < 0:
            raise ValueError("Initial balance cannot be negative")
        self.accounts[account_id] = initial_balance
        self.history[account_id] = []

    def deposit(self, account_id: str, amount: float) -> float:
        self._validate_account(account_id)
        if amount <= 0:
            raise ValueError("Deposit amount must be positive")
        self.accounts[account_id] += amount
        self.history[account_id].append(
            Transaction(TxnType.DEPOSIT, amount, datetime.now(), to_account=account_id)
        )
        return self.accounts[account_id]

    def withdraw(self, account_id: str, amount: float) -> float:
        self._validate_account(account_id)
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        if self.accounts[account_id] < amount:
            raise ValueError("Insufficient funds")
        self.accounts[account_id] -= amount
        self.history[account_id].append(
            Transaction(TxnType.WITHDRAWAL, amount, datetime.now(), from_account=account_id)
        )
        return self.accounts[account_id]

    def transfer(self, from_id: str, to_id: str, amount: float) -> None:
        self._validate_account(from_id)
        self._validate_account(to_id)
        if from_id == to_id:
            raise ValueError("Cannot transfer to same account")
        self.withdraw(from_id, amount)
        self.deposit(to_id, amount)
        # Record the transfer in both histories (in addition to the
        # WITHDRAWAL/DEPOSIT entries the calls above already logged)
        txn = Transaction(TxnType.TRANSFER, amount, datetime.now(), from_id, to_id)
        self.history[from_id].append(txn)
        self.history[to_id].append(txn)

    def _validate_account(self, account_id: str) -> None:
        if account_id not in self.accounts:
            raise ValueError(f"Account {account_id} not found")
```
Progressive Follow-Ups
- "Add transaction rollback": If deposit in a transfer succeeds but something fails, undo the withdrawal. Implement a simple saga pattern.
- "Add concurrent access": Use locks to handle multiple threads doing transfers simultaneously. Discuss deadlock prevention (always lock accounts in sorted order).
- "Add interest calculation": Compound interest on all accounts, run monthly. Discuss precision issues with floating point.
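The sorted-lock-order idea from the concurrency follow-up can be sketched with `threading` (class and method names are illustrative, not from the interview):

```python
import threading

class ThreadSafeBank:
    """Sketch: per-account locks, always acquired in sorted order."""
    def __init__(self):
        self.balances: dict[str, float] = {}
        self.locks: dict[str, threading.Lock] = {}

    def create(self, acct: str, balance: float = 0.0) -> None:
        self.balances[acct] = balance
        self.locks[acct] = threading.Lock()

    def transfer(self, src: str, dst: str, amount: float) -> None:
        # Deadlock prevention: always lock the lexicographically smaller
        # account first, so A->B and B->A transfers can't wait on each other.
        first, second = sorted([src, dst])
        with self.locks[first], self.locks[second]:
            if self.balances[src] < amount:
                raise ValueError("Insufficient funds")
            self.balances[src] -= amount
            self.balances[dst] += amount

bank = ThreadSafeBank()
bank.create("a", 100)
bank.create("b", 100)
threads = [threading.Thread(target=bank.transfer, args=("a", "b", 1))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(bank.balances)  # {'a': 50, 'b': 150}
```

Without the `sorted()` step, two opposing transfers each holding one lock and waiting on the other is a classic deadlock — exactly what the interviewer is probing for.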
The Format
Anthropic's "Bug Fixing" round (reported March 2026): You're given a Jupyter notebook with ML training/inference code that has multiple bugs. Find and fix them.
Common Bug Patterns to Watch For
1. Shape Mismatches
```python
# BUG: Wrong dimension for softmax
logits = model(x)  # shape: (batch, seq_len, vocab_size)
probs = torch.softmax(logits, dim=1)  # Bug! Should be dim=-1 (or dim=2)
```
2. Device Mismatches
```python
# BUG: Model on GPU, new tensor on CPU
model = model.cuda()
mask = torch.ones(batch_size, seq_len)  # CPU tensor!
output = model(x.cuda(), mask)  # RuntimeError: tensors on different devices
# Fix: mask = mask.cuda(), or better, mask = mask.to(x.device)
```
3. Gradient Bugs
```python
# BUG: Forgetting to zero gradients
for batch in dataloader:
    loss = criterion(model(batch), targets)
    loss.backward()
    optimizer.step()
    # Missing: optimizer.zero_grad() — gradients accumulate across batches!
```
4. Data Leakage
```python
# BUG: Fitting scaler on test data
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(X_all)  # Fits on ALL data, including test
X_train, X_test = X_all_scaled[:800], X_all_scaled[800:]
# Fix: fit on train only, then transform test with the fitted scaler
```
5. Off-By-One in Tokenization
```python
# BUG: Not accounting for special tokens
max_length = 512
tokens = tokenizer(text, max_length=max_length, truncation=True)
# Actual content budget = 510 tokens (2 slots taken by [CLS] and [SEP])
```
How to Approach This Round
- Read the full notebook first — understand the intended logic before looking for bugs
- Check shapes at each step — most bugs are shape/dimension errors
- Trace the data flow — input → preprocessing → model → loss → backward → update
- Look for silent bugs — code that runs but produces wrong results (wrong dim for softmax, missing gradient zeroing) is harder to catch than crashes
- Test incrementally — fix one bug, run the cell, check the output, move to the next
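The data-leakage pattern above is worth writing out by hand at least once. A sketch of the correct split-then-scale order in plain numpy (scikit-learn's `StandardScaler` does the same mean/std bookkeeping under the hood):

```python
import numpy as np

rng = np.random.default_rng(0)
X_all = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_train, X_test = X_all[:800], X_all[800:]

# Correct: statistics come from the training split only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma   # transform test with TRAIN stats

# The train split is exactly standardized; the test split is only
# approximately so, which is expected -- its own stats were never used.
print(np.allclose(X_train_s.mean(axis=0), 0, atol=1e-9))  # True
```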
The Task
Build a concurrent task processor that executes independent tasks in parallel, handles failures gracefully, and reports results.
Solution Approach
```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskResult:
    task_id: str
    status: TaskStatus
    result: Any = None
    error: str | None = None


class ConcurrentProcessor:
    def __init__(self, max_concurrency: int = 5, timeout: float = 30.0):
        self.semaphore = asyncio.Semaphore(max_concurrency)
        self.timeout = timeout

    async def _execute_task(self, task_id: str, func: Callable, *args) -> TaskResult:
        async with self.semaphore:
            try:
                result = await asyncio.wait_for(func(*args), timeout=self.timeout)
                return TaskResult(task_id, TaskStatus.COMPLETED, result=result)
            except asyncio.TimeoutError:
                return TaskResult(task_id, TaskStatus.FAILED, error="Timeout")
            except Exception as e:
                return TaskResult(task_id, TaskStatus.FAILED, error=str(e))

    async def process_all(
        self, tasks: list[tuple[str, Callable, tuple]]
    ) -> list[TaskResult]:
        """Execute all tasks concurrently, return all results."""
        coros = [
            self._execute_task(task_id, func, *args)
            for task_id, func, args in tasks
        ]
        return await asyncio.gather(*coros)

    async def process_with_retry(
        self, task_id: str, func: Callable, args: tuple,
        max_retries: int = 3, backoff: float = 1.0,
    ) -> TaskResult:
        """Execute with exponential backoff retry."""
        for attempt in range(max_retries):
            result = await self._execute_task(task_id, func, *args)
            if result.status == TaskStatus.COMPLETED:
                return result
            if attempt < max_retries - 1:
                await asyncio.sleep(backoff * (2 ** attempt))
        return result  # Return the last failed result
```
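The core pattern — bounded concurrency via a semaphore, per-task timeout via `wait_for`, errors captured rather than raised — can be exercised end to end. A condensed standalone demo (function names are illustrative):

```python
import asyncio

async def run_bounded(tasks, max_concurrency=2, timeout=0.5):
    # Semaphore bounds concurrency; wait_for enforces a per-task deadline;
    # failures are returned as data so gather never blows up mid-batch.
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(name, coro_fn):
        async with sem:
            try:
                return name, "ok", await asyncio.wait_for(coro_fn(), timeout)
            except asyncio.TimeoutError:
                return name, "timeout", None
            except Exception as e:
                return name, "error", str(e)

    return await asyncio.gather(*(run_one(n, f) for n, f in tasks))

async def fast():
    return 42

async def slow():
    await asyncio.sleep(5)  # will exceed the 0.5 s timeout

async def boom():
    raise ValueError("nope")

results = asyncio.run(run_bounded([("a", fast), ("b", slow), ("c", boom)]))
print(results)  # [('a', 'ok', 42), ('b', 'timeout', None), ('c', 'error', 'nope')]
```

`gather` preserves input order, so results line up with the submitted tasks even though they finish at different times.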
Follow-Up Questions
- "Add a circuit breaker": After N consecutive failures, stop sending tasks to that function and return a fast failure for a cooldown period.
- "Handle task dependencies": Some tasks depend on others. Build a DAG executor that respects ordering constraints.
- "Add graceful shutdown": On shutdown signal, finish running tasks but don't start new ones. Return pending tasks as cancelled.
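For the circuit-breaker follow-up, one minimal synchronous sketch (thresholds, naming, and the half-open handling are my assumptions, not a known interview rubric):

```python
import time

class CircuitBreaker:
    """Minimal sketch: open after N consecutive failures, fail fast
    during a cooldown window, then allow a single trial call."""
    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None = circuit closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None      # cooldown over: half-open trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0              # any success resets the counter
        return result

br = CircuitBreaker(max_failures=2, cooldown=60)

def flaky():
    raise ValueError("backend down")

for _ in range(2):           # two real failures trip the breaker
    try:
        br.call(flaky)
    except ValueError:
        pass

try:
    br.call(flaky)           # third call never reaches flaky()
except RuntimeError as e:
    print(e)                 # circuit open: failing fast
```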
What Is It?
Meta launched this new interview format in late 2025. You get a real multi-file codebase and real AI tools (GPT-4o mini, Claude Sonnet, Gemini 2.5 Pro, LLaMA 4). You're evaluated on how effectively you use AI to solve programming tasks.
What You're Given
- A multi-file project (typically Python or Java)
- Access to AI chat (like Copilot Chat)
- 60 minutes to complete multiple tasks of increasing complexity
What They Evaluate
| Criteria | Weight | What They Look For |
|---|---|---|
| Problem decomposition | High | How you break tasks into AI-promptable sub-tasks |
| Prompt quality | High | Specific, contextual prompts that give the AI what it needs |
| Verification | High | Do you test AI output? Do you catch AI mistakes? |
| Code understanding | Medium | Can you read and navigate unfamiliar code? |
| Speed & efficiency | Medium | How much you accomplish in 60 minutes |
Strategies That Work
- Read the codebase yourself first — Don't immediately ask AI to explain everything. Understand the structure, then use AI for specific tasks.
- Give AI context — "Here's the function signature, the test that should pass, and the error I'm getting. Fix the implementation." — much better than "write a function."
- Verify AI output — Run the code. Check edge cases. AI will write plausible-looking code with subtle bugs.
- Use AI for boilerplate, think yourself for logic — AI is great for generating test scaffolding, data classes, and configuration. Use your brain for the actual algorithm.
Common Mistakes That Fail Candidates
- Blindly copying AI output without reading it
- Spending too long prompting when you could write it faster yourself
- Not running/testing code after AI generates it
- Over-relying on AI for simple tasks (wastes time waiting for responses)
- Under-utilizing AI for complex boilerplate (reinventing the wheel)
The Task
Implement cosine similarity search over a collection of vectors. Then discuss how to scale it with approximate nearest neighbors.
Exact Search Implementation
```python
import numpy as np


class VectorStore:
    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: list[np.ndarray] = []
        self.metadata: list[dict] = []

    def add(self, vector: np.ndarray, meta: dict | None = None) -> int:
        assert vector.shape == (self.dimension,)
        # Normalize so cosine similarity reduces to a dot product
        norm = np.linalg.norm(vector)
        if norm > 0:
            vector = vector / norm
        self.vectors.append(vector)
        self.metadata.append(meta or {})
        return len(self.vectors) - 1

    def search(self, query: np.ndarray, top_k: int = 5) -> list[tuple[int, float, dict]]:
        if not self.vectors:
            return []
        query_norm = query / np.linalg.norm(query)
        matrix = np.stack(self.vectors)        # (N, d)
        similarities = matrix @ query_norm     # (N,)
        # Top-k via argpartition (O(n)) instead of a full O(n log n) sort
        top_k = min(top_k, len(self.vectors))  # guard: k may exceed N
        top_indices = np.argpartition(similarities, -top_k)[-top_k:]
        top_indices = top_indices[np.argsort(similarities[top_indices])[::-1]]
        return [
            (int(idx), float(similarities[idx]), self.metadata[idx])
            for idx in top_indices
        ]
```
Scaling Discussion: ANN Algorithms
| Algorithm | How It Works | Tradeoff |
|---|---|---|
| HNSW | Hierarchical navigable small world graph — multi-layer graph traversal | Best recall, but high memory (graph overhead) |
| IVF | Inverted file — cluster vectors, search only nearby clusters | Good speed, lower memory, tunable recall |
| PQ | Product quantization — compress vectors to compact codes | Lowest memory, but lower recall |
| IVF-PQ | Combine IVF and PQ | Best memory/speed/recall balance for large scale |
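To make the IVF idea concrete, here is a toy numpy sketch — random vectors stand in for real k-means centroids, and everything about it is illustrative rather than production-grade:

```python
import numpy as np

class TinyIVF:
    """Toy IVF index: bucket vectors by nearest centroid, then search
    only the nprobe closest buckets instead of the whole collection."""
    def __init__(self, vectors: np.ndarray, n_clusters: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        # Crude stand-in for k-means: sample existing vectors as centroids
        idx = rng.choice(len(vectors), n_clusters, replace=False)
        self.centroids = self.vectors[idx]
        assign = (self.vectors @ self.centroids.T).argmax(axis=1)
        self.lists = [np.where(assign == c)[0] for c in range(n_clusters)]

    def search(self, query: np.ndarray, top_k: int = 5, nprobe: int = 2):
        q = query / np.linalg.norm(query)
        # Probe only the nprobe clusters whose centroids are closest
        probe = (self.centroids @ q).argsort()[::-1][:nprobe]
        cand = np.concatenate([self.lists[c] for c in probe])
        sims = self.vectors[cand] @ q
        order = sims.argsort()[::-1][:top_k]
        return [(int(cand[i]), float(sims[i])) for i in order]

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 64))
index = TinyIVF(data, n_clusters=16)
hits = index.search(data[42], top_k=3)
print(hits[0][0])  # 42 -- the query vector is its own nearest neighbor
```

Raising `nprobe` trades latency for recall, which is exactly the tuning knob the discussion below centers on.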
The Discussion They Want
"Exact search is O(n*d) per query — fine for <100K vectors. At millions+ vectors, you need ANN. HNSW is the default choice for most vector databases (Pinecone, Weaviate, Qdrant use it) because it has the best recall at a given latency. The tradeoff is memory — HNSW needs to store the graph structure, roughly 2-4x the raw vector storage. For billion-scale with limited memory, IVF-PQ is better — it compresses vectors to ~32 bytes each (vs. 3072 bytes for a 768-dim FP32 vector). The key parameter to tune is the recall-latency tradeoff: more probes (IVF) or more candidates (HNSW ef_search) = better recall, higher latency."
Frequently Asked Questions
Does Anthropic ask LeetCode?
No. Anthropic's coding interviews focus on progressive system building (like the database question above) and bug fixing. They evaluate code quality, design thinking, and how you handle increasing complexity — not algorithm puzzle solving.
What language should I use?
Python is standard for AI roles. Some companies (Meta, Google) accept C++ or Java. For ML-specific questions (attention implementation), PyTorch is expected. Anthropic's coding round is language-agnostic but most candidates use Python.
How should I prepare for Meta's AI-assisted round?
Practice working with AI coding tools on real projects. The key skill is knowing when to use AI vs. when to code yourself. Practice giving specific, context-rich prompts. And always verify AI output — candidates who blindly accept AI suggestions fail.
How much LeetCode do I still need?
For AI engineering roles specifically: Medium-level proficiency is sufficient. You should be comfortable with arrays, hashmaps, trees, and basic graph algorithms. Hard LeetCode problems are rarely asked for AI roles (except at Google, which still asks traditional coding).
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.