Zero-Shot vs Few-Shot Prompting: When to Use Each Approach
Understand the key differences between zero-shot, one-shot, and few-shot prompting. Learn when each technique works best and how to select high-quality examples for reliable LLM outputs.
The Spectrum of Example-Based Prompting
When you ask an LLM to perform a task, you can provide zero, one, or several examples of the desired input-output behavior. This choice — how many examples to include — is one of the most impactful decisions in prompt engineering. Each approach has distinct strengths, and understanding when to use which can mean the difference between a 60% and a 95% success rate.
Zero-Shot Prompting
Zero-shot prompting means giving the model a task description with no examples. You rely entirely on the model's pre-trained knowledge to understand what you want.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Classify the sentiment of customer reviews as positive, neutral, or negative. Return only the label."
        },
        {
            "role": "user",
            "content": "The delivery was fast but the packaging was damaged."
        }
    ]
)
print(response.choices[0].message.content)  # "neutral"
Zero-shot works well for tasks the model has seen extensively during training: sentiment analysis, translation, summarization, and simple classification. It is fast to implement and keeps token costs low.
When to use zero-shot: The task is common, the output format is simple, and you need quick iteration without curating examples.
One-Shot Prompting
One-shot prompting provides a single example to anchor the model's understanding. This is often enough to clarify ambiguous formatting or establish a pattern.
messages = [
    {
        "role": "system",
        "content": "Extract structured data from product descriptions."
    },
    {
        "role": "user",
        "content": "Nike Air Max 90, men's running shoe, $129.99, available in black and white"
    },
    {
        "role": "assistant",
        "content": '{"brand": "Nike", "model": "Air Max 90", "category": "running", "price": 129.99, "colors": ["black", "white"]}'
    },
    {
        "role": "user",
        "content": "Adidas Ultraboost 22, women's training shoe, $189.00, available in pink, grey, and navy"
    }
]
The single example communicates the JSON schema, field naming conventions, and how to handle multi-value fields — all without verbose instructions.
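Before shipping a one-shot prompt like this, it is worth confirming that the example output itself is well-formed: a malformed example teaches the model a broken format. A minimal sketch using the standard json module (the string is the assistant turn from the prompt above):

```python
import json

# The assistant turn from the one-shot prompt above.
example_output = (
    '{"brand": "Nike", "model": "Air Max 90", "category": "running", '
    '"price": 129.99, "colors": ["black", "white"]}'
)

# json.loads raises an error on malformed JSON, catching typos in the
# example before they ever reach the model.
parsed = json.loads(example_output)
print(sorted(parsed.keys()))
```

The same check can run over every example in a few-shot prompt as part of a test suite.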
Few-Shot Prompting
Few-shot prompting provides 2-8 examples that collectively cover the range of expected inputs and edge cases. This is the most powerful technique for custom or domain-specific tasks.
def build_few_shot_classifier(reviews: list[str]) -> list[dict]:
    examples = [
        ("Absolutely love this product, works perfectly!", "positive"),
        ("It's okay, nothing special but does the job.", "neutral"),
        ("Broke after two days. Complete waste of money.", "negative"),
        ("Good quality but overpriced for what you get.", "neutral"),
        ("Best purchase I've made this year, highly recommend!", "positive"),
    ]
    messages = [
        {
            "role": "system",
            "content": "Classify customer reviews as positive, neutral, or negative."
        }
    ]
    # Each example becomes a user/assistant pair, teaching the model the
    # input-to-label mapping by demonstration.
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # Add the actual reviews to classify. For reliable labeling, send one
    # review per API call rather than batching several into one turn.
    for review in reviews:
        messages.append({"role": "user", "content": review})
    return messages
Selecting Good Examples
The quality of your examples matters more than the quantity. Follow these guidelines:
Cover the output space. If you have three classes, include at least one example of each. If outputs vary in length or structure, show that range.
Include edge cases. The mixed-sentiment review ("Good quality but overpriced") is more valuable than another clearly positive example.
Keep examples realistic. Use actual data from your domain, not synthetic toy examples. Models pick up on subtle patterns in real data.
Order matters. Place the most representative examples first and the edge cases last. The model pays more attention to recent examples.
# Bad: all examples are clearly positive or negative
examples = [
    ("Amazing!", "positive"),
    ("Terrible!", "negative"),
    ("Wonderful!", "positive"),
]

# Good: covers the full spectrum including ambiguity
examples = [
    ("Delivery was fast, product matches the description.", "positive"),
    ("Arrived late but the quality is decent.", "neutral"),
    ("Completely broken on arrival, no response from support.", "negative"),
    ("The color is slightly different than pictured but I still like it.", "neutral"),
]
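The "cover the output space" rule can also be enforced programmatically before a prompt ships. A small sketch using collections.Counter to verify every class appears at least once:

```python
from collections import Counter

examples = [
    ("Delivery was fast, product matches the description.", "positive"),
    ("Arrived late but the quality is decent.", "neutral"),
    ("Completely broken on arrival, no response from support.", "negative"),
    ("The color is slightly different than pictured but I still like it.", "neutral"),
]

label_counts = Counter(label for _, label in examples)

# Every class should appear at least once before the prompt ships.
missing = {"positive", "neutral", "negative"} - set(label_counts)
assert not missing, f"missing labels: {missing}"
print(dict(label_counts))
```

Running this as a unit test catches the common failure mode of adding new examples that all share one label.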
Decision Framework
Use this practical guide:
| Approach | Best For | Token Cost | Setup Time |
|---|---|---|---|
| Zero-shot | Common tasks, simple outputs | Low | Minutes |
| One-shot | Format clarification, schema definition | Low | Minutes |
| Few-shot | Custom classification, domain-specific tasks | Medium | Hours |
Start with zero-shot. If the output is inconsistent or wrong, add one example. If edge cases are mishandled, add more examples targeting those specific failure modes. This incremental approach avoids over-engineering your prompts.
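This incremental loop can be made systematic by measuring accuracy on a small labeled validation set and only adding examples when the score falls short. A hypothetical sketch, where evaluate is a stub standing in for a real evaluation run over labeled data:

```python
def evaluate(examples: list[tuple[str, str]]) -> float:
    # Stub: in practice, run the prompt (with these examples) over a
    # labeled validation set and return the fraction classified correctly.
    return 0.72 if not examples else 0.94

examples: list[tuple[str, str]] = []

# Start zero-shot; escalate only if measured accuracy is too low.
if evaluate(examples) < 0.90:
    # Add an example targeting an observed failure mode, then re-measure.
    examples.append(("Arrived late but the quality is decent.", "neutral"))

print(len(examples), round(evaluate(examples), 2))
```

The key point is that each added example should be justified by a measured failure, not added speculatively.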
FAQ
How many examples should I use for few-shot prompting?
Three to five examples is the sweet spot for most tasks. Beyond 8 examples, you hit diminishing returns and increasing token costs. If you need more than 8 examples to get reliable results, consider fine-tuning instead.
Can few-shot examples hurt performance?
Yes. Poor-quality examples — ambiguous labels, unrepresentative data, or formatting inconsistencies — actively confuse the model. One bad example can negate three good ones. Always validate that each example unambiguously demonstrates the pattern you want.
Should I randomize the order of few-shot examples?
For classification tasks, vary the label order so the model does not develop a recency bias. If your last three examples are all "positive," the model may lean toward "positive" for the next input. Interleave labels to prevent this.
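One deterministic way to interleave labels is a round-robin pass over per-label buckets. A sketch, where interleave_by_label is a hypothetical helper rather than part of any library:

```python
from collections import defaultdict, deque

def interleave_by_label(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Round-robin across labels so no label appears in a long run."""
    buckets: dict[str, deque] = defaultdict(deque)
    for text, label in pairs:
        buckets[label].append((text, label))
    ordered = []
    # Take one example from each label bucket per pass until all are empty.
    while any(buckets.values()):
        for label in list(buckets):
            if buckets[label]:
                ordered.append(buckets[label].popleft())
    return ordered

examples = [
    ("Amazing!", "positive"),
    ("Wonderful!", "positive"),
    ("Terrible!", "negative"),
    ("It's fine.", "neutral"),
]
print([label for _, label in interleave_by_label(examples)])
```

Unlike random shuffling, this guarantees consecutive examples differ in label whenever more than one bucket still has items.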
CallSphere Team
Expert insights on AI voice agents and customer communication automation.