Few-Shot Learning
Instead of describing the task you want done, show it. Provide a handful of input-output examples directly in the prompt, and the model learns the pattern on the fly — no fine-tuning, no training data, no code required.
Introduced: Few-Shot Learning was popularized by Brown et al. in the landmark 2020 GPT-3 paper “Language Models are Few-Shot Learners.” The paper demonstrated that sufficiently large language models could learn new tasks from a handful of examples placed in the prompt — without any gradient updates or fine-tuning. This was a paradigm shift: previously, adapting a model to a new task required collecting labeled datasets and retraining. GPT-3 showed that a handful of in-context demonstrations (the paper typically used between 10 and 100 per task) could approach, and in some cases match, fine-tuned baselines on dozens of NLP benchmarks.
Modern LLM Status: Few-Shot Learning remains the single most widely used prompting technique across all LLM applications. Every major model — Claude, GPT-4, Gemini, Llama — excels at learning from in-context examples. While modern models are increasingly capable at zero-shot tasks (following instructions without examples), few-shot prompting consistently delivers more precise, format-compliant, and reliable outputs. It is the default strategy recommended by virtually every LLM provider’s documentation, and the foundation upon which more advanced techniques like Example Selection and Chain-of-Thought are built.
Show, Don’t Tell
Describing exactly what you want from a language model is surprisingly hard. You might specify the format, the tone, and the level of detail — and still get output that misses the mark. The problem is that natural language instructions are inherently ambiguous. “Be concise” means different things to different people, and to different models.
Few-Shot Learning sidesteps this ambiguity entirely. Instead of describing the desired output, you demonstrate it. You provide 2–5 concrete examples of input-output pairs, and the model reverse-engineers the pattern: the format, the reasoning style, the level of detail, the edge-case handling — all communicated implicitly through examples rather than explicit instructions.
Think of it like training a new employee. You could hand them a 20-page style guide, or you could show them three finished pieces and say “like this.” The examples communicate thousands of implicit decisions that would be exhausting to spell out in words.
A single example implicitly encodes dozens of decisions: output length, vocabulary level, formatting choices, what to include, what to omit, how to handle ambiguity. When you write “Summarize this article in a professional tone,” the model must interpret every word. When you show three summaries you’ve already written, the model can see exactly what “professional” and “summary” mean to you. This is why few-shot prompting consistently outperforms zero-shot instructions for tasks requiring specific formatting or style.
How Few-Shot Learning Works
Three steps from examples to accurate output
Craft Your Demonstrations
Select 2–5 representative input-output pairs that illustrate the task you want the model to perform. Each example should show a clear mapping from input to desired output. Choose examples that cover the range of variations the model might encounter — different categories, edge cases, or difficulty levels.
For a sentiment classification task, include one clearly positive review, one clearly negative review, and one nuanced or mixed review to show how you handle ambiguity.
Structure the Prompt
Place your examples in the prompt using a consistent format. Each example follows the same input-output structure, clearly labeled so the model can distinguish between the demonstration pairs and the actual query. Consistency is critical — the model uses the repeated pattern to understand what it should produce.
Review: “The battery life is incredible and the screen is gorgeous.”
Sentiment: Positive
Review: “Broke after two weeks. Complete waste of money.”
Sentiment: Negative
Review: [your actual input here]
Sentiment:
Model Completes the Pattern
The model reads your examples, infers the underlying task and format, and applies the same pattern to your new input. It generates output that matches the style, format, and reasoning demonstrated in your examples. The more consistent and clear your examples, the more accurately the model reproduces the pattern.
Given the structured examples above, the model outputs a single-word sentiment label that matches the format of the demonstrations — no extra explanation, no hedging, just the classification you showed it how to produce.
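The three steps above can be sketched in a few lines of code. This is a minimal, self-contained Python sketch; the helper name and example data are illustrative, not from any particular library or API:

```python
# Minimal sketch: assembling a few-shot prompt from demonstration pairs.
# The function name, labels, and example reviews are illustrative.

def build_few_shot_prompt(examples, query, input_label="Review", output_label="Sentiment"):
    """Format demonstration pairs and the new query into one prompt string."""
    blocks = [f'{input_label}: "{text}"\n{output_label}: {label}'
              for text, label in examples]
    # The final block ends with the bare output label so the model
    # completes the pattern with just the classification.
    blocks.append(f'{input_label}: "{query}"\n{output_label}:')
    return "\n\n".join(blocks)

examples = [
    ("The battery life is incredible and the screen is gorgeous.", "Positive"),
    ("Broke after two weeks. Complete waste of money.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "Great camera, but the speakers are tinny.")
print(prompt)
```

The resulting string is what you would send to the model; because every demonstration repeats the same two-line pattern and the prompt ends on a bare label, the model's most likely continuation is a single-word classification.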
See the Difference
Why a few examples dramatically improve output quality
Zero-Shot
Classify the following customer feedback as “Bug Report,” “Feature Request,” or “General Feedback.”
Feedback: “It would be great if you added dark mode to the mobile app.”
This feedback is a Feature Request. The customer is expressing a desire for a new feature (dark mode) to be added to the mobile application. This isn’t reporting a bug or providing general feedback, but rather suggesting a specific enhancement they’d like to see implemented.
Few-Shot
Feedback: “The app crashes every time I try to upload a photo.”
Category: Bug Report
Feedback: “Love the new update! Everything feels so much smoother.”
Category: General Feedback
Feedback: “Can you add support for CSV file exports?”
Category: Feature Request
Feedback: “It would be great if you added dark mode to the mobile app.”
Category:
Feature Request
Few-Shot Learning in Action
See how examples teach the model different tasks
Email: “Hi, I noticed I was charged twice for my last order #4521. Could you please refund the duplicate?”
Intent: Billing Issue
Email: “Just wanted to say your customer service rep Sarah was absolutely fantastic! She went above and beyond.”
Intent: Positive Feedback
Email: “I’ve been waiting 3 weeks for my delivery. Your tracking page says it shipped but nothing has arrived.”
Intent: Shipping Complaint
Email: “Can I change my subscription from monthly to annual billing?”
Intent:
Account Management
Note that no “Account Management” demonstration appears above: given clear examples, the model can generalize to a sensible new label, though classification is most reliable when every expected category is demonstrated.
Technical: “The API endpoint returns a 429 status code when the rate limit is exceeded.”
User-Friendly: “You’ve made too many requests too quickly. Wait a moment and try again.”
Technical: “The SSL/TLS handshake failed due to an expired certificate on the origin server.”
User-Friendly: “We can’t establish a secure connection right now. Our team has been notified and is working on it.”
Technical: “A race condition in the session management module caused intermittent authentication failures.”
User-Friendly:
“You may have been logged out unexpectedly. Please sign in again — we’ve fixed the issue that was causing this.”
Question: A store sells apples for $2 each and oranges for $3 each. If Maria buys 4 apples and 2 oranges, how much does she spend?
Reasoning: Apples: 4 x $2 = $8. Oranges: 2 x $3 = $6. Total: $8 + $6 = $14.
Answer: $14
Question: A train travels at 60 mph for 2.5 hours, then at 80 mph for 1.5 hours. What is the total distance?
Reasoning: First leg: 60 x 2.5 = 150 miles. Second leg: 80 x 1.5 = 120 miles. Total: 150 + 120 = 270 miles.
Answer: 270 miles
Question: A rectangle has a perimeter of 36 cm. If the length is twice the width, what is the area?
Reasoning:
Let width = w. Length = 2w. Perimeter: 2(2w + w) = 36, so 6w = 36, w = 6 cm. Length = 12 cm. Area: 12 x 6 = 72 sq cm.
Answer: 72 square centimeters
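The reasoning-then-answer format these demonstrations teach is also easy to post-process. A hedged sketch, assuming completions follow the demonstrated layout; the parser and its regex are illustrative:

```python
# Sketch: parsing the "Reasoning: ... / Answer: ..." pattern that the
# few-shot demonstrations above teach the model to produce.
import re

def parse_reasoned_answer(completion):
    """Split a completion into its reasoning and final-answer parts."""
    match = re.search(r"(?s)(?:Reasoning:\s*)?(.*?)\nAnswer:\s*(.+)", completion)
    if not match:
        return None, completion.strip()
    reasoning, answer = match.groups()
    return reasoning.strip(), answer.strip()

completion = ("Let width = w. Length = 2w. Perimeter: 2(2w + w) = 36, "
              "so 6w = 36, w = 6 cm. Length = 12 cm. Area: 12 x 6 = 72 sq cm.\n"
              "Answer: 72 square centimeters")
reasoning, answer = parse_reasoned_answer(completion)
print(answer)  # 72 square centimeters
```

This only works because the demonstrations pinned down the output format in the first place; a zero-shot completion might bury the answer anywhere in a paragraph.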
When to Use Few-Shot Learning
Best for tasks where showing is easier than telling
Perfect For
When output must follow a precise structure — JSON schemas, table formats, labeling conventions — examples communicate the exact format more reliably than descriptions.
Sentiment analysis, intent detection, content moderation — any task where you need the model to assign items to predefined categories consistently.
When you need output in a specific voice — brand copy, academic writing, casual tone — showing examples conveys nuance that instructions miss.
Converting between formats — technical jargon to plain language, raw data to structured output, one schema to another — where the mapping is best shown by example.
Skip It When
If a clear instruction like “Translate this to Spanish” or “Summarize in one paragraph” gets reliable results, adding examples wastes tokens without improving output.
Each example consumes prompt tokens. When working with strict token limits or high-volume pipelines where cost per call matters, zero-shot may be more economical.
Brainstorming, creative writing, or open-ended exploration can be overly constrained by examples — the model may mimic rather than innovate.
Use Cases
Where Few-Shot Learning delivers the most value
Customer Support Triage
Classify incoming tickets into categories like billing, technical, shipping, or account issues by showing a few labeled examples of each type.
Content Standardization
Rewrite product descriptions, blog excerpts, or marketing copy to match a specific brand voice by demonstrating the desired style through examples.
Code Generation Patterns
Show the model a few examples of your codebase’s conventions — naming patterns, error handling, documentation style — so generated code matches your project standards.
Localization and Translation
Provide translated pairs that capture your preferred terminology, formality level, and regional conventions so all translations stay consistent.
Data Extraction
Extract structured fields from unstructured text — names, dates, amounts, addresses — by showing a few examples of source text mapped to extracted fields.
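A minimal sketch of a data-extraction prompt in this style, where each demonstration maps raw text to a JSON object. The vendor/date/amount fields and all example texts are made up for illustration:

```python
# Sketch: few-shot demonstrations for extracting structured fields as JSON.
# Field names and example invoices are illustrative.
import json

demonstrations = [
    ("Invoice from Acme Corp dated 2024-03-15 for $1,250.00.",
     {"vendor": "Acme Corp", "date": "2024-03-15", "amount": "$1,250.00"}),
    ("Payment of $89.99 sent to Blue River LLC on 2024-06-02.",
     {"vendor": "Blue River LLC", "date": "2024-06-02", "amount": "$89.99"}),
]

def build_extraction_prompt(demos, query):
    """Each demonstration maps raw text to a JSON object; the query is left open."""
    parts = [f"Text: {text}\nFields: {json.dumps(fields)}" for text, fields in demos]
    parts.append(f"Text: {query}\nFields:")
    return "\n\n".join(parts)

prompt = build_extraction_prompt(demonstrations,
                                 "Charge of $42.00 from Nimbus Software on 2024-07-09.")
print(prompt)
```

Because the demonstrations end in valid JSON, the model's continuation is usually valid JSON too, which you can then parse and validate downstream.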
Content Moderation
Teach the model your specific moderation policies by showing examples of content that passes, gets flagged, or gets removed — calibrating the boundary through demonstrations.
Where Few-Shot Learning Fits
Few-Shot Learning is the foundation of in-context learning
In practice, 2–5 well-chosen examples routinely outperform 10 or more poorly chosen ones. Focus on diversity (covering different cases), consistency (same format every time), and representativeness (examples that match the actual inputs the model will see). When combined with Example Selection techniques, you can dynamically choose the most relevant demonstrations for each new query, getting the best of both worlds.
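The dynamic-selection idea can be sketched in a few lines. This is a toy stand-in that ranks candidates by word overlap where a production system would use embedding similarity; all names and sample data are illustrative:

```python
# Toy sketch of Example Selection: pick the demonstrations most similar to
# the query. Word overlap stands in for real embedding similarity.

def select_examples(pool, query, k=2):
    """Rank candidate (input, output) pairs by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(pair):
        return len(query_words & set(pair[0].lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]

pool = [
    ("The app crashes every time I try to upload a photo.", "Bug Report"),
    ("Can you add support for CSV file exports?", "Feature Request"),
    ("Love the new update! Everything feels so much smoother.", "General Feedback"),
]
query = "The app crashes when I try to open settings."
chosen = select_examples(pool, query, k=1)
print(chosen[0][1])  # Bug Report
```

The selected pairs would then be formatted into the prompt exactly as in the static examples earlier, so each query gets the demonstrations most likely to help.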
Related Techniques
Explore techniques that build on few-shot learning
Build Better Prompts with Examples
Put few-shot learning into practice with our interactive tools, or explore the full library of prompting frameworks.