Active Example Selection
Not all demonstrations are created equal. Active Example Selection watches where the model stumbles, then retrieves the specific examples most likely to resolve that uncertainty — turning static few-shot prompting into a dynamic, query-aware feedback loop.
Introduced: Active Example Selection builds on decades of active learning research (Settles, 2009) and adapts those principles specifically for in-context demonstration selection in large language models. Rather than treating few-shot examples as a fixed preamble, this approach emerged from the recognition that different queries benefit from fundamentally different demonstrations. By 2023, researchers had formalized methods for dynamically selecting in-context examples based on model uncertainty, query characteristics, and task-specific performance signals.
Modern LLM Status: Active Example Selection remains in active use as an application of active learning principles to the selection of in-context demonstrations. Rather than statically choosing examples, the approach iteratively selects demonstrations based on the model’s current uncertainty or errors. When the model is uncertain about a particular input, the system retrieves the examples most likely to resolve that uncertainty. This creates a feedback loop in which example selection is tailored to each specific query, improving efficiency and accuracy compared to fixed demonstration sets. The technique is particularly relevant for production systems that handle diverse queries requiring different types of demonstrations.
Let the Model’s Weakness Guide Your Examples
Active Example Selection dynamically chooses which few-shot demonstrations to show the model based on where the model is currently struggling. Instead of using a fixed set of examples for all queries, it identifies the model’s weak points and selects demonstrations that specifically address those weaknesses. The result is a prompt that is precisely calibrated to each individual query rather than generically assembled for all possible inputs.
This is analogous to a tutor who watches a student fail a particular type of problem and then provides a worked example of exactly that problem type — targeted instruction rather than generic review. A math tutor does not re-explain addition when the student is stuck on long division. Similarly, Active Example Selection does not waste context window space on demonstrations the model already handles well.
The approach transforms few-shot prompting from a static configuration step into a responsive, real-time process. Each query triggers a fresh assessment of what the model needs to see, and the demonstration set is assembled accordingly. This narrows the gap between generic prompting and fine-tuned models by making the most of every token in the context window.
Fixed example sets are a compromise — they try to cover the broadest range of scenarios but inevitably miss the specific nuances any given query demands. A set optimized for common cases will fail on edge cases, while a set loaded with edge cases wastes tokens on the routine majority. Active Example Selection sidesteps this trade-off by assembling a bespoke demonstration set for every query, so the context window always contains the most informative examples available.
The Active Example Selection Process
Four stages from uncertainty detection to tailored demonstration
Maintain an Example Pool
Curate a large bank of labeled input-output pairs covering the full task distribution. This pool serves as the reservoir from which demonstrations are dynamically drawn. The richer and more diverse the pool, the more precisely the system can match examples to any given query’s needs.
A customer support classification system maintains 500 labeled tickets spanning billing inquiries, technical issues, cancellation requests, feature requests, and account recovery — each with its correct category label and reasoning.
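A minimal sketch of such a pool, using a hypothetical `Example` record. The field names and the two sample tickets are illustrative, not from any real system; a production pool would hold hundreds of entries like these.

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str       # the labeled input, e.g. a support ticket
    label: str      # its correct category
    rationale: str  # why this label applies (useful as in-context reasoning)

# Hypothetical pool entries covering distinct regions of the task.
POOL = [
    Example("Please update the credit card on my account.",
            "billing inquiry",
            "Payment method changes are billing actions."),
    Example("Close my account today, effective immediately.",
            "cancellation request",
            "Explicit, unconditional exit intent."),
]
```

Storing a short rationale alongside each label lets the selected demonstrations show not just the answer but the distinguishing signal.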
Assess Model Uncertainty
For a given query, evaluate the model’s confidence or uncertainty. This can be measured through output probability distributions, consistency across multiple generations, or entropy of the predicted label distribution. High uncertainty signals that the model needs targeted help to resolve this particular input.
The model receives “I need to change the card on file, but if this doesn’t fix the charge issue I want to close my account.” It assigns 38% probability to “billing inquiry” and 35% to “cancellation request” — high entropy indicates genuine ambiguity.
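Entropy over the predicted label distribution is one concrete uncertainty signal. A minimal sketch, where the 1.0-bit threshold is an illustrative choice rather than a prescribed value:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predicted label distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def is_uncertain(probs, threshold=1.0):
    # High entropy means the model cannot separate the candidate labels.
    return entropy(probs) > threshold

# Distribution from the ticket above: billing, cancellation, remainder.
probs = [0.38, 0.35, 0.27]
```

Here `entropy(probs)` is roughly 1.57 bits, well above the threshold, so this query would be routed to targeted example retrieval.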
Select Targeted Examples
Choose examples from the pool that are most relevant to the model’s uncertainty region. Prioritize examples that are semantically similar to the query, represent the uncertain label classes, or cover the specific edge case the model is struggling with. The selection algorithm balances similarity, diversity, and informational value.
The system retrieves three examples from the pool: one where a payment method change was correctly labeled as “billing inquiry,” one where a conditional cancellation threat was labeled “cancellation request,” and one borderline case where both topics appeared but billing was primary — demonstrating the distinguishing signals.
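One possible scoring rule for this selection step, using a simple bag-of-words cosine as a self-contained stand-in for a real embedding model. The 0.5 boost for examples whose labels fall in the model's uncertain set is an arbitrary illustrative weight:

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def select_examples(query, pool, uncertain_labels, k=3):
    # Score = lexical similarity, boosted when the example's label is
    # one of the classes the model is confused about.
    scored = [
        (cosine(query, ex["text"])
         + (0.5 if ex["label"] in uncertain_labels else 0.0), ex)
        for ex in pool
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [ex for _, ex in scored[:k]]

# Hypothetical mini-pool for demonstration.
pool = [
    {"text": "Please update the card on file for my account.",
     "label": "billing inquiry"},
    {"text": "Cancel my subscription immediately.",
     "label": "cancellation request"},
    {"text": "The mobile app crashes on launch.",
     "label": "technical issue"},
]

picked = select_examples(
    "I need to change the card on file",
    pool,
    uncertain_labels={"billing inquiry", "cancellation request"},
    k=2,
)
```

A production system would swap the cosine for dense embeddings and add a diversity term so the top-k examples are not near-duplicates of each other.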
Generate with Tailored Context
Present the dynamically selected examples as in-context demonstrations, then pose the query. The model now has demonstrations specifically chosen to address its weaknesses on this particular input. The result is a response informed by the most relevant possible context rather than generic, one-size-fits-all demonstrations.
With the three targeted examples in context, the model now correctly classifies the ticket as “billing inquiry” with 82% confidence — the borderline example showed that when the primary action requested is a payment method change, the billing label takes precedence even when cancellation is mentioned conditionally.
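Assembling the final prompt from the selected demonstrations can be as simple as the sketch below; the `Ticket:`/`Label:` template is a hypothetical format, not a prescribed one:

```python
def build_prompt(examples, query):
    """Assemble a few-shot classification prompt from selected demos."""
    lines = []
    for ex in examples:
        lines.append(f"Ticket: {ex['text']}\nLabel: {ex['label']}\n")
    # The query goes last, with the label left for the model to complete.
    lines.append(f"Ticket: {query}\nLabel:")
    return "\n".join(lines)

demo = build_prompt(
    [{"text": "Cancel my plan.", "label": "cancellation request"}],
    "Please update my card.",
)
```

The same skeleton works for any labeled task; only the field names in the template change.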
See the Difference
Why query-aware demonstrations outperform fixed example sets
Static Example Selection
Same three examples used for every query: a clear billing ticket, a clear technical issue, and a clear feature request. These were chosen once during system setup and never change regardless of input.
Model classifies the mixed billing-and-cancellation ticket as “cancellation request” with 52% confidence. The fixed examples provided no guidance on distinguishing overlapping categories.
Active Example Selection
System detects uncertainty between “billing” and “cancellation,” then retrieves: (1) a payment-method-change ticket labeled billing, (2) a conditional cancellation threat labeled cancellation, (3) a borderline case where billing was primary despite cancellation language.
Model correctly classifies as “billing inquiry” with 82% confidence. The targeted examples resolved the exact ambiguity the model was facing.
Active Example Selection in Action
See how uncertainty-driven demonstration selection resolves real challenges
A customer support system receives the message: “My subscription renewed but I thought I cancelled it last month. Can you check what happened and reverse the charge? If this keeps happening I’m done with the service.” The model is uncertain whether this is a billing dispute, a cancellation request, or an account inquiry.
Uncertainty detected: The model assigns 30% to “billing dispute,” 28% to “cancellation request,” and 25% to “account inquiry” — near-uniform distribution across three categories.
Examples retrieved: The system pulls three demonstrations from the pool: (1) a ticket where a customer disputed an unexpected charge but explicitly wanted to keep the service — labeled “billing dispute,” (2) a ticket where a customer mentioned a past cancellation attempt and demanded the account be closed — labeled “cancellation request,” (3) a ticket where a customer asked about a renewal error but expressed frustration without a firm exit intent — labeled “billing dispute.”
Result: With these targeted examples in context, the model correctly identifies the primary intent as “billing dispute” because the customer’s core request is to reverse a charge and investigate, while the cancellation language is conditional (“if this keeps happening”) rather than definitive.
A data extraction pipeline encounters the date string “3rd Quarter FY2024 (Oct-Dec 2023)” in a financial document. The model needs to extract a standardized date range, but the fiscal year offset and informal quarter notation create ambiguity that standard date-parsing examples do not cover.
Uncertainty detected: The model generates inconsistent outputs across three attempts — “2024-07 to 2024-09,” “2023-10 to 2023-12,” and “2024-10 to 2024-12” — revealing confusion between calendar quarters and fiscal year offsets.
Examples retrieved: The system retrieves demonstrations showing: (1) a fiscal year date with explicit calendar mapping where the correct output used the parenthetical calendar dates, (2) a quarter notation with a fiscal year offset where “Q3 FY2025” mapped to January through March 2025, (3) a case where parenthetical clarification overrode the fiscal year label as the authoritative date range.
Result: The model correctly extracts “2023-10-01 to 2023-12-31” by following the pattern established in the retrieved examples: when explicit calendar dates appear in parentheses, those take precedence over fiscal year quarter calculations.
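The self-consistency check described in this case, flagging inputs whose sampled outputs disagree, can be sketched as a simple disagreement score (the three attempts below are the ones from the example above):

```python
from collections import Counter

def disagreement(samples):
    """Fraction of sampled outputs that differ from the majority answer.

    0.0 means perfectly consistent; values approaching 1 - 1/n mean
    every sample disagrees with every other.
    """
    counts = Counter(samples)
    majority = counts.most_common(1)[0][1]
    return 1.0 - majority / len(samples)

# The three extraction attempts all disagree, signaling high uncertainty.
attempts = ["2024-07 to 2024-09", "2023-10 to 2023-12", "2024-10 to 2024-12"]
```

Here `disagreement(attempts)` is 2/3, the maximum for three samples, which would trigger retrieval of fiscal-year-specific demonstrations.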
An automated code review system needs to categorize a pull request that renames a function from processData to validateAndTransformInput, updates its internal logic to add input validation, and fixes a null-pointer exception in the process. The model is uncertain whether this is a “refactor,” a “bug fix,” or a “feature enhancement.”
Uncertainty detected: The model assigns 34% to “refactor,” 33% to “bug fix,” and 28% to “feature enhancement” — a three-way split with no clear winner.
Examples retrieved: The system selects: (1) a PR that only renamed functions and reorganized code with no behavior change — labeled “refactor,” (2) a PR that renamed a function but also fixed an edge case crash in the same function — labeled “bug fix” because the behavioral fix was the primary motivation, (3) a PR that added input validation to an existing function without fixing any known bug — labeled “feature enhancement.”
Result: The model correctly categorizes the PR as “bug fix” based on the pattern from the retrieved examples: when a rename and logic change accompany a fix for a known defect (the null-pointer exception), the bug fix label takes priority because the defect resolution is the motivating change.
When to Use Active Example Selection
Best for systems where query diversity demands adaptive demonstrations
Perfect For
When the same system handles billing questions, technical issues, and account management — each query type benefits from fundamentally different demonstrations.
Domains where unusual inputs appear frequently — date formats, code patterns, legal language — and a fixed example set cannot anticipate every variant.
When generic examples leave performance gaps that matter — medical triage, financial classification, or content moderation where errors have real consequences.
Systems with hundreds or thousands of labeled examples available for retrieval — the larger the pool, the more precisely the system can match demonstrations to each query.
Skip It When
When the task is straightforward enough that the same fixed examples work consistently across all queries — sentiment analysis on product reviews, for instance.
When there is no infrastructure for real-time example retrieval and uncertainty estimation — the technique requires embedding search, confidence scoring, and dynamic prompt assembly.
When fewer than 50 labeled examples are available, dynamic selection offers little advantage over simply including all of them or manually curating a representative subset.
Use Cases
Where Active Example Selection delivers the most value
Production Classification Systems
Route incoming tickets, documents, or messages to the correct category by selecting demonstrations that address the specific ambiguity each input presents, rather than relying on generic examples.
Adaptive Tutoring
When a student submits an answer, identify the specific misconception and retrieve worked examples that address that exact misunderstanding, creating a personalized learning experience.
Real-Time Content Moderation
When the model is uncertain whether content violates a policy, retrieve examples of borderline cases in the same category to sharpen the distinction between acceptable and prohibited content.
Dynamic Code Review
Categorize pull requests by retrieving examples of similar code changes that were correctly labeled, especially for borderline cases that blend refactoring, bug fixes, and feature additions.
Medical Triage
When symptom descriptions are ambiguous, retrieve case examples with similar presentations but different diagnoses to help the model distinguish between conditions that share overlapping symptoms.
Multilingual Query Handling
When processing queries that mix languages or use culturally specific idioms, retrieve demonstrations in the relevant language combination to help the model handle code-switching and cultural context accurately.
Where Active Example Selection Fits
Active Example Selection bridges static curation and full knowledge-base integration
Active Example Selection is most powerful when deployed as a continuous loop rather than a one-shot process. As the system handles more queries, it can log which examples most effectively resolved uncertainty for similar inputs. Over time, the selection algorithm becomes increasingly precise — learning not just which examples are semantically similar, but which examples actually change the model’s behavior in the desired direction. This creates a self-improving pipeline where example selection quality compounds with usage.
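One way to implement the logging loop described above is a per-example utility tracker. The `SelectionLog` class and its method names are a hypothetical sketch, assuming the system can observe whether a query was ultimately resolved:

```python
from collections import defaultdict

class SelectionLog:
    """Track how often each pool example, once shown, resolved uncertainty."""

    def __init__(self):
        self.shown = defaultdict(int)   # times each example was selected
        self.helped = defaultdict(int)  # times its inclusion resolved the query

    def record(self, example_id, resolved):
        self.shown[example_id] += 1
        if resolved:
            self.helped[example_id] += 1

    def utility(self, example_id):
        # Empirical resolve rate; can be used to re-rank future retrievals
        # alongside semantic similarity.
        n = self.shown[example_id]
        return self.helped[example_id] / n if n else 0.0
```

Blending this behavioral utility score with similarity is what lets selection quality compound with usage: examples that merely look relevant are gradually outranked by examples that demonstrably change model behavior.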
Related Techniques
Explore complementary in-context learning techniques
Target Your Demonstrations
Build query-aware demonstration pipelines or explore our prompting tools to design more effective in-context learning strategies.