Max Mutual Information
Not all few-shot examples are created equal. Max Mutual Information selects demonstrations that maximize the mutual information between your examples and the desired output, choosing examples by their predictive power rather than their surface similarity to the input.
Introduced: Max Mutual Information (MMI) for prompt selection was formalized in 2022, building on information-theoretic principles applied to in-context learning. The technique addresses a critical problem: few-shot examples chosen by surface similarity (nearest-neighbor in embedding space) often underperform examples chosen by their ability to make the correct output predictable. MMI flips the selection criterion — instead of asking “which examples look like this input?”, it asks “which examples make the model most likely to produce the correct output?”
Modern LLM Status: Example selection remains a high-impact lever for few-shot performance. Even with modern models like Claude and GPT-4, the choice of demonstrations significantly affects output quality. The MMI principle has been absorbed into automated prompt optimization tools and example selection pipelines. For practitioners, the key insight remains actionable: test multiple example sets and measure which ones produce the best outputs, rather than assuming the most similar-looking examples are the most helpful.
The Best Examples Predict the Answer, Not Just Match the Question
When building few-shot prompts, most people instinctively pick examples that look similar to the current input. Classifying a movie review? Choose other movie reviews. Answering a science question? Pick other science questions. This seems logical, but it often leads to suboptimal results because surface similarity doesn’t guarantee that the examples actually help the model produce the right answer.
Max Mutual Information reframes example selection as an optimization problem. Instead of selecting examples by input similarity, MMI selects examples that maximize the mutual information between the demonstration set and the correct output. In practical terms: given a candidate set of examples, choose the combination that makes the model assign the highest probability to the correct answer. The examples that best “inform” the output are the ones to use.
Think of it like choosing study materials for an exam. You could study topics that look like what you expect on the test (similarity-based), or you could study the materials that, based on past experience, most improve your test scores (information-based). MMI takes the second approach — measuring actual predictive impact rather than assumed relevance.
Two examples can be equally similar to your input but have vastly different effects on the output. An example that shares the same structure and reasoning pattern as your problem is more informative than one that shares the same topic but uses different logic. MMI captures this distinction by measuring how much the examples reduce uncertainty about the output, not how much they resemble the input.
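The selection criterion can be sketched in a few lines. This is a minimal illustration, not a production implementation: `score` stands in for a real model call that returns the probability of the correct output given a set of demonstrations, and the toy scorer below is purely for demonstration.

```python
from itertools import combinations

def select_mmi(pool, target_input, correct_output, k, score):
    """Exhaustively pick the k-example subset that maximizes
    score(examples, input, output), i.e. the model's probability of the
    correct output. Exhaustive search is only feasible for small pools;
    greedy selection scales better."""
    best = max(combinations(pool, k),
               key=lambda exs: score(list(exs), target_input, correct_output))
    return list(best)

# Toy stand-in for a real model probability: rewards demonstrations whose
# label matches the target's correct label (illustrative only).
def toy_score(examples, target_input, correct_output):
    return sum(1.0 for ex in examples if ex["label"] == correct_output)

pool = [
    {"text": "Great movie!", "label": "positive"},
    {"text": "Terrible plot.", "label": "negative"},
    {"text": "Oh sure, 'best' film ever...", "label": "negative"},
]
picked = select_mmi(pool, "Wow, what a waste of time.", "negative", 2, toy_score)
```

With a real model behind `score`, the same structure applies: the subset that makes the correct answer most probable wins, regardless of how similar its examples look to the input.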
The MMI Selection Process
Four stages from example pool to optimally informative demonstrations
Build a Candidate Example Pool
Assemble a library of potential few-shot examples, each consisting of an input-output pair. This pool should be diverse enough to cover the range of patterns the model might encounter. The pool can come from training data, manually crafted examples, or previously successful prompt-response pairs.
For a sentiment classification task, gather 50 labeled reviews spanning positive, negative, and neutral sentiments across different product categories, writing styles, and review lengths.
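A candidate pool is just a validated collection of input-output pairs. The snippet below sketches a small pool for the sentiment task above (in practice you would gather 50 or more reviews from training data); the field names and the check are illustrative assumptions, not a required schema.

```python
# Minimal candidate pool: each entry is a complete input-output pair.
pool = [
    {"text": "Absolutely love this blender, use it daily.", "label": "positive"},
    {"text": "Broke after two weeks. Waste of money.", "label": "negative"},
    {"text": "Does the job. Nothing special.", "label": "neutral"},
    {"text": "Oh fantastic, another charger that dies in a month.", "label": "negative"},
]

def validate_pool(pool, labels=("positive", "negative", "neutral")):
    """Check that every candidate has non-empty text and a known label."""
    return all(ex.get("text") and ex.get("label") in labels for ex in pool)
```

Validating the pool up front matters because a mislabeled candidate can score well by accident and end up in every prompt you build from it.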
Score Examples by Output Probability
For each candidate example (or combination of examples), measure how much it increases the model’s probability of generating the correct output for your target problem. This is the mutual information signal: examples that make the correct answer more likely carry more information about the output distribution.
Test each candidate demonstration with your target input and measure the model’s confidence in the correct label. Example A (a sarcastic review correctly labeled negative) might push confidence to 92%, while Example B (a straightforward positive review) only reaches 78%.
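The scoring step looks roughly like this. `label_probability` is a placeholder for a real model query (for example, reading token log-probs from an API that exposes them); the heuristic body below is a toy so the sketch runs end to end.

```python
def label_probability(prompt, correct_label):
    """Toy stand-in for a model call: confidence rises when the prompt
    contains a demonstration carrying the target's correct label.
    Replace with a real log-prob query."""
    return 0.9 if f"-> {correct_label}" in prompt else 0.7

def score_candidate(example, target_text, correct_label):
    """Build a one-shot prompt from the candidate and measure the model's
    confidence in the correct label for the target input."""
    prompt = f'"{example["text"]}" -> {example["label"]}\n"{target_text}" ->'
    return label_probability(prompt, correct_label)

candidates = [
    {"text": "Oh wonderful, my flight got cancelled again.", "label": "negative"},
    {"text": "Best purchase I have ever made!", "label": "positive"},
]
target = "Oh great, another update that breaks everything."
scores = [score_candidate(c, target, "negative") for c in candidates]
```

With a real model, the same loop yields the 92% vs. 78% comparison described above: the sarcastic negative demonstration raises confidence in the correct label more than the straightforward positive one.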
Select the Top-K Informative Examples
Rank all candidates by their information score and select the top-k examples that jointly maximize mutual information. This step considers not just individual example quality but also diversity — two nearly identical high-scoring examples may provide less total information than two different high-scoring examples that cover distinct patterns.
From the 50-example pool, MMI selects 3 demonstrations: a sarcastic negative review, a mixed-sentiment review correctly labeled neutral, and a short positive review. Together, these three examples maximize the model’s ability to classify the target review correctly.
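Joint selection with a diversity pressure is commonly done greedily: add one example at a time by its marginal gain to the joint score, so a near-duplicate of an already-selected example contributes little and loses to a distinct pattern. The joint score here is a toy (one point per distinct label covered), standing in for a real measurement of output probability.

```python
def greedy_select(pool, k, joint_score):
    """Pick k examples one at a time by marginal gain in joint_score,
    which naturally penalizes redundant near-duplicates."""
    selected, remaining = [], list(pool)
    for _ in range(k):
        best = max(remaining, key=lambda ex: joint_score(selected + [ex]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy joint score: duplicates of an already-covered label add nothing.
def toy_joint_score(examples):
    return len({ex["label"] for ex in examples})

pool = [
    {"text": "Sarcastic rant", "label": "negative"},
    {"text": "Another rant", "label": "negative"},
    {"text": "Glowing review", "label": "positive"},
    {"text": "It's fine", "label": "neutral"},
]
picked = greedy_select(pool, 3, toy_joint_score)
```

Even though two negative examples score highest individually, the greedy loop picks one negative, one positive, and one neutral example, mirroring the diverse three-demonstration set described above.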
Construct the Optimized Prompt
Arrange the selected examples in the prompt, accounting for order effects: the same demonstrations in a different order can produce measurably different results. The final prompt contains demonstrations that are maximally informative for the specific task at hand, rather than generically similar to the input.
The final prompt places the three selected examples in order of increasing difficulty, followed by the target review. The model now classifies the target with the highest possible confidence, guided by the most informative demonstrations available.
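Prompt assembly is straightforward once examples and an ordering criterion are chosen. The sketch below orders by an assumed `difficulty` field you would assign yourself; both the field and the prompt template are illustrative, not a prescribed format.

```python
def build_prompt(examples, target_text):
    """Order demonstrations easy-to-hard, then append the target input."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    lines = [f'Review: "{ex["text"]}"\nSentiment: {ex["label"]}' for ex in ordered]
    lines.append(f'Review: "{target_text}"\nSentiment:')
    return "\n\n".join(lines)

examples = [
    {"text": "Oh sure, 'great' product...", "label": "negative", "difficulty": 3},
    {"text": "Love it!", "label": "positive", "difficulty": 1},
    {"text": "It's okay I guess.", "label": "neutral", "difficulty": 2},
]
prompt = build_prompt(examples, "Exactly what I needed: another broken update.")
```

Because order effects are real but task-dependent, it is worth treating the ordering (easy-to-hard, hard-to-easy, or most-informative-last) as one more variable to test rather than a fixed rule.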
See the Difference
Why information-optimized examples outperform similarity-based selection
Similarity-Based Selection
Target: Classify this email as spam or not-spam.
Selected examples (most similar emails by embedding):
1. Newsletter about cooking → not-spam
2. Newsletter about gardening → not-spam
3. Newsletter about fitness → not-spam
Input: “URGENT: Your account has been compromised! Click here to verify your identity immediately.”
This appears to be a legitimate security notification. Classification: not-spam.
MMI Selection
Selected examples (maximizing output probability):
1. “Your package is delayed” → not-spam (legitimate urgency)
2. “CLICK NOW to claim your prize!!!” → spam (urgency + manipulation)
3. “Verify your account at this link” → spam (phishing pattern)
Input: “URGENT: Your account has been compromised! Click here to verify your identity immediately.”
This matches phishing patterns: urgent language, account threat, and a verification link. Classification: spam.
MMI Selection in Action
See how information-optimized examples improve task performance
Target input: “Oh great, another software update that breaks everything. Exactly what I needed today.”
Similarity-based would choose: Other software reviews (all straightforwardly negative → model learns “software complaints = negative”, which is correct but misses the WHY)
MMI selects:
1. “This is just wonderful. My flight got cancelled again.” → Negative (sarcasm pattern)
2. “Best purchase I’ve ever made, truly life-changing!” → Positive (genuine enthusiasm)
3. “Sure, because who doesn’t love waiting 3 hours?” → Negative (sarcastic rhetorical question)
The target uses sarcasm (“Oh great”, “Exactly what I needed”). MMI-selected examples teach the model to recognize sarcastic patterns specifically, not just negative language. The genuine positive example provides a contrast that sharpens the model’s ability to distinguish sarcasm from sincere enthusiasm. Result: the model correctly identifies the review as negative with high confidence, understanding the sarcastic register rather than just pattern-matching keywords.
Target: Classify this contract clause as “termination”, “liability”, or “indemnification”.
Input clause: “Party B shall hold harmless and defend Party A against any claims arising from Party B’s negligence, provided notice is given within 30 days.”
MMI selects examples at the decision boundary:
1. Pure indemnification clause (clear case) → indemnification
2. Liability limitation with indemnity language (borderline) → liability
3. Indemnification with termination trigger (borderline) → indemnification
MMI selected examples that highlight the distinctions between categories, not just clear examples of each. The borderline cases teach the model WHERE the boundary is between liability and indemnification clauses. The model correctly classifies the target as “indemnification” based on the “hold harmless and defend” language, despite the “negligence” and “notice” language that could suggest liability.
Target: Classify this bug report as “performance”, “logic error”, or “UI/UX”.
Input: “The search results page takes 8 seconds to load when there are more than 100 results. Users see a blank screen during this time.”
Similarity-based: Other slow-loading bug reports → all classified as performance
MMI selects:
1. “API timeout after 10s on large queries” → performance (pure backend)
2. “Loading spinner not shown during data fetch” → UI/UX (user sees nothing)
3. “Page freezes during sort of 1000+ items” → performance (client-side)
The target bug has BOTH a performance component (8-second load) AND a UI/UX component (blank screen). MMI-selected examples help the model recognize this dual nature. The model correctly identifies the primary issue as “performance” but notes the blank screen is a separate UI/UX concern — a nuanced classification that similarity-based selection would miss by only showing pure performance bugs.
When to Use MMI Selection
Best for classification and structured tasks where example choice directly impacts accuracy
Perfect For
When classification accuracy directly impacts business decisions — medical triage, fraud detection, content moderation — where a 5% accuracy gain from better examples is worth the selection effort.
Tasks with overlapping or confusable classes — where the decision boundary between categories is subtle and examples near that boundary are most informative.
When you have many candidate examples to choose from — the larger the pool, the more opportunity to find truly informative demonstrations rather than settling for merely adequate ones.
Automated systems processing thousands of inputs daily — the upfront cost of example optimization is amortized across many inference calls.
Skip It When
If you only have 5-10 candidate examples, there’s limited room for optimization. The selection effort may not yield meaningful improvement over random or manual selection.
For creative writing, brainstorming, or tasks without a single correct answer, mutual information is hard to define and optimize — there’s no clear “correct output” to maximize probability for.
If you’re only running a prompt once, the overhead of scoring and selecting optimal examples isn’t justified. Just pick examples that cover the main patterns.
Use Cases
Where MMI selection delivers the most value
Fraud Detection
Select demonstration transactions that highlight the subtle boundary between legitimate and fraudulent patterns, teaching the model to spot borderline cases rather than obvious ones.
Medical Triage
Choose case examples that distinguish between conditions with similar presentations — maximizing the model’s ability to differentiate diagnoses that share overlapping symptoms.
Intent Classification
For chatbots and virtual assistants, select examples that disambiguate easily confused intents — “cancel my order” vs. “change my order” vs. “track my order.”
Document Routing
Route documents to the correct department by selecting examples that distinguish between overlapping categories — legal vs. compliance, engineering vs. product, sales vs. marketing.
Content Moderation
Select examples that teach the nuanced boundary between acceptable and policy-violating content — where context, intent, and tone determine the classification.
Data Quality Assessment
Classify data records as clean, suspicious, or corrupted by selecting examples that demonstrate the subtle differences between legitimate anomalies and actual data quality issues.
Where MMI Fits
MMI bridges naive selection and fully automated prompt optimization
Computing exact mutual information is expensive. In practice, approximate MMI by testing each candidate example against a validation set and measuring accuracy. The examples that produce the highest validation accuracy are your MMI-optimal selections. This “try and measure” approach captures the MMI insight without requiring probability calculations.
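The "try and measure" approximation reduces to comparing candidate example sets by validation accuracy. In this sketch, `predict` is a placeholder for a real model call with the example set prepended; the toy predictor (majority label of the demonstrations) exists only so the code runs.

```python
def accuracy(example_set, val_set, predict):
    """Fraction of validation items the model gets right with this set."""
    hits = sum(1 for x, y in val_set if predict(example_set, x) == y)
    return hits / len(val_set)

def pick_best_set(candidate_sets, val_set, predict):
    """Return the candidate example set with the highest validation accuracy:
    the MMI-optimal selection under the try-and-measure approximation."""
    return max(candidate_sets, key=lambda s: accuracy(s, val_set, predict))

# Toy predictor: guesses the majority label of the demonstration set.
def toy_predict(example_set, x):
    labels = [ex["label"] for ex in example_set]
    return max(set(labels), key=labels.count)

val = [("awful", "negative"), ("bad", "negative"), ("nice", "positive")]
sets = [
    [{"label": "positive"}, {"label": "positive"}],  # always predicts positive
    [{"label": "negative"}, {"label": "negative"}],  # always predicts negative
]
best = pick_best_set(sets, val, toy_predict)
```

Keep the validation set separate from the candidate pool; scoring examples against items they were drawn from inflates the measured accuracy.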