Self-Correction

Self-Verification

Work backwards from the answer to catch errors — the mathematical proof-checker for AI reasoning.

Technique Context: 2022-2023

Background: Self-Verification builds on backward reasoning and constraint-checking concepts from formal methods and mathematical proof theory. As a prompting technique, it gained traction in 2022-2023 as researchers demonstrated that explicitly asking LLMs to verify their own answers — by substituting solutions back into problems — significantly improved accuracy on math, logic, and constraint-satisfaction tasks.

Modern LLM Status: Self-Verification remains a valuable and practical prompting technique. Modern LLMs (Claude, GPT-4) reason better than earlier models, but they can still benefit from explicit verification prompts, especially on multi-step math problems and constraint-heavy tasks. Some models now perform implicit verification in their extended thinking modes. An explicit backward-checking prompt still has one clear advantage: each check is visible in the output and can be audited.

The Concept

The Backward Check

Self-Verification applies a principle every math teacher knows: checking your work is easier than doing it right the first time. After the model generates an answer, it reverses direction — plugging the answer back into the original problem to see if everything holds up.

This works because verification and generation are different tasks. Generating the right answer requires exploring a vast solution space, but verification asks a narrower, often binary question: "Given this answer, does the original problem check out?" Because of this asymmetry, a model that makes a mistake during generation can often catch it during verification. It cannot catch it when the check relies on the same faulty knowledge (see Same-Knowledge Blind Spots).

Key Insight

Finding a needle in a haystack is hard. But once someone hands you something from the haystack, checking whether it is a needle takes seconds. Self-Verification exploits this fundamental asymmetry between search and validation.
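A small Python sketch makes the asymmetry concrete. It swaps in integer factoring as the stand-in problem (our choice of example, not the original's): finding the two prime factors of n by trial division takes close to a million steps, while checking a proposed pair takes a single multiplication. The function names are illustrative.

```python
def find_factors(n: int) -> tuple[int, int, int]:
    """Search: trial division. Returns (p, q, steps) with p * q == n."""
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, n // d, steps
        d += 1
    return 1, n, steps  # no divisor found: n is prime (or 1)


def check_factors(n: int, p: int, q: int) -> bool:
    """Validation: one multiplication plus a non-triviality check."""
    return 1 < p < n and p * q == n


n = 999_983 * 1_000_003  # product of two primes near one million
p, q, steps = find_factors(n)
print(f"search: {steps:,} trial divisions; check: 1 multiplication")
print("verified:", check_factors(n, p, q))
```

The search cost grows with the size of the smallest factor; the check cost stays constant. Self-Verification relies on the same gap, although for LLM answers the "check" is itself a reasoning step and can also be wrong.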

The Core Pattern

Step 1: Generate an answer to the problem.

Step 2: Formulate verification conditions — what must be true if this answer is correct?

Step 3: Test each condition against the answer.

Step 4: If any condition fails, flag the error and regenerate.
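The four steps map onto a simple loop. In this hedged Python sketch, a list of candidate answers stands in for successive model generations (in a real workflow each attempt would be a fresh model call that receives the failed checks as feedback), and the problem "find a positive x with 3x + 7 = 25" is hypothetical. Each verification condition is a named predicate.

```python
from typing import Callable, Iterable, Optional

# A condition is a name plus a predicate the answer must satisfy (Step 2).
Condition = tuple[str, Callable[[float], bool]]


def self_verify(candidates: Iterable[float], conditions: list[Condition]) -> Optional[float]:
    """Take each generated answer in turn (Step 1), test every condition (Step 3),
    and either accept it or flag the failures and move to the next attempt (Step 4)."""
    for attempt, answer in enumerate(candidates, start=1):
        failed = [name for name, check in conditions if not check(answer)]
        if not failed:
            print(f"attempt {attempt}: x = {answer} passes every check")
            return answer
        print(f"attempt {attempt}: x = {answer:.3f} fails {failed}")
    return None  # every attempt failed verification


# Hypothetical problem: "Find a positive x with 3x + 7 = 25."
conditions: list[Condition] = [
    ("3x + 7 = 25", lambda x: abs(3 * x + 7 - 25) < 1e-9),
    ("x > 0", lambda x: x > 0),
]

# Stand-ins for two model generations; the first moved 7 to the wrong side (3x = 25 + 7).
result = self_verify([32 / 3, 6.0], conditions)
```

Substituting the answer back into the equation is what catches the sign error; the generator never has to "notice" its own mistake.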

Why It Works

Three Verification Strategies

Process

The Verification Pipeline

A systematic four-step process that turns every answer into a testable hypothesis.

1

Generate Initial Answer

The model produces its best answer using standard reasoning — chain-of-thought, decomposition, or whatever approach fits the problem. This answer becomes the hypothesis to be tested.

2

Extract Verification Conditions

Analyze the original problem to identify every condition the answer must satisfy. For math: equations that must balance. For scheduling: constraints that must be met. For code: test cases that must pass. Turn abstract correctness into concrete, checkable conditions.

3

Execute Verification Checks

Systematically test the answer against each condition. Substitute values, check constraints, run sanity tests. Document each check as pass or fail with specific evidence.

4

Accept, Revise, or Regenerate

If all checks pass, accept the answer with confidence. If checks fail, either revise the specific failing component or regenerate from scratch with the failure information guiding the new attempt.
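The pipeline above can be sketched end to end in Python for a scheduling request. Everything here is illustrative: the meeting request, the `Meeting` and `Check` types, and the decision rule in `decide` (revise when exactly one check fails, regenerate when more fail) are assumptions, since the original leaves the revise-or-regenerate choice open. Step 2 turns each clause of the request into one check, Step 3 records pass or fail with evidence, and Step 4 chooses the next action.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Meeting:
    start: datetime
    duration: timedelta
    room_capacity: int


@dataclass
class Check:
    name: str
    passed: bool
    evidence: str


# Hypothetical request: "Book 45 minutes for 8 people on Tuesday 4 March 2025,
# between 09:00 and 12:00, without overlapping the 10:00-10:15 standup."
WINDOW_START = datetime(2025, 3, 4, 9, 0)
WINDOW_END = datetime(2025, 3, 4, 12, 0)
STANDUP_START = datetime(2025, 3, 4, 10, 0)
STANDUP_END = datetime(2025, 3, 4, 10, 15)


def run_checks(m: Meeting) -> list[Check]:
    """Steps 2-3: one check per clause of the request, each with evidence."""
    end = m.start + m.duration
    span = f"{m.start:%H:%M}-{end:%H:%M}"
    return [
        Check("lasts 45 minutes", m.duration == timedelta(minutes=45),
              f"duration is {m.duration}"),
        Check("inside 09:00-12:00 on 4 March", WINDOW_START <= m.start and end <= WINDOW_END,
              f"meeting runs {m.start:%d %b} {span}"),
        Check("room fits 8 people", m.room_capacity >= 8,
              f"room capacity is {m.room_capacity}"),
        Check("no overlap with standup", end <= STANDUP_START or m.start >= STANDUP_END,
              f"meeting {span} vs standup 10:00-10:15"),
    ]


def decide(checks: list[Check]) -> str:
    """Step 4 under an assumed policy: one failure -> revise; more -> regenerate."""
    failed = [c for c in checks if not c.passed]
    if not failed:
        return "accept"
    feedback = "; ".join(f"{c.name} ({c.evidence})" for c in failed)
    return ("revise: " if len(failed) == 1 else "regenerate with feedback: ") + feedback


draft = Meeting(datetime(2025, 3, 4, 9, 30), timedelta(minutes=45), room_capacity=6)
for c in run_checks(draft):
    print("PASS" if c.passed else "FAIL", "|", c.name, "|", c.evidence)
print(decide(run_checks(draft)))
```

Because each failing check carries its own evidence, the feedback string can be passed straight back to the model for the revision or regeneration step.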

Comparison

See the Difference

Without Verification

Q: A store has a "buy 2, get 1 free" deal. If each item costs $15, how much do you pay for 7 items?

With "buy 2, get 1 free," every 3 items costs $30 (you pay for 2). For 7 items: 7 ÷ 3 = 2 groups with 1 remaining. Cost: 2 × $30 + $15 = $75.

Looks plausible, but is it correct? No verification step — the user must manually check.

VS

With Self-Verification

Same question, with backward check:

Answer: $75

Verification:

• 7 items in groups of 3: [2 paid + 1 free], [2 paid + 1 free], [1 remaining]

• Free items: 2. Paid items: 5.

• Total: 5 × $15 = $75 ✔

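• Validation check: $75 for 7 items at $15 each (list price $105) is a 29% discount ($30 ÷ $105). That is just under the 33% a complete "buy 2, get 1 free" group gives, as expected when the 7th item does not complete a group ✔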
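The backward check above can be automated by computing the price a second, independent way. In this Python sketch (the function names are illustrative), one function follows the group formula from the original answer and the other rings items up one at a time with every third item free. Agreement between the two is the pass condition.

```python
def price_by_groups(n_items: int, unit_price: int) -> int:
    """The original reasoning: each full group of 3 costs 2 items."""
    groups, remainder = divmod(n_items, 3)
    return groups * 2 * unit_price + remainder * unit_price


def price_item_by_item(n_items: int, unit_price: int) -> int:
    """Independent check: ring items up one by one; every 3rd item is free."""
    total = 0
    for i in range(1, n_items + 1):
        if i % 3 != 0:
            total += unit_price
    return total


answer = price_by_groups(7, 15)
assert answer == price_item_by_item(7, 15) == 75
discount = 1 - answer / (7 * 15)
print(f"${answer}, discount {discount:.0%}")  # $75, discount 29%
```

Using a different method for the check matters: re-running the same group formula would reproduce any mistake in it.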

In Practice

Self-Verification in Action

Problem

"Write a SQL query to find customers who placed more than 3 orders in the last 30 days but have never returned an item."

Generated query:

SELECT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY c.customer_id
HAVING COUNT(o.order_id) > 3

Verification Checklist

Constraint 1: More than 3 orders? ✔ HAVING COUNT > 3

Constraint 2: In the last 30 days? ✔ WHERE clause filters dates

Constraint 3: Never returned an item? ✘ MISSING! No check against returns table.

Verdict: FAIL. Query needs a NOT EXISTS subquery or LEFT JOIN against the returns table to exclude customers with any return history.

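Revised: add a NOT EXISTS filter to the WHERE clause. (The shorter AND c.customer_id NOT IN (SELECT customer_id FROM returns) looks equivalent, but it returns no rows at all if returns.customer_id ever contains a NULL.)

SELECT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND NOT EXISTS (SELECT 1 FROM returns r WHERE r.customer_id = c.customer_id)
GROUP BY c.customer_id, c.name
HAVING COUNT(o.order_id) > 3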
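Reading the query against the checklist is one form of verification; running it against a tiny fixture whose correct answer is known in advance is a stronger one. The Python sketch below ports the query to SQLite (the cutoff date becomes a parameter instead of DATE_SUB(NOW(), INTERVAL 30 DAY)), builds four hypothetical customers, and compares the NOT EXISTS filter suggested in the verdict with a NOT IN filter. The return_id column and the guest return with a NULL customer_id are assumptions made for the test.

```python
import sqlite3
from datetime import date, timedelta

# The query ported to SQLite; :cutoff replaces DATE_SUB(NOW(), INTERVAL 30 DAY).
REVISED = """
SELECT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= :cutoff
  AND NOT EXISTS (SELECT 1 FROM returns r WHERE r.customer_id = c.customer_id)
GROUP BY c.customer_id, c.name
HAVING COUNT(o.order_id) > 3
ORDER BY c.customer_id
"""
# Same query with a NOT IN filter instead, for comparison.
NOT_IN = REVISED.replace(
    "NOT EXISTS (SELECT 1 FROM returns r WHERE r.customer_id = c.customer_id)",
    "c.customer_id NOT IN (SELECT customer_id FROM returns)",
)


def build_fixture(today: date) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
        CREATE TABLE returns (return_id INTEGER PRIMARY KEY, customer_id INTEGER);
    """)
    con.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Ana"), (2, "Ben"), (3, "Cy"), (4, "Dee")])
    recent = (today - timedelta(days=5)).isoformat()   # ISO dates compare correctly as text
    old = (today - timedelta(days=60)).isoformat()
    orders = ([(1, recent)] * 4                        # Ana: 4 recent, no returns -> expected
              + [(2, recent)] * 4                      # Ben: 4 recent, but has a return
              + [(3, recent)] * 3                      # Cy: only 3 recent orders
              + [(4, recent)] * 2 + [(4, old)] * 3)    # Dee: only 2 inside the window
    con.executemany("INSERT INTO orders (customer_id, order_date) VALUES (?, ?)", orders)
    # Ben's return, plus a hypothetical guest return with no customer_id.
    con.executemany("INSERT INTO returns (customer_id) VALUES (?)", [(2,), (None,)])
    return con


today = date.today()
con = build_fixture(today)
cutoff = (today - timedelta(days=30)).isoformat()
print("NOT EXISTS:", con.execute(REVISED, {"cutoff": cutoff}).fetchall())
print("NOT IN:    ", con.execute(NOT_IN, {"cutoff": cutoff}).fetchall())
```

On this fixture the NOT EXISTS version returns only Ana, as the three constraints predict. The NOT IN version returns nothing, because the NULL in returns.customer_id makes every NOT IN comparison evaluate to unknown rather than true.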

Problem

"A farmer has a fox, a chicken, and a sack of grain. He needs to cross a river in a boat that can only carry himself and one item. The fox will eat the chicken if left alone, and the chicken will eat the grain."

Proposed solution: 1) Take chicken across. 2) Return. 3) Take fox across. 4) Bring chicken back. 5) Take grain across. 6) Return. 7) Take chicken across.

Step-by-Step Verification

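Rule: a pair is only in danger when the farmer is on the other bank.

After step 1: Left: fox, grain. Right: farmer, chicken. ✔ Fox doesn't eat grain.

After step 2: Left: farmer, fox, grain. Right: chicken. ✔ Chicken is alone.

After step 3: Left: grain. Right: farmer, fox, chicken. ✔ Fox and chicken share a bank, but the farmer is with them.

After step 4: Left: farmer, chicken, grain. Right: fox. ✔ Chicken and grain share a bank, but the farmer is with them.

After step 5: Left: chicken. Right: farmer, fox, grain. ✔ Chicken is alone.

After step 6: Left: farmer, chicken. Right: fox, grain. ✔ Fox doesn't eat grain.

After step 7: Left: empty. Right: farmer, fox, chicken, grain. ✔ All across safely.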

Verdict: PASS. All constraints satisfied at every intermediate state.
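The same state-by-state check can be mechanized. This Python sketch (function and constant names are illustrative) replays a list of crossings, where each entry is the item the farmer carries or None when he crosses alone, and rejects the plan at the first state where a forbidden pair is left on a bank without the farmer.

```python
from typing import Optional

FOX, CHICKEN, GRAIN = "fox", "chicken", "grain"
# Pairs that must never share a bank without the farmer.
FORBIDDEN = [{FOX, CHICKEN}, {CHICKEN, GRAIN}]


def verify_crossing(moves: list[Optional[str]]) -> list[str]:
    """Replay the moves (item carried, or None to cross alone) and return one
    log line per state; raise ValueError at the first illegal or unsafe state."""
    banks = {"left": {FOX, CHICKEN, GRAIN}, "right": set()}
    farmer = "left"
    log = []
    for step, item in enumerate(moves, start=1):
        destination = "right" if farmer == "left" else "left"
        if item is not None:
            if item not in banks[farmer]:
                raise ValueError(f"step {step}: the {item} is not on the {farmer} bank")
            banks[farmer].remove(item)
            banks[destination].add(item)
        farmer = destination
        unattended = "right" if farmer == "left" else "left"
        for pair in FORBIDDEN:
            if pair <= banks[unattended]:
                names = " and ".join(sorted(pair))
                raise ValueError(f"step {step}: {names} left alone on the {unattended} bank")
        log.append(f"after step {step}: left={sorted(banks['left'])}, "
                   f"right={sorted(banks['right'])}, farmer on {farmer}")
    if banks["right"] != {FOX, CHICKEN, GRAIN}:
        raise ValueError("plan ends before everything is across")
    return log


for line in verify_crossing([CHICKEN, None, FOX, CHICKEN, GRAIN, None, CHICKEN]):
    print(line)

try:
    verify_crossing([FOX])  # a flawed first move
except ValueError as err:
    print("FAIL:", err)  # step 1: chicken and grain left alone on the left bank
```

Tracking the farmer's position explicitly is what makes the check reliable: a pair sharing a bank is only a violation on the bank he has just left.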

Problem

"A car travels at 60 mph for 2.5 hours. How far does it travel in kilometers?"

Generated answer: 60 × 2.5 = 150 miles = 150 × 1.6 = 240 km

Multi-Layer Verification

Backward check: 240 km ÷ 1.609 = 149.2 miles. 149.2 ÷ 2.5 hours = 59.7 mph ≈ 60 mph ✔

Precision check: 1.6 is a rounded conversion factor; one mile is exactly 1.609344 km. Precise answer: 150 × 1.609344 ≈ 241.4 km. The 1.6 approximation understates the distance by about 0.6%.

Validation check: ~240 km in 2.5 hours at highway speed — consistent with real-world driving. ✔

Verdict: PASS with note. The answer is approximately correct. For engineering contexts, use the exact 1.609344 conversion factor.
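The three layers can be expressed as independent checks. In this Python sketch, the relative tolerance `tol` and the 30 to 130 km/h plausibility band are assumptions for illustration; the original applies its checks by judgment rather than fixed thresholds.

```python
KM_PER_MILE = 1.609344  # exact: the international mile is defined as 1,609.344 m


def verify_distance(speed_mph: float, hours: float, answer_km: float, tol: float = 0.01):
    """Backward, precision, and validation checks; tol is an assumed relative tolerance."""
    exact_km = speed_mph * hours * KM_PER_MILE
    implied_mph = answer_km / KM_PER_MILE / hours  # backward: recover the speed
    avg_kmh = answer_km / hours                    # validation: average road speed
    checks = {
        "backward": abs(implied_mph - speed_mph) / speed_mph <= tol,
        "precision": abs(answer_km - exact_km) / exact_km <= tol,
        # Assumed plausibility band for a car trip; adjust for the domain.
        "validation": 30 <= avg_kmh <= 130,
    }
    return checks, exact_km


checks, exact_km = verify_distance(60, 2.5, 240)
print(checks, f"exact = {exact_km:.1f} km")
strict, _ = verify_distance(60, 2.5, 240, tol=0.001)
print(strict)
```

With a 1% tolerance all three checks pass; tightening it to 0.1% makes the backward and precision checks fail while the sanity check still passes, which mirrors the "PASS with note" verdict.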

Implementation

Verification Patterns

Three ways to integrate Self-Verification into your prompting workflow, from simple to robust.

When to Use

Perfect For

Math & Logic Problems

Where answers can be substituted back into equations to confirm correctness — the most natural fit for backward verification.

Constraint Satisfaction

Scheduling, resource allocation, and configuration tasks where every constraint can be independently verified against the solution.

Testable Code Generation

Code outputs that can be tested against requirements — run the code, check the results, verify edge cases.

Clear Correctness Criteria

Any problem with well-defined correctness criteria that can be checked independently of the generation process.
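For code, the verification conditions are test cases derived from the requirements. A minimal sketch, assuming a hypothetical requirement ("return True for Gregorian leap years") and a hypothetical model-generated function that forgets the century rules; the test table deliberately includes the edge cases 1900, 2000, and 2100.

```python
def generated_is_leap(year: int) -> bool:
    # Hypothetical model output: misses the century rules.
    return year % 4 == 0


def revised_is_leap(year: int) -> bool:
    # Divisible by 4, except centuries, unless divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Conditions taken from the requirement, including edge cases.
CASES = {2024: True, 2023: False, 1900: False, 2000: True, 2100: False}


def failing_cases(fn) -> list[int]:
    """Return every test year where fn disagrees with the expected answer."""
    return [year for year, expected in CASES.items() if fn(year) != expected]


print("generated:", failing_cases(generated_is_leap))  # [1900, 2100]
print("revised:  ", failing_cases(revised_is_leap))    # []
```

Only the edge cases expose the bug; the common-case years pass either way.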

Limitations

Skip It When

Subjective Tasks

Creative writing and design opinions where “correct” is a matter of taste, not verifiable criteria.

Open-Ended Questions

Questions without definite verification criteria — there’s nothing concrete to check the answer against.

Same-Knowledge Blind Spots

When verification requires the same knowledge that produced the error — the model can’t catch what it doesn’t know it got wrong.

Applications

Use Case Showcase

Mathematical Problem Solving

Plug answers back into original equations to catch arithmetic errors, sign mistakes, and misapplied formulas — the most natural fit for backward verification.

SQL and Database Queries

Check that every WHERE clause, JOIN condition, and GROUP BY column actually addresses a requirement from the original question. Catches the "forgot a constraint" error pattern.

Regex Pattern Matching

Test the generated regex against example inputs — both strings that should match and strings that shouldn't. Verification reveals over-matching and under-matching immediately.

Meeting and Event Scheduling

Verify that proposed schedules satisfy every constraint: time zones, availability windows, duration requirements, room capacity, and buffer time between events.

Configuration Files

Validate that generated configs (YAML, JSON, TOML) meet all specified requirements: correct ports, proper environment variables, matching service dependencies, and valid syntax.

Legal and Compliance Checks

Verify that drafted policies or contract clauses satisfy all specified regulatory requirements — checking each compliance point as a constraint against the generated text.
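As an example of the regex case, here is a Python sketch. The requirement (match a 24-hour HH:MM time) and both candidate patterns are hypothetical. The first candidate passes every should-match string but over-matches 24:00, 25:99, and 12:60, which only the should-not-match list reveals.

```python
import re

SHOULD_MATCH = ["00:00", "09:30", "23:59"]
SHOULD_NOT_MATCH = ["24:00", "25:99", "9:30", "12:60"]


def verify_regex(pattern: str) -> dict[str, list[str]]:
    """Report strings the pattern wrongly rejects and wrongly accepts."""
    rx = re.compile(pattern)
    return {
        "under-matches": [s for s in SHOULD_MATCH if not rx.fullmatch(s)],
        "over-matches": [s for s in SHOULD_NOT_MATCH if rx.fullmatch(s)],
    }


print(verify_regex(r"\d{2}:\d{2}"))                # naive first attempt
print(verify_regex(r"(?:[01]\d|2[0-3]):[0-5]\d"))  # revised after verification
```

`re.fullmatch` is used so that a pattern cannot pass by matching only part of the input.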

Context

Verification vs. Other Self-Correction

Self-Verification focuses on answer correctness — not quality or style. It answers "is this right?" while other frameworks ask "is this good enough?"

Correctness

Self-Verification

Binary pass/fail checking — does the answer satisfy all constraints? Best for problems with definite right answers.

Quality

Self-Refine

Spectrum-based improvement — is the answer good enough, and how can it be better? Best for writing and creative tasks.

Reliability

Self-Calibration

Confidence assessment — how certain is the model about its answer? Best for flagging uncertain responses before they cause problems.
