Self-Refine
Your AI's inner editor — generate, critique, and polish outputs through iterative self-improvement cycles.
Introduced: Self-Refine was published in 2023 by Madaan et al. The paper demonstrated that a single LLM can substantially improve its own outputs through iterative generate-critique-revise cycles — without any external feedback, fine-tuning, or additional training data.
Modern LLM Status: Self-Refine remains a highly practical and widely-used prompting technique. Modern LLMs do not natively perform iterative self-improvement — they produce a single response unless explicitly instructed to critique and revise. The generate-critique-refine pattern is one of the most effective ways to improve output quality in 2025-2026, especially for writing, documentation, and creative tasks.
The Draft-Critique-Revise Loop
Every great writer knows: writing is rewriting. Self-Refine applies this principle to AI by separating generation from evaluation. The model first produces a draft, then switches roles to become its own critic, and finally revises based on its own feedback — repeating until the output meets quality standards.
What makes Self-Refine powerful is a surprising asymmetry: language models are often better at identifying problems in text than avoiding those problems during generation. A model that writes a vague paragraph can reliably point out that it's vague when asked to review it. Self-Refine exploits this gap between creation and evaluation to systematically ratchet up quality.
Models critique better than they create. Self-Refine turns this asymmetry into an advantage — using the same model as both writer and editor to achieve quality neither role could reach alone.
Generator: Creates the initial output — a first draft with no pressure for perfection.
Critic: Reviews the draft against quality criteria, identifying specific weaknesses with actionable feedback.
Refiner: Incorporates the feedback to produce an improved version, preserving strengths while fixing weaknesses.
Repeat the Critic → Refiner cycle until satisfied or diminishing returns are reached.
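The Generator → Critic → Refiner loop above can be sketched in a few lines. Here `llm` stands in for whatever chat-model call you actually use; the prompt wording and the toy stub (which fakes a model that fixes vagueness after one revision) are illustrative assumptions, not a fixed API.

```python
# Minimal Self-Refine loop (a sketch -- `llm` is any text-in, text-out call).

def self_refine(task, llm, max_cycles=3):
    draft = llm(f"Complete this task:\n{task}")  # Generator: first draft
    for _ in range(max_cycles):
        critique = llm(
            f"Critique this draft for the task '{task}'. "
            f"Reply DONE if no issues remain.\n{draft}"
        )  # Critic: review against the task
        if "DONE" in critique:
            break
        draft = llm(
            f"Revise the draft to address this feedback, keeping its "
            f"strengths.\nDraft:\n{draft}\nFeedback:\n{critique}"
        )  # Refiner: targeted revision
    return draft

# Toy stand-in so the example runs: the "model" flags the first draft as
# vague, then approves the revision.
def toy_llm(prompt):
    if prompt.startswith("Complete"):
        return "vague first draft"
    if prompt.startswith("Critique"):
        return "DONE" if "revised" in prompt else "Too vague."
    return "revised, specific draft"

print(self_refine("explain microservices", toy_llm))  # revised, specific draft
```

In practice you would swap `toy_llm` for a real model call; the loop structure stays the same.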
The Psychology of Self-Editing
Generation-Evaluation Gap
Models generate text token-by-token but evaluate holistically. This means they can see structural issues, tone mismatches, and logical gaps in completed text that were invisible during sequential generation.
Diminishing Error Density
Each refinement cycle removes the most obvious issues first, progressively raising the quality floor. The first revision typically captures 60-80% of possible improvements, making even a single cycle highly valuable.
Focused Attention
During generation, the model splits attention across content, style, accuracy, and coherence simultaneously. Critique-then-refine lets it focus on one quality dimension at a time, leading to more thorough improvement.
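The focused-attention point can be expressed as a per-dimension refinement pass — one critique-and-revise call per quality dimension instead of one call juggling all of them. This is a hypothetical sketch; `revise` stands in for any model call you supply, and the dimension names are examples.

```python
# Refine one quality dimension at a time (sketch; `revise` is a stand-in
# for a critique-and-revise model call focused on a single dimension).

def refine_by_dimension(draft, revise, dims=("accuracy", "clarity", "tone")):
    for dim in dims:
        draft = revise(draft, focus=dim)  # one focused pass per dimension
    return draft

# Toy revise function that just records each pass, so the flow is visible.
out = refine_by_dimension("draft", lambda d, focus: d + f"+{focus}")
print(out)  # draft+accuracy+clarity+tone
```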
The Self-Refine Pipeline
A structured cycle that turns rough drafts into polished outputs through targeted self-critique.
Generate Initial Output
The model produces a complete first response to the task. This draft serves as raw material — no pressure for perfection, just a starting point that captures the core ideas and structure.
Self-Critique with Criteria
The model reviews its own output against specific quality dimensions: accuracy, clarity, completeness, tone, structure, and task alignment. Each identified weakness comes with a concrete, actionable improvement suggestion.
Refine and Incorporate
The model produces an improved version that directly addresses each piece of feedback while preserving existing strengths. The revision is targeted — fixing what's broken without rewriting what works.
Evaluate or Iterate
Check if the output now meets quality standards. If significant issues remain, cycle back to step 2. Typically, 2-3 cycles yield the best cost-to-quality ratio — beyond that, improvements become marginal.
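The evaluate-or-iterate step amounts to a stopping rule: quit once the critic's score clears a threshold, stops improving, or the cycle budget runs out. In this sketch, `score_fn` and `revise_fn` are stand-ins for your own model calls, and the toy score sequence mimics the diminishing-returns curve described above.

```python
# Evaluate-or-iterate sketch: refine until good enough, improvement
# stalls, or the cycle budget (2-3 cycles) is spent.

def refine_until_good(draft, revise_fn, score_fn, threshold=8, max_cycles=3):
    score = score_fn(draft)
    for _ in range(max_cycles):
        if score >= threshold:
            break                      # quality standard met
        new_draft = revise_fn(draft)
        new_score = score_fn(new_draft)
        if new_score <= score:
            break                      # diminishing returns: stop paying
        draft, score = new_draft, new_score
    return draft, score

# Toy stand-ins: each revision fixes the most obvious issues first,
# so scores rise quickly then plateau (5 -> 7 -> 8 -> 8).
scores = {"d0": 5, "d1": 7, "d2": 8, "d3": 8}
order = ["d0", "d1", "d2", "d3"]
final, final_score = refine_until_good(
    "d0",
    revise_fn=lambda d: order[min(order.index(d) + 1, 3)],
    score_fn=lambda d: scores[d],
)
print(final, final_score)  # stops at d2, once the score hits the threshold
```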
Feedback Approaches
Different critique strategies suit different tasks. Choose the feedback lens that matches your quality goals.
Criteria-Based
Score the output against explicit dimensions (accuracy: 7/10, clarity: 5/10, completeness: 8/10). Gives structured, measurable feedback with clear improvement priorities.
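Criteria-based feedback is easiest to act on when the critic replies in a machine-readable form. One common pattern (an assumption here, not the only way) is to request JSON scores and revise the weakest dimension first:

```python
# Criteria-based critique sketch: ask for per-dimension scores as JSON,
# then target the lowest-scoring dimension in the next revision.
import json

CRITIC_PROMPT = (
    "Score the draft 1-10 on accuracy, clarity, and completeness. "
    'Reply as JSON: {"accuracy": n, "clarity": n, "completeness": n}.'
)

def weakest_dimension(critic_reply):
    scores = json.loads(critic_reply)
    return min(scores, key=scores.get)  # dimension with the lowest score

# A reply matching the example scores in the text above:
reply = '{"accuracy": 7, "clarity": 5, "completeness": 8}'
print(weakest_dimension(reply))  # clarity -> revise for clarity first
```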
Role-Based
Critique from a specific persona — a skeptical customer, a domain expert, or the target audience. This surfaces blind spots the generator persona might miss.
Error-Focused
Systematically hunt for factual errors, logical fallacies, unsupported claims, and internal contradictions. Best for technical or factual content where accuracy is paramount.
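The three feedback lenses can be kept as reusable prompt templates and selected per task. The wording below is illustrative — adapt the personas and error lists to your own quality goals:

```python
# The three critique lenses as prompt templates (illustrative wording).
FEEDBACK_PROMPTS = {
    "criteria": "Score this draft 1-10 on accuracy, clarity, and "
                "completeness, with one improvement per dimension:\n{draft}",
    "role": "You are a skeptical {persona}. Read this draft and list "
            "every point you would push back on:\n{draft}",
    "error": "Hunt for factual errors, logical fallacies, unsupported "
             "claims, and internal contradictions in this draft:\n{draft}",
}

# Pick the lens that matches the task, then fill in the draft:
prompt = FEEDBACK_PROMPTS["role"].format(
    persona="customer", draft="Our tool cuts costs by 90%."
)
print(prompt)
```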
See the Difference
First Draft (No Refinement)
Task: Explain microservices to a non-technical manager
Microservices is an architectural style that structures an application as a collection of services that are independently deployable, loosely coupled, organized around business capabilities, and owned by small teams. Each service implements a single business capability and communicates through well-defined APIs, typically using HTTP/REST or message queues.
Problem: Too technical, jargon-heavy, and doesn't connect to what a manager actually cares about (cost, speed, risk).
After 2 Self-Refine Cycles
Task: Explain microservices to a non-technical manager
Think of our current app as one large machine where everything is connected — if one part breaks, the whole thing can go down. Microservices means rebuilding it as a set of independent smaller machines, each handling one specific job (payments, user accounts, notifications).
Why it matters to you: Teams can update the payment system without risking the login system. New features ship faster because teams work in parallel. When Black Friday traffic spikes, we scale only the parts that need it instead of duplicating the entire application.
Self-Refine in Action
"Write a cold outreach email to a VP of Engineering about our monitoring tool."
Draft: "Dear VP, I'm reaching out from MonitorPro. Our platform provides comprehensive observability including distributed tracing, log aggregation, and real-time alerting with 99.9% uptime SLA. We serve 500+ enterprise clients. Would you be interested in a demo?"
Problems identified:
• Opens with company, not recipient's problem — no reason to keep reading
• Feature-dump instead of value proposition
• "Dear VP" is impersonal — shows no research
• Ask is too big (demo) for a first contact
Refined: "Hi [Name], I noticed your team is scaling from monolith to microservices (congrats on the Series B). That transition usually means a 3x spike in debugging time as observability fragments across services.
We helped [Similar Company]'s platform team cut incident response from 45 minutes to 8 by unifying their monitoring before the complexity hit.
Worth a 15-minute chat about what you're seeing on your end?"
"Document this API endpoint: POST /api/users"
Draft: "Creates a new user. Accepts name, email, and password. Returns user object with ID."
Problems identified:
• No request/response format examples
• Missing error cases (duplicate email, invalid password format)
• Doesn't mention authentication requirements
• No rate limiting or validation rules mentioned
• No HTTP status codes
Refined: POST /api/users — Create a new user account.
Auth: None (public registration endpoint)
Rate limit: 5 requests/minute per IP
Body: { name: string (2-50 chars), email: string (valid format, unique), password: string (min 8 chars, 1 uppercase, 1 number) }
201: { id, name, email, created_at } — user created
409: { error: "email_taken" } — duplicate email
422: { errors: [...] } — validation failures
"Write the opening paragraph of a mystery novel set in a lighthouse."
Draft: "The lighthouse stood tall against the stormy sky. Sarah arrived at the old building on a dark and rainy night. She had been sent to investigate the disappearance of the lighthouse keeper. The place was creepy and she felt afraid."
Problems identified:
• Opens with weather cliché ("dark and stormy night")
• Tells instead of shows emotion ("she felt afraid")
• Passive protagonist — things happen to her, she doesn't act
• No sensory details beyond visual
• Dumps exposition instead of creating intrigue
Refined: "The logbook entry for November 12th was the last one written in Thomas Marsh's hand. After that, someone else had taken over — the letters cramped, the ink darker, the weather observations growing increasingly wrong. Sarah traced her finger along the final genuine entry: 'Light operational. Seas moderate. Nothing unusual.' She looked up at the spiral staircase disappearing into the lamp room above. The beam hadn't turned since Tuesday."
Perfect For
Emails, reports, and content marketing where quality, tone, and polish matter — refinement cycles catch what first drafts miss.
Documentation that needs to cover edge cases, error scenarios, and prerequisites — critique reveals gaps you didn’t notice.
Creative work where first drafts are rough and benefit from structural critique before final delivery.
Any output where you’d normally iterate manually — Self-Refine automates the revision loop so you don’t have to.
Skip It When
When fast output matters more than perfection — each refinement cycle adds latency and token cost.
Self-Refine can’t catch errors it doesn’t recognize as errors — tasks requiring ground-truth checks need a different approach.
Yes/no questions and straightforward lookups where iteration adds no value to an already correct answer.
Use Case Showcase
Marketing Copy
Turn flat product descriptions into compelling copy — the critique step catches missing benefits, weak CTAs, and mismatched tone before they reach customers.
Technical Writing
API docs, README files, and internal wikis improve dramatically when the model critiques for missing examples, unclear prerequisites, and assumed knowledge.
Code Review Prep
Before submitting code for review, have the model critique its own generated code for naming conventions, edge cases, and design pattern violations.
Executive Summaries
Condense complex reports into executive summaries, then refine for jargon removal, action-oriented language, and appropriate detail level for the audience.
Educational Content
Lesson plans, tutorials, and study guides benefit from critique that checks for assumed prior knowledge, scaffolding gaps, and engagement-killing monotony.
Interview Preparation
Draft interview answers, then refine by critiquing for specificity, STAR format compliance, and relevance to the role — turning generic answers into memorable ones.
The Refinement Family
Self-Refine is the simplest iterative improvement method — no external tools, no memory, no multi-agent complexity. Just a model talking to itself.
Self-Refine
Pure self-critique using only the model's own judgment — fast, simple, and effective for writing and style tasks.
CRITIC
Extends self-critique with external tools (search, calculators) for factual verification beyond the model's knowledge.
Reflexion
Adds persistent memory so insights from past failures carry forward to future attempts — learning across iterations.
Polish Every Output
Build Self-Refine cycles into your prompts with our interactive tools, or explore more self-correction frameworks.