Self-Refine
Your AI's inner editor — generate, critique, and polish outputs through iterative self-improvement cycles.
Introduced: Self-Refine was published in 2023 by Madaan et al. The paper demonstrated that a single LLM can substantially improve its own outputs through iterative generate-critique-revise cycles — without any external feedback, fine-tuning, or additional training data.
Modern LLM Status: Self-Refine remains a highly practical and widely-used prompting technique. Modern LLMs do not natively perform iterative self-improvement — they produce a single response unless explicitly instructed to critique and revise. The generate-critique-refine pattern is one of the most effective ways to improve output quality in 2025-2026, especially for writing, documentation, and creative tasks.
The Draft-Critique-Revise Loop
Every great writer knows: writing is rewriting. Self-Refine applies this principle to AI by separating generation from evaluation. The model first produces a draft, then switches roles to become its own critic, and finally revises based on its own feedback — repeating until the output meets quality standards.
What makes Self-Refine powerful is a surprising asymmetry: language models are often better at identifying problems in text than avoiding those problems during generation. A model that writes a vague paragraph can reliably point out that it's vague when asked to review it. Self-Refine exploits this gap between creation and evaluation to systematically ratchet up quality.
Models critique better than they create. Self-Refine turns this asymmetry into an advantage — using the same model as both writer and editor to achieve quality neither role could reach alone.
Generator: Creates the initial output — a first draft with no pressure for perfection.
Critic: Reviews the draft against quality criteria, identifying specific weaknesses with actionable feedback.
Refiner: Incorporates the feedback to produce an improved version, preserving strengths while fixing weaknesses.
Repeat the Critic → Refiner cycle until satisfied or diminishing returns are reached.
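The Generator → Critic → Refiner loop above can be sketched in a few lines. Here `llm` stands in for whatever chat-model call you actually use; the prompt wording and the toy stub (which fakes a model that fixes vagueness after one revision) are illustrative assumptions, not a fixed API.

```python
# Minimal Self-Refine loop (a sketch -- `llm` is any text-in, text-out call).

def self_refine(task, llm, max_cycles=3):
    draft = llm(f"Complete this task:\n{task}")  # Generator: first draft
    for _ in range(max_cycles):
        critique = llm(
            f"Critique this draft for the task '{task}'. "
            f"Reply DONE if no issues remain.\n{draft}"
        )  # Critic: review against the task
        if "DONE" in critique:
            break
        draft = llm(
            f"Revise the draft to address this feedback, keeping its "
            f"strengths.\nDraft:\n{draft}\nFeedback:\n{critique}"
        )  # Refiner: targeted revision
    return draft

# Toy stand-in so the example runs: the "model" flags the first draft as
# vague, then approves the revision.
def toy_llm(prompt):
    if prompt.startswith("Complete"):
        return "vague first draft"
    if prompt.startswith("Critique"):
        return "DONE" if "revised" in prompt else "Too vague."
    return "revised, specific draft"

print(self_refine("explain microservices", toy_llm))  # revised, specific draft
```

In practice you would swap `toy_llm` for a real model call; the loop structure stays the same.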
The Psychology of Self-Editing
Generation-Evaluation Gap
Models generate text token-by-token but evaluate holistically. This means they can see structural issues, tone mismatches, and logical gaps in completed text that were invisible during sequential generation.
Diminishing Error Density
Each refinement cycle removes the most obvious issues first, progressively raising the quality floor. The first revision typically captures 60-80% of possible improvements, making even a single cycle highly valuable.
Focused Attention
During generation, the model splits attention across content, style, accuracy, and coherence simultaneously. Critique-then-refine lets it focus on one quality dimension at a time, leading to more thorough improvement.
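The focused-attention point can be expressed as a per-dimension refinement pass — one critique-and-revise call per quality dimension instead of one call juggling all of them. This is a hypothetical sketch; `revise` stands in for any model call you supply, and the dimension names are examples.

```python
# Refine one quality dimension at a time (sketch; `revise` is a stand-in
# for a critique-and-revise model call focused on a single dimension).

def refine_by_dimension(draft, revise, dims=("accuracy", "clarity", "tone")):
    for dim in dims:
        draft = revise(draft, focus=dim)  # one focused pass per dimension
    return draft

# Toy revise function that just records each pass, so the flow is visible.
out = refine_by_dimension("draft", lambda d, focus: d + f"+{focus}")
print(out)  # draft+accuracy+clarity+tone
```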
The Self-Refine Pipeline
A structured cycle that turns rough drafts into polished outputs through targeted self-critique.
Generate Initial Output
The model produces a complete first response to the task. This draft serves as raw material — no pressure for perfection, just a starting point that captures the core ideas and structure.
Self-Critique with Criteria
The model reviews its own output against specific quality dimensions: accuracy, clarity, completeness, tone, structure, and task alignment. Each identified weakness comes with a concrete, actionable improvement suggestion.
Refine and Incorporate
The model produces an improved version that directly addresses each piece of feedback while preserving existing strengths. The revision is targeted — fixing what's broken without rewriting what works.
Evaluate or Iterate
Check if the output now meets quality standards. If significant issues remain, cycle back to step 2. Typically, 2-3 cycles yield the best cost-to-quality ratio — beyond that, improvements become marginal.
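The evaluate-or-iterate step amounts to a stopping rule: quit once the critic's score clears a threshold, stops improving, or the cycle budget runs out. In this sketch, `score_fn` and `revise_fn` are stand-ins for your own model calls, and the toy score sequence mimics the diminishing-returns curve described above.

```python
# Evaluate-or-iterate sketch: refine until good enough, improvement
# stalls, or the cycle budget (2-3 cycles) is spent.

def refine_until_good(draft, revise_fn, score_fn, threshold=8, max_cycles=3):
    score = score_fn(draft)
    for _ in range(max_cycles):
        if score >= threshold:
            break                      # quality standard met
        new_draft = revise_fn(draft)
        new_score = score_fn(new_draft)
        if new_score <= score:
            break                      # diminishing returns: stop paying
        draft, score = new_draft, new_score
    return draft, score

# Toy stand-ins: each revision fixes the most obvious issues first,
# so scores rise quickly then plateau (5 -> 7 -> 8 -> 8).
scores = {"d0": 5, "d1": 7, "d2": 8, "d3": 8}
order = ["d0", "d1", "d2", "d3"]
final, final_score = refine_until_good(
    "d0",
    revise_fn=lambda d: order[min(order.index(d) + 1, 3)],
    score_fn=lambda d: scores[d],
)
print(final, final_score)  # stops at d2, once the score hits the threshold
```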
Feedback Approaches
Different critique strategies suit different tasks. Choose the feedback lens that matches your quality goals.
Criteria-Based
Score the output against explicit dimensions (accuracy: 7/10, clarity: 5/10, completeness: 8/10). Gives structured, measurable feedback with clear improvement priorities.
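Criteria-based feedback is easiest to act on when the critic replies in a machine-readable form. One common pattern (an assumption here, not the only way) is to request JSON scores and revise the weakest dimension first:

```python
# Criteria-based critique sketch: ask for per-dimension scores as JSON,
# then target the lowest-scoring dimension in the next revision.
import json

CRITIC_PROMPT = (
    "Score the draft 1-10 on accuracy, clarity, and completeness. "
    'Reply as JSON: {"accuracy": n, "clarity": n, "completeness": n}.'
)

def weakest_dimension(critic_reply):
    scores = json.loads(critic_reply)
    return min(scores, key=scores.get)  # dimension with the lowest score

# A reply matching the example scores in the text above:
reply = '{"accuracy": 7, "clarity": 5, "completeness": 8}'
print(weakest_dimension(reply))  # clarity -> revise for clarity first
```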
Role-Based
Critique from a specific persona — a skeptical customer, a domain expert, or the target audience. This surfaces blind spots the generator persona might miss.
Error-Focused
Systematically hunt for factual errors, logical fallacies, unsupported claims, and internal contradictions. Best for technical or factual content where accuracy is paramount.
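The three feedback lenses can be kept as reusable prompt templates and selected per task. The wording below is illustrative — adapt the personas and error lists to your own quality goals:

```python
# The three critique lenses as prompt templates (illustrative wording).
FEEDBACK_PROMPTS = {
    "criteria": "Score this draft 1-10 on accuracy, clarity, and "
                "completeness, with one improvement per dimension:\n{draft}",
    "role": "You are a skeptical {persona}. Read this draft and list "
            "every point you would push back on:\n{draft}",
    "error": "Hunt for factual errors, logical fallacies, unsupported "
             "claims, and internal contradictions in this draft:\n{draft}",
}

# Pick the lens that matches the task, then fill in the draft:
prompt = FEEDBACK_PROMPTS["role"].format(
    persona="customer", draft="Our tool cuts costs by 90%."
)
print(prompt)
```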
See the Difference
First Draft (No Refinement)
Task: Explain microservices to a non-technical manager
Microservices is an architectural style that structures an application as a collection of services that are independently deployable, loosely coupled, organized around business capabilities, and owned by small teams. Each service implements a single business capability and communicates through well-defined APIs, typically using HTTP/REST or message queues.
Problem: Too technical, jargon-heavy, and doesn't connect to what a manager actually cares about (cost, speed, risk).
After 2 Self-Refine Cycles
Task: Explain microservices to a non-technical manager
Think of our current app as one large machine where everything is connected — if one part breaks, the whole thing can go down. Microservices means rebuilding it as a set of independent smaller machines, each handling one specific job (payments, user accounts, notifications).
Why it matters to you: Teams can update the payment system without risking the login system. New features ship faster because teams work in parallel. When Black Friday traffic spikes, we scale only the parts that need it instead of duplicating the entire application.
Self-Refine in Action
"Write a cold outreach email to a VP of Engineering about our monitoring tool."
Draft: "Dear VP, I'm reaching out from MonitorPro. Our platform provides comprehensive observability including distributed tracing, log aggregation, and real-time alerting with 99.9% uptime SLA. We serve 500+ enterprise clients. Would you be interested in a demo?"
Problems identified:
• Opens with company, not recipient's problem — no reason to keep reading
• Feature-dump instead of value proposition
• "Dear VP" is impersonal — shows no research
• Ask is too big (demo) for a first contact
Refined: "Hi [Name], I noticed your team is scaling from monolith to microservices (congrats on the Series B). That transition usually means a 3x spike in debugging time as observability fragments across services.
We helped [Similar Company]'s platform team cut incident response from 45 minutes to 8 by unifying their monitoring before the complexity hit.
Worth a 15-minute chat about what you're seeing on your end?"
"Document this API endpoint: POST /api/users"
Draft: "Creates a new user. Accepts name, email, and password. Returns user object with ID."
Problems identified:
• No request/response format examples
• Missing error cases (duplicate email, invalid password format)
• Doesn't mention authentication requirements
• No rate limiting or validation rules mentioned
• No HTTP status codes
Refined: POST /api/users — Create a new user account.
Auth: None (public registration endpoint)
Rate limit: 5 requests/minute per IP
Body: { name: string (2-50 chars), email: string (valid format, unique), password: string (min 8 chars, 1 uppercase, 1 number) }
201: { id, name, email, created_at } — user created
409: { error: "email_taken" } — duplicate email
422: { errors: [...] } — validation failures
"Write the opening paragraph of a mystery novel set in a lighthouse."
Draft: "The lighthouse stood tall against the stormy sky. Sarah arrived at the old building on a dark and rainy night. She had been sent to investigate the disappearance of the lighthouse keeper. The place was creepy and she felt afraid."
Problems identified:
• Opens with weather cliché ("dark and stormy night")
• Tells instead of shows emotion ("she felt afraid")
• Passive protagonist — things happen to her, she doesn't act
• No sensory details beyond visual
• Dumps exposition instead of creating intrigue
Refined: "The logbook entry for November 12th was the last one written in Thomas Marsh's hand. After that, someone else had taken over — the letters cramped, the ink darker, the weather observations growing increasingly wrong. Sarah traced her finger along the final genuine entry: 'Light operational. Seas moderate. Nothing unusual.' She looked up at the spiral staircase disappearing into the lamp room above. The beam hadn't turned since Tuesday."
Perfect For
Emails, reports, and content marketing where quality, tone, and polish matter — refinement cycles catch what first drafts miss.
Documentation that needs to cover edge cases, error scenarios, and prerequisites — critique reveals gaps you didn’t notice.
Creative work where first drafts are rough and benefit from structural critique before final delivery.
Any output where you’d normally iterate manually — Self-Refine automates the revision loop so you don’t have to.
Skip It When
When fast output matters more than perfection — each refinement cycle adds latency and token cost.
Self-Refine can’t catch errors it doesn’t recognize as errors — tasks requiring ground-truth checks need a different approach.
Yes/no questions and straightforward lookups where iteration adds no value to an already correct answer.
Use Case Showcase
Marketing Copy
Turn flat product descriptions into compelling copy — the critique step catches missing benefits, weak CTAs, and mismatched tone before they reach customers.
Technical Writing
API docs, README files, and internal wikis improve dramatically when the model critiques for missing examples, unclear prerequisites, and assumed knowledge.
Code Review Prep
Before submitting code for review, have the model critique its own generated code for naming conventions, edge cases, and design pattern violations.
Executive Summaries
Condense complex reports into executive summaries, then refine for jargon removal, action-oriented language, and appropriate detail level for the audience.
Educational Content
Lesson plans, tutorials, and study guides benefit from critique that checks for assumed prior knowledge, scaffolding gaps, and engagement-killing monotony.
Interview Preparation
Draft interview answers, then refine by critiquing for specificity, STAR format compliance, and relevance to the role — turning generic answers into memorable ones.
The Refinement Family
Self-Refine is the simplest iterative improvement method — no external tools, no memory, no multi-agent complexity. Just a model talking to itself.
Self-Refine
Pure self-critique using only the model's own judgment — fast, simple, and effective for writing and style tasks.
CRITIC
Extends self-critique with external tools (search, calculators) for factual verification beyond the model's knowledge.
Reflexion
Adds persistent memory so insights from past failures carry forward to future attempts — learning across iterations.
Polish Every Output
Build Self-Refine cycles into your prompts with our interactive tools, or explore more self-correction frameworks.