Chain-of-Thought Prompting
When AI shows its work, everything changes. Chain-of-Thought prompting unlocks step-by-step reasoning — transforming opaque leaps into transparent, verifiable logic that dramatically reduces errors on complex problems.
Introduced: Chain-of-Thought (CoT) prompting was published in 2022 by Wei et al. at Google Brain. The landmark paper demonstrated that including intermediate reasoning steps in few-shot examples — rather than just input-output pairs — dramatically improved LLM performance on arithmetic, commonsense, and symbolic reasoning benchmarks. The technique was deceptively simple: show the model how to think, not just what to answer, and it would follow suit. This single insight launched an entire family of reasoning-enhancement techniques.
Modern LLM Status: Chain-of-Thought reasoning has been deeply integrated into modern LLM architectures. Claude, GPT-4, and Gemini all employ internal reasoning processes inspired by CoT. Many models now “think” step-by-step by default for complex queries. However, explicitly prompting for Chain-of-Thought remains valuable when you need visible, auditable reasoning trails, when working with smaller models that do not reason automatically, or when tackling problems where the default reasoning depth is insufficient. CoT is the foundation technique from which nearly all modern prompt-based reasoning methods descend.
Make the Thinking Visible
When you ask a language model a complex question, it normally generates an answer in a single leap — internally compressing all reasoning into a final token prediction. This works for simple tasks, but for multi-step problems the model can silently skip steps, conflate variables, or lose track of intermediate results. The error hides inside the black box.
Chain-of-Thought changes the game by externalizing the reasoning process. By providing few-shot examples that include intermediate steps — not just questions and answers, but questions, step-by-step reasoning, and then answers — you teach the model to generate its own reasoning chain before committing to a conclusion. Each generated token in the chain becomes context for the next, creating a scaffold that guides the model toward the correct answer.
Think of it like showing your work on a math exam. The answer alone might be right or wrong, but the work reveals exactly where the logic holds or breaks. Chain-of-Thought gives LLMs the same “scratch paper” that humans rely on for complex problem-solving.
When a model generates intermediate reasoning tokens, each step constrains the probability distribution for the next step. A correct “Step 1” makes a correct “Step 2” far more likely — the reasoning chain creates a self-reinforcing path toward accuracy. Without these intermediate tokens, the model must compress all reasoning into a single forward pass, which is where complex problems overwhelm its capacity and errors emerge.
The Chain-of-Thought Process
Three stages from problem to reasoned answer
Provide Reasoning Demonstrations
Include few-shot examples in your prompt that demonstrate not just the final answer, but every intermediate reasoning step. Each example should walk through the problem methodically — identifying relevant information, performing intermediate calculations or inferences, and arriving at the answer through visible logic.
“Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many does he have now? A: Roger started with 5 balls. 2 cans of 3 balls each means 2 × 3 = 6 new balls. 5 + 6 = 11. The answer is 11.”
Present the Target Problem
After one or more reasoning demonstrations, present your actual question in the same format. The model recognizes the pattern from your examples and generates its own chain of reasoning steps before producing a final answer. The demonstrations serve as a template that the model follows structurally.
“Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many do they have?” — The model now knows to show its work before answering.
Model Generates Reasoning Chain and Answer
The model produces a step-by-step reasoning chain mirroring the structure of your demonstrations, then derives its final answer from the chain. Each intermediate step is visible, auditable, and debuggable. If the answer is wrong, you can trace exactly where the reasoning went off track and refine your approach accordingly.
“A: The cafeteria started with 23 apples. They used 20, leaving 23 − 20 = 3 apples. They bought 6 more, so 3 + 6 = 9. The answer is 9.”
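The three stages above can be sketched in a few lines of code. This is an illustrative sketch only: `call_llm` and `build_cot_prompt` are hypothetical names, standing in for whatever model client and helper you actually use.

```python
# Minimal sketch of the three-stage CoT process. `call_llm` is a
# hypothetical placeholder for whatever model client you actually use.

# Stage 1: a few-shot demonstration that shows the reasoning, not just the answer.
DEMONSTRATION = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each means "
    "2 x 3 = 6 new balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Stage 2: present the target problem in the same Q/A format."""
    return f"{DEMONSTRATION}\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many do they have?"
)
print(prompt)

# Stage 3: the model mirrors the demonstration and emits its own
# reasoning chain before the final answer.
# response = call_llm(prompt)
```

Ending the prompt with "A:" invites the model to continue in the demonstrated format, producing reasoning before the answer.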
See the Difference
Why showing reasoning steps produces dramatically better answers
Standard Prompting
A store sells notebooks for $4 each. Sarah buys 3 notebooks and pays with a $20 bill. She then uses her change to buy as many pencils as possible at $1.50 each. How many pencils can she buy?
Sarah can buy 6 pencils.
Chain-of-Thought
Step 1: Calculate the cost of notebooks.
3 notebooks × $4 each = $12.
Step 2: Calculate Sarah’s change.
$20 − $12 = $8 remaining.
Step 3: Determine how many pencils she can buy.
$8 ÷ $1.50 per pencil ≈ 5.33 pencils.
Since she cannot buy a fraction of a pencil, she can buy 5 pencils.
Sarah can buy 5 pencils. She spends $12 on notebooks, receives $8 in change, and each pencil costs $1.50 — so 5 pencils cost $7.50 with $0.50 left over.
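One benefit of an explicit chain is that it can be audited mechanically. A quick Python sketch reproducing each step of the reasoning above:

```python
import math

# Step 1: cost of the notebooks
notebook_cost = 3 * 4                 # $12

# Step 2: Sarah's change from a $20 bill
change = 20 - notebook_cost           # $8

# Step 3: whole pencils only, so round down
pencils = math.floor(change / 1.50)   # 5 pencils
leftover = change - pencils * 1.50    # $0.50 left over

print(pencils, leftover)
```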
Practice Responsible AI
Always verify AI-generated content before use. AI systems can produce confident but incorrect responses. When using AI professionally, transparent disclosure is both best practice and increasingly a legal requirement.
A growing number of US states now require AI transparency in key areas. Critical thinking remains your strongest tool against misinformation.
Chain-of-Thought in Action
See how step-by-step reasoning improves different problem types
“A farmer has 3 fields. The first field produces 240 bushels of wheat, the second produces 180, and the third produces 310. He needs to reserve 15% of his total harvest for next year’s seeds. How many bushels can he sell?”
Step 1: Calculate total harvest.
240 + 180 + 310 = 730 bushels total.
Step 2: Calculate seed reserve (15% of total).
730 × 0.15 = 109.5 bushels reserved for seeds.
Step 3: Calculate sellable bushels.
730 − 109.5 = 620.5 bushels available to sell.
Answer: The farmer can sell 620.5 bushels of wheat (or 620 whole bushels if partial bushels cannot be sold).
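As with any arithmetic chain, each step can be checked directly. A small Python sketch mirroring the three steps:

```python
fields = [240, 180, 310]

total = sum(fields)               # Step 1: 730 bushels harvested
seed_reserve = total * 0.15       # Step 2: 15% reserved for seeds
sellable = total - seed_reserve   # Step 3: bushels available to sell

print(total, seed_reserve, sellable)
```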
“Three friends — Alice, Bob, and Carol — each ordered a different drink: coffee, tea, and juice. Alice did not order coffee. The person who ordered tea is sitting between the other two. Bob is sitting on the far left. What did each person order?”
Step 1: Establish the seating arrangement.
Bob is on the far left. The tea drinker sits between the other two, so the tea drinker is in the middle seat.
Step 2: Determine who sits in the middle.
Since Bob is on the far left, either Alice or Carol occupies the middle seat, and the remaining friend sits on the far right.
Step 3: Apply the constraint about Alice.
Alice did not order coffee. If Alice were in the middle, she would drink tea, which is allowed. If Alice were on the right, the tea drinker would be in the middle, so Alice would drink juice, which is also allowed. The constraint does not pin down her seat.
Step 4: Test Alice in the middle seat.
If Alice is in the middle, she drinks tea. Bob (left) and Carol (right) split coffee and juice, and no constraint decides which gets which, so this seating yields two valid assignments.
Step 5: Test Carol in the middle seat.
If Carol is in the middle, she drinks tea. Alice is on the right and cannot order coffee, so Alice drinks juice and Bob drinks coffee. This yields a third valid assignment.
Answer: The puzzle is underconstrained. Three assignments satisfy every stated condition: (1) Bob has coffee, Alice has tea (middle), Carol has juice; (2) Bob has juice, Alice has tea (middle), Carol has coffee; (3) Bob has coffee, Carol has tea (middle), Alice has juice. A visible reasoning chain surfaces this ambiguity instead of hiding it behind a single confident guess.
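Because the puzzle has a tiny search space, a brute-force enumeration is an easy way to audit a reasoning chain like this one. A Python sketch that tries every seating and drink assignment and keeps those satisfying the stated constraints:

```python
from itertools import permutations

solutions = []
# Seats are (left, middle, right); try every seating and drink assignment.
for seating in permutations(("Alice", "Bob", "Carol")):
    if seating[0] != "Bob":                  # Bob sits on the far left
        continue
    for drinks in permutations(("coffee", "tea", "juice")):
        if drinks[1] != "tea":               # the tea drinker sits in the middle
            continue
        orders = dict(zip(seating, drinks))  # drinks[i] goes to seating[i]
        if orders["Alice"] == "coffee":      # Alice did not order coffee
            continue
        solutions.append(orders)

for s in solutions:
    print(s)
```

Running this prints every assignment that satisfies the constraints, making it immediately clear whether the puzzle has a unique solution.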
“A company’s revenue grew 20% in Q1, declined 10% in Q2, grew 15% in Q3, and declined 5% in Q4. If they started the year at $1 million in quarterly revenue, what was their Q4 revenue, and did they end the year higher or lower than they started?”
Step 1: Calculate Q1 revenue.
Starting at $1,000,000 with 20% growth: $1,000,000 × 1.20 = $1,200,000.
Step 2: Calculate Q2 revenue.
Q1 revenue with 10% decline: $1,200,000 × 0.90 = $1,080,000.
Step 3: Calculate Q3 revenue.
Q2 revenue with 15% growth: $1,080,000 × 1.15 = $1,242,000.
Step 4: Calculate Q4 revenue.
Q3 revenue with 5% decline: $1,242,000 × 0.95 = $1,179,900.
Step 5: Compare to starting revenue.
$1,179,900 vs. $1,000,000 starting revenue — that is $179,900 higher, or about 18% above where they started.
Answer: Q4 revenue was $1,179,900. Despite two quarters of decline, the company ended the year approximately 18% higher than where they started.
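The quarter-by-quarter chain maps naturally onto a loop. A Python sketch applying each percentage change in sequence:

```python
revenue = 1_000_000.0
changes = [("Q1", 0.20), ("Q2", -0.10), ("Q3", 0.15), ("Q4", -0.05)]

# Steps 1-4: apply each quarter's growth or decline in sequence.
for quarter, change in changes:
    revenue *= 1 + change
    print(f"{quarter}: ${revenue:,.0f}")

# Step 5: compare the year-end figure to the starting revenue.
print(f"Change vs. start: {revenue / 1_000_000 - 1:.1%}")
```

Note that percentage changes compound multiplicatively, which is exactly why the two down quarters do not cancel the two up quarters.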
When to Use Chain-of-Thought
Best for problems that require multi-step reasoning
Perfect For
Multi-step calculations, percentages, unit conversions, and word problems that require combining several operations in sequence.
Questions that require chaining everyday knowledge together — understanding cause and effect, physical relationships, or social dynamics across multiple steps.
Constraint satisfaction, deductive logic, pattern recognition, and formal reasoning tasks where each inference builds on the previous one.
Tracing through code logic, identifying where a process breaks, or systematically evaluating potential root causes of a failure.
Skip It When
Questions with single-step answers — “What is the capital of France?” does not benefit from step-by-step reasoning.
Writing stories, poetry, or brainstorming sessions where freeform generation is the goal — structured reasoning steps can constrain creative output.
When speed matters more than accuracy — CoT generates significantly more tokens, increasing both response time and API costs for each query.
Use Cases
Where Chain-of-Thought delivers the most value
Financial Calculations
Walk through compound interest, tax computations, investment returns, and budgeting problems where each calculation feeds into the next.
Code Debugging
Trace execution paths step by step, identify where variable states diverge from expectations, and systematically isolate the root cause of bugs.
Scientific Analysis
Work through experimental data interpretation, hypothesis testing, and multi-variable analysis where each conclusion depends on prior findings.
Educational Tutoring
Demonstrate problem-solving methods to students by making every reasoning step explicit and instructive, turning answers into learning opportunities.
Decision Analysis
Evaluate complex decisions by reasoning through criteria, trade-offs, and consequences step by step, producing transparent and well-supported recommendations.
Legal and Policy Reasoning
Trace through regulatory requirements, case law, and policy implications step by step to build well-reasoned compliance assessments or legal arguments.
Where Chain-of-Thought Fits
CoT is the foundation technique that launched modern prompt-based reasoning
Chain-of-Thought was the breakthrough that proved prompting alone could unlock reasoning capabilities in large language models. Before Wei et al.’s 2022 paper, the prevailing assumption was that LLMs needed fine-tuning or architectural changes to reason well. CoT showed that simply demonstrating reasoning in the prompt was enough — and that insight spawned an entire research ecosystem of reasoning-enhancement techniques: Zero-Shot CoT, Self-Consistency, Tree of Thought, Self-Ask, and dozens more.
Related Techniques
Explore techniques that extend Chain-of-Thought reasoning
Make Your Reasoning Visible
Try Chain-of-Thought reasoning on your own complex problems or build step-by-step prompts with our interactive tools.