Planning Technique

Reasoning via Planning (RAP)

What if an LLM could think like a chess engine — exploring multiple reasoning paths, simulating outcomes, and strategically choosing the best route to an answer? RAP treats every reasoning problem as a planning problem, using Monte Carlo Tree Search to guide the model through a structured exploration of possibilities rather than committing to the first chain of thought that comes to mind.

Technique Context: 2023

Introduced: Reasoning via Planning (RAP) was published in 2023 by Hao et al. in the paper “Reasoning with Language Model is Planning with World Model.” The technique casts the LLM in a dual role: as a world model that predicts the state each reasoning step leads to, and as a reasoning agent that proposes the next action. Combined with Monte Carlo Tree Search (MCTS) — the same algorithm family behind AlphaGo — RAP enables strategic exploration of the reasoning space. The model doesn’t just follow one chain of thought; it explores multiple paths, backtracks from dead ends, and selects the most promising reasoning trajectory. RAP outperformed standard Chain-of-Thought on math, logic, and commonsense reasoning tasks.

Modern LLM Status: Reasoning via Planning (RAP) was ahead of its time, combining planning algorithms with LLM reasoning. In 2026, the MCTS-based approach has been adopted by agent frameworks like LATS (Language Agent Tree Search). The core insight — that reasoning can be treated as a planning problem with search — is now foundational to agentic AI systems. While the computational overhead of full MCTS remains significant, the principle of deliberate, search-guided reasoning has influenced how modern models handle complex multi-step problems internally.

The Core Insight

Reasoning as Strategic Search

Standard Chain-of-Thought prompting produces a single reasoning path: the model starts at the beginning, thinks step by step, and arrives at an answer. But what if that first path leads to a dead end? What if the model makes an error in step two that corrupts everything downstream? Standard CoT has no mechanism to backtrack, explore alternatives, or compare different reasoning strategies.

RAP reframes reasoning as a search problem. Each reasoning step is a “move” in a game tree. The LLM acts as the world model (predicting what state results from each move) and as the agent (choosing which move to make). MCTS then orchestrates the search: it explores multiple branches, evaluates their promise through simulated rollouts, and concentrates computational effort on the most promising paths.

Think of it like a chess grandmaster who doesn’t just play the first good-looking move. They mentally explore several candidate moves, simulate the opponent’s responses, evaluate resulting positions, and then choose the move with the best expected outcome — except here, the “opponent” is the complexity of the problem itself.

The LLM’s Dual Role

RAP’s innovation is using the same LLM for two distinct functions simultaneously. As the world model, it predicts: “If I take this reasoning step, what will the resulting state look like?” As the agent, it decides: “Given the current state, what reasoning step should I try next?” This self-play dynamic, guided by MCTS, allows the model to deliberate rather than merely generate — trading speed for substantially better accuracy on hard problems.
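The deliberation loop itself is compact. Below is a minimal UCT-style MCTS sketch in Python, with the two LLM roles replaced by deterministic toy stubs (`propose_actions`, `predict_next_state`, and `reward` are illustrative stand-ins, not part of the published RAP implementation); in real RAP each stub would be an LLM call.

```python
import math
import random

# Toy stand-ins for the two LLM roles (illustrative only).
# In real RAP, the agent role proposes reasoning steps and the
# world model role predicts the resulting state, both via LLM calls.
def propose_actions(state):             # agent role (stub)
    return [1, 2] if state < 5 else []

def predict_next_state(state, action):  # world model role (stub)
    return state + action

def reward(state):                      # 1.0 only at the goal state
    return 1.0 if state == 5 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}              # action -> child Node
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    """Pick the child balancing exploitation and exploration (UCT)."""
    return max(node.children.items(),
               key=lambda kv: kv[1].value / (kv[1].visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))

def rollout(state, depth=10):
    """Simulate a random continuation and score where it ends up."""
    for _ in range(depth):
        actions = propose_actions(state)
        if not actions:
            break
        state = predict_next_state(state, random.choice(actions))
    return reward(state)

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while node.children and all(a in node.children
                                    for a in propose_actions(node.state)):
            _, node = uct_select(node)
        # 2. Expansion: try one previously unexplored action.
        untried = [a for a in propose_actions(node.state)
                   if a not in node.children]
        if untried:
            action = random.choice(untried)
            child = Node(predict_next_state(node.state, action), node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout from the new state.
        value = rollout(node.state)
        # 4. Backpropagation: update value estimates up to the root.
        while node:
            node.visits += 1
            node.value += value
            node = node.parent
    # Return the first action with the most visits (the most trusted one).
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

In the toy problem, the only rewarding move from state 4 is +1 (reaching the goal, 5), and the search concentrates its visits there: the same concentration-of-effort behavior described above.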

The RAP Process

Four stages from problem to search-optimized solution

1

Define the Reasoning Problem as a State Space

Frame the problem in terms of states (what the model knows at each point) and actions (reasoning steps it can take). The initial state is the problem statement; the goal state is a valid, well-supported answer. Each intermediate state represents the accumulated reasoning so far.

Example

Problem: “A farmer has a wolf, a goat, and a cabbage. He needs to cross a river in a boat that can carry only the farmer and one item. If left alone, the wolf eats the goat, and the goat eats the cabbage. How does the farmer get everything across?”
Initial state: {wolf, goat, cabbage, farmer} on left bank. Goal: All on right bank, nothing eaten.
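This state space is small enough to encode directly. In the Python sketch below (our own encoding, for illustration), a state is the farmer’s bank plus the set of items on the left bank, and a successor function yields only the crossings that leave no bank in an “eating” configuration.

```python
# State: (farmer_bank, items_on_left_bank). Illustrative encoding.
ITEMS = frozenset({"wolf", "goat", "cabbage"})

def is_safe(bank):
    """A bank without the farmer must not pair wolf+goat or goat+cabbage."""
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def successors(state):
    """Yield (cargo, next_state) for every legal crossing from `state`."""
    farmer, left = state
    here = left if farmer == "left" else ITEMS - left
    other = "right" if farmer == "left" else "left"
    for cargo in [None, *here]:               # cross alone, or with one item
        new_left = set(left)
        if cargo is not None:
            (new_left.remove if farmer == "left" else new_left.add)(cargo)
        unattended = new_left if other == "right" else ITEMS - new_left
        if is_safe(unattended):
            yield cargo, (other, frozenset(new_left))

start = ("left", ITEMS)        # everyone on the left bank
goal = ("right", frozenset())  # everything safely on the right
```

From `start`, the only crossing `successors` yields is taking the goat, which matches the analysis in the steps that follow.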

2

Generate Candidate Actions (Agent Role)

At each state, the LLM proposes multiple possible next reasoning steps. Unlike Chain-of-Thought, which commits to one path, RAP generates several candidate actions — different ways to advance the reasoning. The model considers which moves are available given the current state’s constraints.

Example

Candidate actions from initial state:
Action A: Take the wolf across first
Action B: Take the goat across first
Action C: Take the cabbage across first
Each action leads to a different state that needs evaluation.

3

Simulate and Evaluate via MCTS (World Model Role)

For each candidate action, the LLM predicts the resulting state (world model role). MCTS then simulates rollouts from each resulting state — continuing the reasoning to see where each path leads. States that lead toward the goal get higher value scores; dead ends get penalized. The search tree is built iteratively, with each simulation updating the value estimates.

Example

Simulation results:
Action A (wolf first) → goat eats cabbage on left bank → Dead end, score: 0
Action B (goat first) → wolf and cabbage safe on left → Promising, score: 0.7
Action C (cabbage first) → wolf eats goat on left bank → Dead end, score: 0
MCTS concentrates further exploration on the Action B branch.
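The dead-end detection in this example reduces to a safety check on the unattended bank. A small sketch (the 0.7 “promising” score is copied from the example as an illustrative stand-in; real RAP derives scores from rollouts or LLM self-evaluation):

```python
ALL_ITEMS = {"wolf", "goat", "cabbage"}

def bank_is_safe(bank):
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def score_first_move(cargo):
    """Score a first crossing: 0.0 for a dead end, 0.7 for a safe state."""
    left_after = ALL_ITEMS - {cargo}   # farmer and cargo are now on the right
    return 0.7 if bank_is_safe(left_after) else 0.0

for cargo in ("wolf", "goat", "cabbage"):
    print(cargo, score_first_move(cargo))
# goat scores 0.7; wolf and cabbage leave an unsafe bank and score 0.0
```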

4

Select the Optimal Reasoning Path

After sufficient exploration, MCTS selects the action with the highest value at each step, tracing a complete path from initial state to goal. The result is a reasoning chain that has been tested against alternatives — dead ends were explored and rejected, and the chosen path survived competitive evaluation against other candidates.

Example

Optimal path found: Take goat across → Return alone → Take wolf across → Bring goat back → Take cabbage across → Return alone → Take goat across. All items arrive safely. This solution was found by eliminating dead-end branches through search, not by hoping the first reasoning attempt was correct. Always verify AI-generated solutions against the problem constraints.
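For a puzzle this small, the search can be run exhaustively. The breadth-first sketch below (our own code, standing in for MCTS on a fully deterministic toy problem) recovers the same 7-crossing solution by pruning unsafe states.

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})

def safe(bank):
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def solve():
    """Breadth-first search over (farmer_bank, left_bank_items) states."""
    start = ("left", ITEMS)
    goal = ("right", frozenset())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (farmer, left), path = queue.popleft()
        if (farmer, left) == goal:
            return path                      # shortest crossing sequence
        here = left if farmer == "left" else ITEMS - left
        other = "right" if farmer == "left" else "left"
        for cargo in [None, *here]:          # None means crossing alone
            new_left = set(left)
            if cargo is not None:
                (new_left.remove if farmer == "left" else new_left.add)(cargo)
            unattended = new_left if other == "right" else ITEMS - new_left
            state = (other, frozenset(new_left))
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo]))

print(solve())
# 7 crossings: goat over, return alone, wolf (or cabbage) over, goat back,
# cabbage (or wolf) over, return alone, goat over
```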

See the Difference

Why search-guided reasoning outperforms linear chain-of-thought

Chain-of-Thought

Linear Reasoning

“Let me think step by step. First, I’ll take the wolf across... wait, that leaves the goat with the cabbage. Okay, let me try taking the cabbage first... no, that leaves the wolf with the goat. Let me try the goat... yes, then I’ll come back and take the wolf...”

Result

The model stumbles through trial-and-error within a single generation pass. It may find the answer, or it may get confused mid-chain and produce an incorrect solution that it cannot backtrack from.

Linear, no backtracking, errors compound through the chain
VS

RAP with MCTS

Search-Guided Reasoning

MCTS explores three branches simultaneously: wolf-first (dead end detected in 1 step), cabbage-first (dead end detected in 1 step), goat-first (promising — expand further). The search concentrates on the goat-first branch, exploring sub-branches for the second move, pruning failures, until a complete valid path emerges.

Result

Systematic exploration finds the optimal 7-step solution. Dead ends are detected and abandoned early. The final reasoning path is selected from a tree of evaluated alternatives, not generated in a single pass.

Exploratory and self-correcting; the chosen path is the best among those explored


RAP in Action

See how search-guided reasoning tackles problems that stump linear approaches

Problem

“A store sells notebooks for $4 each and pens for $2 each. Maria bought some notebooks and pens, spending exactly $26. She bought more pens than notebooks. How many of each did she buy?”

RAP Search Process

Branch 1: Try 1 notebook ($4) + remaining $22 in pens (11 pens). Check: 11 > 1? Yes. Valid solution: (1, 11).

Branch 2: Try 2 notebooks ($8) + remaining $18 in pens (9 pens). Check: 9 > 2? Yes. Valid solution: (2, 9).

Branch 3: Try 3 notebooks ($12) + remaining $14 in pens (7 pens). Check: 7 > 3? Yes. Valid solution: (3, 7).

Branch 4: Try 4 notebooks ($16) + remaining $10 in pens (5 pens). Check: 5 > 4? Yes. Valid solution: (4, 5).

Branch 5: Try 5 notebooks ($20) + remaining $6 in pens (3 pens). Check: 3 > 5? No. Invalid — prune.

RAP result: Four valid solutions exist. Unlike CoT, which might stop at the first one found, RAP’s search reveals the complete solution space. If the problem requires a unique answer, the ambiguity itself is valuable information. Always verify mathematical solutions by substituting back into the original constraints.
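The branch-by-branch search above is small enough to reproduce directly. A sketch in Python (variable names are ours):

```python
# Enumerate every branch of 4n + 2p = 26 with p > n and at least one
# of each item, pruning branches that violate a constraint.
solutions = []
for n in range(1, 26 // 4 + 1):        # candidate notebook counts
    remainder = 26 - 4 * n             # money left for pens
    if remainder > 0 and remainder % 2 == 0:
        p = remainder // 2
        if p > n:                      # constraint: more pens than notebooks
            solutions.append((n, p))

print(solutions)  # [(1, 11), (2, 9), (3, 7), (4, 5)]
```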

Problem

“You need to move a grand piano from a second-floor apartment to a ground-floor apartment across the street. You have two professional movers, a moving truck, furniture blankets, and a dolly. What is the safest and most efficient approach?”

RAP Search Process

Branch A — Stairs approach: Wrap piano in blankets → Tilt onto dolly → Navigate down stairwell → State evaluation: High risk (tight turns, weight on stairs), possible damage. Score: 0.4

Branch B — Window/crane approach: Remove window frame → Rig crane/hoist to lower piano → State evaluation: Requires equipment not listed (crane). Infeasible with given resources. Score: 0.1

Branch C — Disassemble approach: Check if piano has removable legs/lid → State evaluation: Grand piano legs are removable, significantly reducing dimensions. Score: 0.8

Expanding Branch C (highest score): Remove legs and wrap them separately → Wrap piano body in blankets → Two movers carry body down stairs (lighter, more compact) → Load onto dolly at ground level → Roll to truck → Drive across street → Reverse process at destination.

RAP result: Branch C selected as optimal. The search identified that partial disassembly (removable legs) transforms the problem from extremely difficult to manageable with the given resources. For actual piano moving, always consult professional movers who specialize in piano transport.

Problem

“Three friends — Alice, Bob, and Carol — each have a different pet (cat, dog, fish) and a different favorite color (red, blue, green). Alice doesn’t like red. The person with the cat likes blue. Bob doesn’t have a fish. Carol likes green. Who has which pet and which color?”

RAP Search Process

Constraint propagation (state initialization):
Clue 4: Carol = green. Remaining colors: {red, blue} for {Alice, Bob}.
Clue 1: Alice ≠ red. Therefore Alice = blue, Bob = red.

Branch from color assignments:
Clue 2: Cat owner = blue = Alice. Therefore Alice has the cat.
Clue 3: Bob ≠ fish. Remaining pets: {dog, fish} for {Bob, Carol}. Bob = dog, Carol = fish.

Verification rollout: Alice (cat, blue), Bob (dog, red), Carol (fish, green). Check all constraints: Alice not red (pass), cat owner is blue/Alice (pass), Bob no fish (pass), Carol green (pass). All constraints satisfied.

RAP result: The search confirmed a unique solution with no backtracking needed — constraint propagation from the color assignments resolved everything. In harder puzzles with ambiguous early choices, MCTS would explore multiple assignment branches before finding the valid one. Always verify logical deductions by checking all original constraints against the final answer.
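With only 36 possible assignments, the whole space can be checked exhaustively; the sketch below (our own code) confirms the unique solution. In a harder puzzle, MCTS would prioritize promising assignment branches instead of enumerating everything.

```python
from itertools import permutations

# Exhaustively test all pet/color assignments against the four clues.
people = ("Alice", "Bob", "Carol")
solutions = []
for pets in permutations(("cat", "dog", "fish")):
    for colors in permutations(("red", "blue", "green")):
        assign = {p: (pet, col) for p, pet, col in zip(people, pets, colors)}
        cat_owner = people[pets.index("cat")]
        if assign["Alice"][1] == "red":    continue  # clue 1
        if assign[cat_owner][1] != "blue": continue  # clue 2
        if assign["Bob"][0] == "fish":     continue  # clue 3
        if assign["Carol"][1] != "green":  continue  # clue 4
        solutions.append(assign)

print(solutions)  # a single assignment survives all four clues
```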

When to Use RAP

Best for complex problems where exploration and backtracking improve outcomes

Perfect For

Constraint Satisfaction Problems

Logic puzzles, scheduling, and planning problems where wrong early choices lead to dead ends that require backtracking to resolve.

Mathematical Reasoning

Multi-step math problems where different solution strategies exist — RAP can explore algebraic, geometric, and numerical approaches in parallel.

Strategic Decision-Making

Problems where the best immediate action depends on looking several steps ahead — like game playing, resource allocation, or negotiation strategy.

Agentic AI Workflows

Building AI agents that need to plan sequences of actions in environments with uncertain outcomes — RAP provides the deliberation framework.

Skip It When

Simple, Linear Tasks

Tasks that follow a straightforward sequence with no branching — summarization, translation, or single-step Q&A don’t benefit from search overhead.

Latency-Sensitive Applications

MCTS requires many LLM calls per problem (every action proposal, state prediction, and rollout step is a separate call), which is too slow for real-time interactions and significantly multiplies compute costs.

Open-Ended Creative Tasks

When there is no clear “correct” solution to search for — creative writing, brainstorming, and opinion tasks lack the evaluation function MCTS needs to guide the search.

Use Cases

Where RAP delivers the most value

Mathematical Problem Solving

Explore multiple solution strategies for complex math problems, systematically pruning approaches that hit dead ends and concentrating effort on promising paths.

Code Generation and Debugging

Treat program synthesis as planning — explore different algorithmic approaches, test intermediate states against expected outputs, and backtrack from bugs.

Scientific Hypothesis Testing

Explore multiple hypotheses in parallel, simulating the evidence each would predict, and concentrating investigation on the most promising explanations.

Multi-Turn Dialogue Planning

Plan conversation strategies by simulating how different opening moves lead to different dialogue trajectories, selecting the approach most likely to achieve the communication goal.

Red Team Analysis

Systematically explore attack vectors by treating adversarial reasoning as a search problem — what can go wrong, in what sequence, with what probability.

Resource Scheduling

Optimize scheduling and allocation problems where dependencies create complex constraint networks that benefit from look-ahead search to avoid bottlenecks.

Where RAP Fits

RAP bridges simple reasoning chains and full agentic planning systems

Chain-of-Thought: linear reasoning. Single-path, step-by-step.
Tree of Thoughts: branching search. Multiple reasoning paths explored with BFS/DFS.
RAP: MCTS-guided planning. Strategic search with an LLM world model.
LATS: agent tree search. MCTS for tool-using agents.
From Reasoning to Agency

RAP demonstrated that the same search algorithms powering game-playing AI (like AlphaGo’s MCTS) can be applied to language reasoning. This insight opened the door to a new generation of agentic systems. LATS (Language Agent Tree Search) extends RAP’s approach to agents that interact with external tools and environments. Everything of Thoughts (XoT) pushes further by pre-training the search policy. If you’re building AI agents that need to plan and reason, RAP’s architecture is the conceptual foundation.
