LATS (Language Agent Tree Search)
Most AI agents reason in a straight line — one thought, one action, one outcome. LATS breaks that limitation by giving agents a tree of possibilities, using Monte Carlo Tree Search to explore, evaluate, and backtrack through multiple reasoning paths before committing to the best one.
Introduced: LATS (Language Agent Tree Search) was introduced in 2023 by Zhou et al. The technique addresses a fundamental limitation of sequential agent frameworks: once an agent takes a wrong action, it has no mechanism to recover. LATS solves this by unifying three capabilities — reasoning (Chain-of-Thought), acting (ReAct), and planning (Monte Carlo Tree Search) — into a single agent framework. By treating the environment’s feedback as value signals for tree search, LATS achieved state-of-the-art 92.7% pass@1 on HumanEval with GPT-4, dramatically outperforming sequential approaches.
Modern LLM Status: LATS is one of the most impactful agent frameworks from 2023. In 2026, its combination of tree search with LLM agents is widely used in production coding assistants, automated testing pipelines, and complex problem-solving systems. The framework demonstrated that structured search dramatically outperforms simple sequential reasoning for agentic tasks. Many modern agent orchestration platforms (LangChain, AutoGen, CrewAI) have incorporated LATS-inspired search-and-backtrack patterns into their core architectures.
Search Before You Commit
Traditional AI agents follow a single path: think, act, observe, repeat. If the agent takes a wrong turn three steps in, it has no way to undo that decision — it just keeps going, compounding the error. This is like navigating a maze while only ever moving forward, never retracing your steps.
LATS gives the agent a bird’s-eye view of the maze. Instead of committing to one path, the agent explores multiple branches simultaneously. At each decision point, it generates several possible actions, evaluates which branches look most promising using environment feedback as value signals, and can backtrack to try alternative paths when one branch fails. The result is an agent that systematically searches the space of possible solutions rather than gambling on a single trajectory.
Think of it like a chess engine: instead of making the first reasonable move it sees, it explores thousands of possible game states before choosing the move with the highest expected value. LATS brings this same strategic depth to language agents.
Sequential agents (like basic ReAct) are fragile: one bad action can derail the entire chain. LATS mitigates this by maintaining a search tree where each node represents a state, each edge represents an action, and the agent can freely explore and backtrack. Environment feedback (success, failure, partial progress) serves as the value function that guides which branches to explore next. This transforms agentic problem-solving from a single-shot gamble into a systematic search with recovery.
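The tree structure described above can be sketched as a small data type. This is an illustrative sketch, not the paper's actual code: each node records the action that produced it (the edge), its parent and children, and the running statistics that later guide selection.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One state in a LATS-style search tree (illustrative sketch)."""
    action: Optional[str] = None           # edge: the action that produced this state
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                     # running mean of backed-up environment rewards
    visits: int = 0                        # how many times this node has been selected

root = Node()
child = Node(action="write tests first", parent=root)
root.children.append(child)
```

Because every node keeps a pointer to its parent, backtracking is just a walk up the tree — the "freely explore and backtrack" capability falls out of the data structure itself.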
The LATS Process
Four stages from problem to verified solution via tree search
Selection — Choose the Most Promising Node
Starting from the root (the initial problem state), LATS traverses the existing search tree to find the most promising node to expand. It uses Upper Confidence Bound (UCB) scoring — from the same family of algorithms behind AlphaGo — to balance exploring new branches versus exploiting branches that have already shown promise. This ensures the agent does not get stuck repeatedly trying the same approach.
Given a coding task, the agent has already tried two approaches: a recursive solution (score: 0.6) and an iterative solution (score: 0.3). UCB scoring selects the recursive branch for further exploration because it has the higher value, but also reserves exploration budget for untried approaches.
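The UCB rule can be written in a few lines. Below is a sketch using the standard UCT formula, with the visit counts (3 and 2) chosen as hypothetical values to match the example scores above:

```python
import math

def ucb(child_value, child_visits, parent_visits, c=1.414):
    """UCT score: exploit high-value branches while reserving budget for rare ones."""
    if child_visits == 0:
        return float("inf")        # untried branches are always explored first
    return child_value + c * math.sqrt(math.log(parent_visits) / child_visits)

# Hypothetical stats from the example: recursive (0.6, tried 3x), iterative (0.3, tried 2x)
parent_visits = 5
recursive = ucb(0.6, 3, parent_visits)
iterative = ucb(0.3, 2, parent_visits)
print(recursive > iterative)  # True: the recursive branch is selected
```

The exploration term shrinks as a branch accumulates visits, so a branch that keeps scoring 0.6 eventually loses out to a barely-tried alternative — which is exactly how the agent avoids tunnel vision.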
Expansion — Generate Multiple Candidate Actions
At the selected node, the LLM generates multiple candidate actions (typically 3–5) rather than just one. Each candidate represents a different reasoning path or action the agent could take. This is where LATS leverages the LLM’s ability to produce diverse outputs — by sampling at higher temperatures or using different prompting strategies to ensure genuine variety in the proposed actions.
Expanding the recursive branch, the LLM generates three candidates: (a) implement with memoization, (b) implement with tail recursion, (c) convert to dynamic programming. Each becomes a new child node in the search tree.
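A minimal sketch of expansion follows. Here `llm_sample` is a hypothetical stand-in for a real LLM API call at elevated temperature; the three canned actions mirror the example above:

```python
import random

def expand(state, n=3, temperature=0.9):
    """Expansion sketch: sample n distinct candidate actions for a state."""
    def llm_sample(prompt, temperature):
        # Placeholder for an actual LLM call; higher temperature encourages variety.
        return random.choice(["add memoization", "use tail recursion",
                              "convert to dynamic programming"])
    candidates = []
    while len(candidates) < n:
        action = llm_sample(f"Propose the next step for: {state}", temperature)
        if action not in candidates:       # keep only distinct child nodes
            candidates.append(action)
    return candidates
```

The deduplication step matters: three near-identical proposals would waste the evaluation budget, so practical implementations filter for genuine diversity before creating child nodes.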
Evaluation — Score Actions Using Environment Feedback
Each candidate action is executed in the environment, and the resulting feedback (test results, error messages, partial outputs) is used to assign a value score. This is the key innovation of LATS: instead of relying on the LLM to self-evaluate (which is unreliable), it uses real environment signals as the ground truth. The LLM then reflects on this feedback to generate an improved value estimate for each node.
The memoization approach passes 8/10 test cases (value: 0.8), tail recursion hits a stack overflow on large inputs (value: 0.2), and the dynamic programming version passes all 10 tests (value: 1.0). These real execution results, not LLM guesses, drive the search.
Backpropagation — Update the Tree and Iterate
The value scores from evaluation are propagated back up the tree, updating ancestor nodes so future selection decisions reflect what the agent has learned. If the best leaf node solves the problem, the search terminates successfully. If not, the cycle repeats: select, expand, evaluate, backpropagate — each iteration making the agent smarter about which branches to explore and which to abandon.
The dynamic programming path scored 1.0, so the search terminates with a verified solution. If no path had fully succeeded, the agent would backtrack to the highest-value unexplored branch and try new approaches — potentially combining insights from failed branches into a hybrid solution.
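Backpropagation itself is a short loop up the parent chain. A sketch using plain dict nodes (an incremental running mean, so no history needs to be stored):

```python
def backpropagate(node, reward):
    """Propagate a leaf's reward to the root, updating running-mean values."""
    while node is not None:
        node["visits"] += 1
        node["value"] += (reward - node["value"]) / node["visits"]  # incremental mean
        node = node["parent"]

root = {"value": 0.0, "visits": 0, "parent": None}
leaf = {"value": 0.0, "visits": 0, "parent": root}
backpropagate(leaf, 1.0)           # e.g. the dynamic-programming branch scored 1.0
print(root["value"], root["visits"])  # 1.0 1
```

After this update, the next selection pass sees the improved ancestor values, which is what steers future iterations toward the subtree that produced the reward.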
See the Difference
Why tree search outperforms sequential agent reasoning
Sequential Agent (ReAct)
Thought: I need to write a function to find the longest palindromic substring.
Action: Write a brute-force O(n³) solution.
Observation: Time Limit Exceeded on test case 47.
Thought: I should optimize... but I already committed to the brute-force structure.
Action: Add minor optimizations to the same approach.
Observation: Still TLE. No path to recovery.
Agent stuck in a suboptimal approach with no mechanism to try fundamentally different strategies.
LATS Agent
Expand: Generate 3 approaches — brute-force, expand-around-center, Manacher’s algorithm.
Evaluate: Brute-force TLE (0.2), expand-around-center passes 9/10 (0.9), Manacher’s has a bug (0.4).
Select: Expand the expand-around-center branch.
Expand: Fix edge case handling for even-length palindromes.
Evaluate: All 10 tests pass (1.0). Solution found.
Agent explored three fundamentally different approaches in parallel, identified the most promising one via real test feedback, and refined it to a verified solution — all within a structured search framework.
Practice Responsible AI
Always verify AI-generated content before use. AI systems can produce confident but incorrect responses. When using AI professionally, transparent disclosure is both best practice and increasingly a legal requirement.
A growing number of US states now require AI transparency in key areas. Critical thinking remains your strongest tool against misinformation.
LATS in Action
See how tree search transforms agent problem-solving across domains
“Write a Python function that determines whether a given Sudoku board is valid. The board is a 9x9 grid where empty cells are represented by ‘.’ characters.”
Iteration 1 — Expand: Generate 3 candidate solutions:
(a) Nested loops checking rows, columns, and boxes separately
(b) Hash set approach checking all constraints in one pass
(c) Bit manipulation approach using bitmasks
Iteration 1 — Evaluate:
(a) Passes 18/20 tests — fails on boards with a duplicate in a 3x3 box (value: 0.7)
(b) Passes 20/20 tests (value: 1.0)
(c) Has syntax error in bit shift (value: 0.1)
Result: Branch (b) solves the problem on first expansion. Search terminates with verified solution.
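The winning branch (b) might look like the sketch below — one pass over the grid, recording each digit's row, column, and 3x3 box in a single hash set. This is an illustrative reconstruction, not output from the agent:

```python
def is_valid_sudoku(board):
    """One-pass hash-set validity check for a 9x9 board ('.' = empty)."""
    seen = set()
    for r in range(9):
        for c in range(9):
            v = board[r][c]
            if v == ".":
                continue
            # A digit conflicts if it already appeared in this row, column, or box.
            keys = [("row", r, v), ("col", c, v), ("box", r // 3, c // 3, v)]
            if any(k in seen for k in keys):
                return False
            seen.update(keys)
    return True
```

Note how the box index `(r // 3, c // 3)` folds the third constraint into the same set, which is what lets this approach check everything in one pass.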
Note: Always review AI-generated code against your test suite and edge cases before deploying to production. Automated tests are necessary but may not cover all real-world scenarios.
“Find the return policy for electronics purchases on an e-commerce site and summarize the key conditions.”
Iteration 1 — Expand: Generate candidate navigation paths:
(a) Click “Help” → search “return policy”
(b) Click “Customer Service” → “Returns & Refunds”
(c) Scroll to footer → “Policies” link
Iteration 1 — Evaluate:
(a) Search returns general FAQ, not electronics-specific (value: 0.4)
(b) Reaches returns page but it covers all categories (value: 0.6)
(c) Footer link leads to a 404 page (value: 0.0)
Iteration 2 — Expand branch (b):
(b1) Filter by “Electronics” category on returns page
(b2) Search within page for “electronics”
Iteration 2 — Evaluate:
(b1) Successfully shows electronics-specific return policy (value: 1.0)
Result: Agent navigated to the correct policy page through systematic exploration, backtracking from dead ends automatically.
Note: Always verify that information retrieved by AI agents matches the current published policy. Websites update their policies regularly, and cached or outdated information can lead to incorrect conclusions.
“Solve: A farmer has chickens and cows. He counts 30 heads and 86 legs. How many chickens and how many cows does he have?”
Iteration 1 — Expand: Generate 3 reasoning approaches:
(a) Set up system of equations: c + w = 30, 2c + 4w = 86
(b) Assume all chickens: 30 heads = 60 legs, surplus = 26, each cow adds 2 legs → 13 cows
(c) Trial and error starting from midpoint: 15 chickens + 15 cows = 90 legs (too many)
Iteration 1 — Evaluate:
(a) Solves to w = 13, c = 17. Verify: 17(2) + 13(4) = 34 + 52 = 86. Correct (value: 1.0)
(b) 13 cows, 17 chickens. Same answer. Correct (value: 1.0)
(c) Needs more iterations to converge (value: 0.5)
Result: Two independent approaches converged on the same answer (17 chickens, 13 cows), providing high confidence. Cross-verification through multiple solution paths is a key strength of LATS.
Note: Even when AI produces mathematically correct-looking work, verify the arithmetic independently. Cross-checking via multiple approaches, as LATS does naturally, is good practice for any quantitative reasoning task.
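In this spirit, the puzzle is cheap to verify mechanically. A brute-force check over all head counts:

```python
# Brute-force check of the heads-and-legs puzzle: 30 heads, 86 legs.
# Chickens have 2 legs, cows have 4.
solutions = [
    (chickens, 30 - chickens)              # (chickens, cows)
    for chickens in range(31)
    if 2 * chickens + 4 * (30 - chickens) == 86
]
print(solutions)  # [(17, 13)]
```

An exhaustive check like this plays the same role as the agent's cross-verification: it confirms the answer is not just consistent but unique.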
When to Use LATS
Best for complex agentic tasks where exploration and recovery matter
Perfect For
Where test cases provide clear pass/fail signals — LATS can explore multiple algorithms and verify each against real execution results.
Tasks with multiple valid strategies where the best approach is not obvious upfront — LATS systematically evaluates alternatives before committing.
Agents that interact with external systems where wrong actions have real consequences — LATS can explore paths and backtrack from dead ends.
Scenarios where getting the answer right matters more than getting it fast — LATS trades compute for accuracy through systematic exploration.
Skip It When
Tasks that have one obvious correct approach — tree search adds unnecessary overhead when the first action is almost certainly right.
Real-time chat or interactive scenarios where users expect instant responses — LATS requires multiple LLM calls per iteration, multiplied by tree depth.
Tasks where there is no way to evaluate intermediate results — LATS depends on environment signals to guide its search. Without them, it cannot distinguish good branches from bad ones.
Use Cases
Where LATS delivers the most value
Code Generation & Repair
Explore multiple algorithmic approaches, run tests against each, and converge on verified solutions. LATS achieved 92.7% on HumanEval by systematically searching through code strategies.
Automated Testing Pipelines
Generate test strategies, execute them against the system under test, and use pass/fail results to guide the search toward comprehensive test coverage.
Web Agent Navigation
Navigate complex websites by exploring multiple paths, backtracking from dead ends (404s, login walls), and systematically finding the target information.
Complex Reasoning Tasks
Math competitions, logic puzzles, and multi-step reasoning where exploring multiple solution strategies and verifying intermediate results prevents compounding errors.
Security Penetration Testing
Systematically explore attack vectors against a target system, using real feedback (success/failure of each probe) to guide the search toward vulnerabilities.
Data Pipeline Optimization
Explore different ETL configurations, query optimizations, and schema designs, using execution benchmarks as the value signal to converge on the highest-performing pipeline.
Where LATS Fits
LATS unifies reasoning, acting, and planning into a search framework
LATS borrows Monte Carlo Tree Search (MCTS) from game AI — the same algorithm behind AlphaGo’s superhuman Go play. In games, MCTS explores possible moves and uses simulated outcomes to evaluate them. LATS applies this same principle to language agents: instead of simulated games, it uses real environment feedback (test results, API responses, page content) as the value signal. This grounds the search in reality rather than the LLM’s self-assessment, which is the key insight that makes LATS so effective.
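The four stages compose into one loop. The sketch below is a deliberately minimal, self-contained version under stated assumptions: `propose(state)` stands in for the LLM's candidate generation and `evaluate(state)` for real environment feedback, with states represented as action lists.

```python
import math

def lats_search(root_state, propose, evaluate, iterations=20, c=1.0):
    """Minimal LATS-style loop: select -> expand -> evaluate -> backpropagate.
    Illustrative sketch only, not the paper's implementation."""
    root = {"state": root_state, "children": [], "parent": None,
            "value": 0.0, "visits": 0}
    best_state, best_reward = root_state, float("-inf")
    for _ in range(iterations):
        # 1. Selection: walk down via UCT until reaching a leaf.
        node = root
        while node["children"]:
            node = max(node["children"], key=lambda ch:
                       float("inf") if ch["visits"] == 0
                       else ch["value"] + c * math.sqrt(
                           math.log(node["visits"]) / ch["visits"]))
        # 2. Expansion: one child per proposed action.
        for action in propose(node["state"]):
            node["children"].append({"state": node["state"] + [action],
                                     "children": [], "parent": node,
                                     "value": 0.0, "visits": 0})
        # 3 + 4. Evaluate each new child, then backpropagate its reward.
        for child in node["children"]:
            reward = evaluate(child["state"])
            if reward > best_reward:
                best_state, best_reward = child["state"], reward
            n = child
            while n is not None:
                n["visits"] += 1
                n["value"] += (reward - n["value"]) / n["visits"]
                n = n["parent"]
        if best_reward >= 1.0:     # a verified solution terminates the search
            break
    return best_state

# Toy demo: the "environment" rewards any trajectory ending in action "b".
result = lats_search([], lambda s: ["a", "b"],
                     lambda s: 1.0 if s and s[-1] == "b" else 0.3)
print(result)  # ['b']
```

Swapping the toy lambdas for an LLM call and a test runner yields the coding-agent behavior described throughout this article; everything else — selection pressure, backtracking, early termination on a verified solution — is already in the loop.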