Selection Strategy

Complexity-Based Prompting

Not all reasoning chains are created equal. Complexity-Based Prompting favors chains with more reasoning steps, on the evidence that thorough, step-by-step reasoning tends to beat shortcuts on multi-step problems.

Technique Context: 2022

Introduced: Complexity-Based Prompting was published in 2022 by Fu et al. as an enhancement to Chain-of-Thought prompting and Self-Consistency. Rather than treating all sampled reasoning chains equally, it gives more voting influence to chains that contain more intermediate reasoning steps (in the paper, by voting only among the most complex sampled chains). The insight is that more complex chains tend to be more thorough and accurate.

Modern LLM Status: The insight that longer, more detailed reasoning tends to be more accurate is now embedded in modern AI systems. Extended thinking modes and "think harder" instructions in models like Claude explicitly encourage more detailed reasoning paths. The complexity heuristic remains valuable for programmatic pipelines where multiple outputs are sampled and need a principled selection criterion beyond simple majority voting.

The Core Insight

Longer Chains, Better Answers

Self-Consistency generates multiple reasoning paths for a question and picks the most popular answer through majority vote. But this approach treats all chains equally — a hasty 2-step chain counts the same as a careful 10-step chain.

Complexity-Based Prompting adds nuance to this process. Chains with more intermediate reasoning steps get more voting weight. The reasoning is straightforward: short chains may skip important steps or take shortcuts, while longer chains indicate more thorough consideration of the problem.

Think of it like weighting expert opinions — you trust the analyst who examined eight factors over the one who glanced at two.

The Complexity Heuristic

It is not about verbosity — it is about reasoning depth. A chain with 8 genuine reasoning steps likely considered more angles than a chain with 2 steps. More steps mean more intermediate checks, more sub-conclusions verified, and fewer oversights in the final answer.

The Complexity-Based Process

Four steps from multiple chains to weighted selection

1. Generate Multiple Chains

Sample N reasoning chains for the same question using temperature sampling (e.g., temperature=0.7). Each chain may take a different approach, follow a different reasoning path, and arrive at different step counts and conclusions.

Example

Question: "What is 15% of 80 plus 20% of 60?" — Generate 10 different reasoning chains, each working through the problem in its own way.
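
The sampling step could be sketched as follows. The names `sample_chains`, `generate`, and `fake_generate` are hypothetical and not part of any specific library. `generate` stands in for whatever model client you use and is assumed to take a prompt and a temperature and return one reasoning chain as text. The "Let's think step by step." suffix is likewise an assumption about how the prompt is phrased.

```python
import random
from typing import Callable, List


def sample_chains(
    question: str,
    generate: Callable[[str, float], str],  # hypothetical model wrapper
    n: int = 10,
    temperature: float = 0.7,
) -> List[str]:
    """Collect n independently sampled reasoning chains for one question."""
    prompt = f"{question}\nLet's think step by step."
    return [generate(prompt, temperature) for _ in range(n)]


def fake_generate(prompt: str, temperature: float) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    canned = [
        "15% of 80 is 12.\n20% of 60 is 12.\n12 + 12 = 24.\nAnswer: 24",
        "12 plus 12 is 24.\nAnswer: 24",
    ]
    return random.choice(canned)


chains = sample_chains("What is 15% of 80 plus 20% of 60?", fake_generate, n=10)
print(len(chains))
```

In a real pipeline, `fake_generate` would be replaced by a call to your model provider with sampling enabled. The rest of the process only needs the resulting list of strings.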

2. Count Reasoning Steps

For each chain, count the number of distinct reasoning steps. Steps are identified by intermediate calculations, logical transitions, or explicit sub-conclusions. Each time the chain produces an intermediate result or makes a logical move, that counts as one step.

Example

Chain A: 3 steps (quick mental math, jumps to answer). Chain B: 6 steps (breaks each percentage separately, converts, multiplies, then adds). Chain C: 5 steps (similar breakdown, slightly different order).
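
Step counting can be automated with a simple text heuristic. The sketch below is one possible approach, not a definitive method: the hypothetical `count_steps` helper treats each non-empty line or sentence as one step and ignores a trailing "Answer:" line. Real chains may need a stricter definition of a step.

```python
import re


def count_steps(chain: str) -> int:
    """Heuristic step count: one step per non-empty line or sentence."""
    # Split on newlines, or on whitespace that follows sentence-ending punctuation.
    # A decimal point such as "$6.40" is not followed by whitespace, so it does not split.
    parts = re.split(r"\n+|(?<=[.!?])\s+", chain.strip())
    steps = [
        p for p in parts
        if p.strip() and not p.strip().lower().startswith("answer:")
    ]
    return len(steps)


short_chain = "12 + 12 = 24.\nAnswer: 24"
long_chain = (
    "Step 1: Original price = $120. Step 2: 25% of $120 = $30. "
    "Step 3: Discounted price = $120 - $30 = $90. "
    "Step 4: After coupon = $90 - $10 = $80. "
    "Step 5: Sales tax = 8% of $80 = $6.40. "
    "Step 6: Final = $80 + $6.40 = $86.40."
)
print(count_steps(short_chain), count_steps(long_chain))
```

Because this counts sentences rather than checking their content, it cannot tell genuine reasoning from padding. That is the verbosity risk described earlier.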

3. Weight by Complexity

Assign higher voting weight to chains with more steps. Several weighting strategies exist: top-k (only count the longest chains), threshold (require a minimum step count to be included), or proportional (weight each chain proportionally to its step count).

Example

Chain A (3 steps, answer=24): weight 0.5. Chain B (6 steps, answer=24): weight 1.0. Chain C (5 steps, answer=26): weight 0.9.
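
The three weighting strategies named above can be sketched as small functions over a list of step counts. The function names are hypothetical, and the 0-to-1 scaling is an assumption chosen for illustration; each function expects a non-empty list.

```python
from typing import List


def top_k_weights(step_counts: List[int], k: int) -> List[float]:
    """Weight 1.0 for the k chains with the most steps, 0.0 for the rest."""
    ranked = sorted(
        range(len(step_counts)), key=lambda i: step_counts[i], reverse=True
    )
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(step_counts))]


def threshold_weights(step_counts: List[int], min_steps: int) -> List[float]:
    """Weight 1.0 for chains with at least min_steps steps, 0.0 otherwise."""
    return [1.0 if s >= min_steps else 0.0 for s in step_counts]


def proportional_weights(step_counts: List[int]) -> List[float]:
    """Scale each chain's weight by its step count relative to the longest chain."""
    longest = max(step_counts)
    return [s / longest for s in step_counts]


counts = [3, 6, 5]  # Chains A, B, C from the example
print(top_k_weights(counts, k=2))
print(threshold_weights(counts, min_steps=5))
print(proportional_weights(counts))
```

Strict proportional scaling gives Chain C a weight of about 0.83 rather than the 0.9 used in the example, so the example's weights are best read as illustrative values.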

4. Select Weighted Winner

Tally the weighted votes for each distinct answer. The answer with the highest total weight wins, so thorough reasoning carries more influence than quick, potentially superficial responses.

Example

Answer 24: weighted score 1.5 (chains A + B). Answer 26: weighted score 0.9 (chain C only). Winner: 24 — supported by both a quick chain and the most thorough chain.
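
The tally can be sketched in a few lines. `weighted_vote` is a hypothetical helper. It takes (answer, weight) pairs and breaks ties in favor of the answer seen first, which is an arbitrary choice made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(votes: List[Tuple[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Sum the weights per distinct answer and return the top-scoring answer."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        scores[answer] += weight
    winner = max(scores, key=scores.get)  # first-seen answer wins a tie
    return winner, dict(scores)


# Chains A, B, and C with the weights from the example
winner, scores = weighted_vote([("24", 0.5), ("24", 1.0), ("26", 0.9)])
print(winner, scores)  # 24 {'24': 1.5, '26': 0.9}
```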

See the Difference

Why complexity-weighted voting can outperform simple majority

Simple Majority (Self-Consistency)

5 Chains Sampled

Suppose a harder question whose correct answer is 26. Chain 1 (2 steps): answer "24". Chain 2 (3 steps): answer "24". Chain 3 (2 steps): answer "24". Chain 4 (5 steps): answer "26". Chain 5 (6 steps): answer "26".

Majority Vote

Answer "24" wins 3-to-2. Three short chains outvote two thorough chains. Every chain counts equally regardless of reasoning depth.

Short, superficial chains outnumber thorough ones

VS

Complexity-Weighted

Same 5 Chains, Weighted

Chain 1 (2 steps): 2 pts for "24". Chain 2 (3 steps): 3 pts for "24". Chain 3 (2 steps): 2 pts for "24". Chain 4 (5 steps): 5 pts for "26". Chain 5 (6 steps): 6 pts for "26".

Weighted Vote

Answer "24": 2 + 3 + 2 = 7 points. Answer "26": 5 + 6 = 11 points. Answer "26" wins with weighted scoring — the thorough chains carry more influence.

Thorough chains carry more weight — correct answer wins
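
The two tallies can be reproduced side by side. This is a minimal sketch that uses each chain's raw step count as its weight, as in the comparison above; the variable names are illustrative.

```python
from collections import Counter

# (step count, answer) for the five sampled chains
chains = [(2, "24"), (3, "24"), (2, "24"), (5, "26"), (6, "26")]

# Simple majority: every chain contributes one vote.
majority = Counter(answer for _, answer in chains)

# Complexity-weighted: every chain contributes its step count.
weighted = Counter()
for steps, answer in chains:
    weighted[answer] += steps

print(majority.most_common(1)[0])  # ('24', 3)
print(weighted.most_common(1)[0])  # ('26', 11)
```

The same five chains produce different winners. The weighting scheme is therefore a real design decision, and it only helps when step count tracks correctness for the task at hand.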

Complexity-Based Prompting in Action

See how weighting by complexity changes outcomes

Problem

"A store offers 25% off a $120 item, then applies a $10 coupon to the discounted price, and finally adds 8% sales tax. What is the final price?"

Chain Comparison

Short Chain (2 steps): "$120 minus 25% and $10 is $80. Tax makes it $84." — Answer: $84.00. Wrong — applied 8% tax as $4 instead of $6.40.

Medium Chain (3 steps): "$120 minus 25% is $90. Minus $10 is $80. Plus 8% tax is $86.40." — Answer: $86.40. Correct.

Long Chain (6 steps): "Step 1: Original price = $120. Step 2: 25% of $120 = $30. Step 3: Discounted price = $120 - $30 = $90. Step 4: After coupon = $90 - $10 = $80. Step 5: Sales tax = 8% of $80 = $6.40. Step 6: Final = $80 + $6.40 = $86.40." — Answer: $86.40. Correct, with every step verified.

Result: The short chain botches the tax calculation, while the medium and long chains both reach $86.40 and the long chain shows every intermediate value. Under complexity weighting, the two correct chains easily outweigh the short one.
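
The long chain's arithmetic can be checked directly. This sketch uses Python's `decimal` module to avoid floating-point rounding in currency values; the variable names are illustrative.

```python
from decimal import Decimal

price = Decimal("120")
discounted = price - price * Decimal("0.25")  # 25% off: 120 - 30 = 90
after_coupon = discounted - Decimal("10")     # $10 coupon: 90 - 10 = 80
tax = after_coupon * Decimal("0.08")          # 8% of 80 = 6.40
final = (after_coupon + tax).quantize(Decimal("0.01"))

print(final)  # 86.40
```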

Problem

"Five people sit in a row. Alice is not next to Bob. Charlie is at one end. Diana is between two people. Eric is next to Charlie. Who sits in the middle?"

Chain Comparison

Short Chain (3 steps): "Charlie is at an end, Eric is next to him. Diana is between two people, so she is in the middle." Answer: Diana. Wrong: it jumps to a conclusion without checking the Alice-Bob constraint, which rules Diana out of the middle seat.

Long Chain (7 steps): "Step 1: Charlie is at position 1 or 5. The two cases mirror each other, so take Charlie=1. Step 2: Eric is next to Charlie, so Eric=2. Step 3: Alice, Bob, and Diana fill positions 3, 4, and 5. Step 4: Diana is between two people, so she cannot be at position 5. She is at 3 or 4. Step 5: If Diana=3, Alice and Bob take 4 and 5 and are adjacent. Violates the constraint. Step 6: So Diana=4, and Alice and Bob take 3 and 5, which are not adjacent. All constraints hold. Step 7: The mirrored case (Charlie=5, Eric=4, Diana=2) puts Alice and Bob at 1 and 3. In every valid seating the middle seat belongs to Alice or Bob." Answer: Alice or Bob (the clues do not say which), with all constraints systematically checked.

Result: The short chain's answer looks plausible but violates the Alice-Bob constraint. The long chain checks every constraint, rules Diana out of the middle seat, and notices that the clues leave the middle to either Alice or Bob. Short chains tend to miss constraint violations like this one.
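
On a puzzle this small, a brute-force check is one way to verify a chain's conclusion. The sketch below enumerates all 120 seatings and reports who can occupy the middle seat. The helper `is_valid` is hypothetical, and it reads "between two people" as "not in an end seat", which is an assumption about the puzzle's wording. Under that reading there are four valid seatings, with Alice or Bob in the middle and Diana never there.

```python
from itertools import permutations

PEOPLE = ["Alice", "Bob", "Charlie", "Diana", "Eric"]


def is_valid(row) -> bool:
    """Check one seating (a tuple of five names) against the four clues."""
    pos = {name: i for i, name in enumerate(row)}  # 0-based seat index
    return (
        abs(pos["Alice"] - pos["Bob"]) != 1          # Alice is not next to Bob
        and pos["Charlie"] in (0, 4)                 # Charlie is at one end
        and 0 < pos["Diana"] < 4                     # Diana has a neighbor on each side
        and abs(pos["Eric"] - pos["Charlie"]) == 1   # Eric is next to Charlie
    )


valid = [row for row in permutations(PEOPLE) if is_valid(row)]
middle = sorted({row[2] for row in valid})

print(len(valid), middle)  # 4 ['Alice', 'Bob']
```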

Problem

"A farmer has 17 sheep. All but 9 die. How many sheep does the farmer have left?"

Chain Comparison

Short Chain (2 steps): "17 sheep, some die. 17 - 9 = 8 sheep left." — Answer: 8. Misread the problem — subtracted 9 instead of recognizing "all but 9" means 9 survive.

Medium Chain (4 steps): "Step 1: Start with 17 sheep. Step 2: 'All but 9 die' — need to parse this carefully. Step 3: 'All but 9' means every sheep except 9. So 9 survive. Step 4: The farmer has 9 sheep left." — Answer: 9. Correct.

Long Chain (6 steps): "Step 1: 17 sheep total. Step 2: 'All but 9 die' — two possible readings. Step 3: Reading A: 'all but 9' = 'every sheep except 9' = 9 survive. Step 4: Reading B: '9 die' = 17 - 9 = 8 survive. Step 5: Grammatically, 'all but X' means 'everything except X.' So Reading A is correct. Step 6: 9 sheep remain." — Answer: 9. Correct, with the ambiguity explicitly resolved.

Result: The short chain falls into the classic trap. The longer chains parse the language carefully, with the longest explicitly considering both interpretations before choosing. Complexity weighting suppresses the incorrect short-chain answer.

When to Use Complexity-Based Prompting

Best when thorough reasoning correlates with accuracy

Perfect For

Multi-Step Problems

When problems require many intermediate steps and shortcuts lead to errors — complexity weighting ensures thorough chains dominate.

Ambiguous Questions

When thorough exploration of different interpretations matters — longer chains tend to consider and resolve ambiguities explicitly.

Mathematical Reasoning

When skipping calculation steps leads to arithmetic errors — chains that show every intermediate computation catch mistakes.

Programmatic Pipelines

When you are sampling multiple outputs and need a principled selection method — step counting is easy to automate and requires no additional model calls.

Skip It When

Simple Questions

When one-step answers are correct and more steps just add noise — complexity weighting can over-favor unnecessarily verbose chains.

Speed-Critical Applications

When generating and scoring multiple chains is too slow — the technique requires sampling many outputs, which multiplies latency and cost.

Creative Tasks

When "complexity" of reasoning does not correlate with quality — creative writing, brainstorming, and subjective tasks do not benefit from step-count weighting.

Use Cases

Where complexity weighting delivers the most value

Automated Math Tutoring

Select the most thorough solution path for student explanations — longer chains show more work, making them better teaching tools.

Code Generation

Prefer code solutions with more explicit error handling and edge case consideration, on the assumption that this kind of complexity tends to correlate with robustness.

Research Analysis

Weight literature review chains that consider more perspectives — thorough analyses that examine multiple viewpoints produce better summaries.

Diagnostic Reasoning

Favor diagnostic chains that rule out more possibilities before concluding — thorough differential diagnosis reduces misidentification.

Risk Assessment

Select risk analyses that enumerate more potential failure modes — chains that consider more scenarios provide safer recommendations.

Strategic Planning

Prefer strategy chains that consider more variables and contingencies — complex plans account for more what-if scenarios.

Where Complexity-Based Prompting Fits

An evolution of sampling and selection strategies

Chain-of-Thought (Single Chain): one reasoning path
Self-Consistency (Majority Vote): equal-weight sampling
Complexity-Based (Weighted Vote): complexity-weighted selection
Universal Self-Consistency (Format-Free): consistency without fixed answers

Enhancement Stack

Build diverse demonstrations with Auto-CoT, sample multiple chains with Self-Consistency, then re-rank with complexity weighting. This three-layer approach aims to improve reasoning quality by combining diverse prompts, multiple samples, and principled selection.
