Complexity-Based Prompting
Not all reasoning chains are created equal. Complexity-Based Prompting favors chains with more reasoning steps, on the premise that thorough reasoning tends to outperform shortcuts.
Introduced: Complexity-Based Prompting was published in 2022 by Fu et al. as an enhancement to Self-Consistency. Rather than treating all sampled reasoning chains equally, it assigns higher voting weight to chains that contain more intermediate reasoning steps — the insight being that more complex chains tend to be more thorough and accurate.
Modern LLM Status: The insight that longer, more detailed reasoning tends to be more accurate is now embedded in modern AI systems. Extended thinking modes and "think harder" instructions in models like Claude explicitly encourage more detailed reasoning paths. The complexity heuristic remains valuable for programmatic pipelines where multiple outputs are sampled and need a principled selection criterion beyond simple majority voting.
Longer Chains, Better Answers
Self-Consistency generates multiple reasoning paths for a question and picks the most popular answer through majority vote. But this approach treats all chains equally — a hasty 2-step chain counts the same as a careful 10-step chain.
Complexity-Based Prompting adds nuance to this process. Chains with more intermediate reasoning steps get more voting weight. The reasoning is straightforward: short chains may skip important steps or take shortcuts, while longer chains indicate more thorough consideration of the problem.
Think of it like weighting expert opinions — you trust the analyst who examined eight factors over the one who glanced at two.
It is not about verbosity — it is about reasoning depth. A chain with 8 genuine reasoning steps likely considered more angles than a chain with 2 steps. More steps mean more intermediate checks, more sub-conclusions verified, and fewer oversights in the final answer.
The Complexity-Based Process
Four steps from multiple chains to weighted selection
Generate Multiple Chains
Sample N reasoning chains for the same question using temperature sampling (e.g., temperature=0.7). Each chain may take a different approach, follow a different reasoning path, and arrive at different step counts and conclusions.
Question: "What is 15% of 80 plus 20% of 60?" — Generate 10 different reasoning chains, each working through the problem in its own way.
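The sampling step can be sketched in a few lines. Here `generate_fn` is a hypothetical callable standing in for whatever model API you use; the name, signature, and prompt template are illustrative, not a specific provider's interface.

```python
def sample_chains(generate_fn, question, n=10, temperature=0.7):
    """Sample n independent reasoning chains for the same question.

    generate_fn is a placeholder for your model call: it takes
    (prompt, temperature) and returns one reasoning chain as a string.
    Temperature sampling makes each chain take its own path.
    """
    prompt = f"Q: {question}\nA: Let's think step by step."
    return [generate_fn(prompt, temperature) for _ in range(n)]
```

In a real pipeline, `generate_fn` would wrap an API call; for testing, any deterministic stub works.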
Count Reasoning Steps
For each chain, count the number of distinct reasoning steps. Steps are identified by intermediate calculations, logical transitions, or explicit sub-conclusions. Each time the chain produces an intermediate result or makes a logical move, that counts as one step.
Chain A: 3 steps (quick mental math, jumps to answer). Chain B: 6 steps (breaks each percentage separately, converts, multiplies, then adds). Chain C: 5 steps (similar breakdown, slightly different order).
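Step counting is easy to automate. The heuristic below prefers explicit "Step N:" markers and falls back to counting non-empty lines; real pipelines might instead count sentences, intermediate "=" results, or logical connectives — the segmentation rule is an implementation choice, not part of the technique itself.

```python
import re

def count_steps(chain: str) -> int:
    """Heuristically count reasoning steps in a chain.

    Prefers explicit "Step N:" markers; otherwise treats each
    non-empty line as one reasoning move.
    """
    markers = re.findall(r"\bStep\s+\d+\s*:", chain)
    if markers:
        return len(markers)
    return sum(1 for line in chain.splitlines() if line.strip())
```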
Weight by Complexity
Assign higher voting weight to chains with more steps. Several weighting strategies exist: top-k (only count the longest chains), threshold (require a minimum step count to be included), or proportional (weight each chain proportionally to its step count).
Chain A (3 steps, answer=24): weight 0.5. Chain B (6 steps, answer=24): weight 1.0. Chain C (5 steps, answer=26): weight 0.9.
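The three weighting strategies can be sketched as small functions over a list of step counts. The strategy names follow the text; the exact tie-handling and default parameters are assumptions.

```python
def proportional_weights(step_counts):
    """Weight each chain in proportion to its step count."""
    return list(step_counts)

def top_k_weights(step_counts, k=3):
    """Only the k longest chains vote (weight 1.0); the rest get 0."""
    keep = sorted(step_counts, reverse=True)[:k]
    weights = []
    for s in step_counts:
        if s in keep:
            weights.append(1.0)
            keep.remove(s)  # consume one slot per matching chain
        else:
            weights.append(0.0)
    return weights

def threshold_weights(step_counts, min_steps=4):
    """Chains below the minimum step count are excluded from the vote."""
    return [1.0 if s >= min_steps else 0.0 for s in step_counts]
```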
Select Weighted Winner
Tally the weighted votes for each distinct answer. The answer supported by the most complex chains wins. This ensures that thorough reasoning carries more influence than quick, potentially superficial responses.
Answer 24: weighted score 1.5 (chains A + B). Answer 26: weighted score 0.9 (chain C only). Winner: 24 — supported by both a quick chain and the most thorough chain.
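The final tally is a weighted vote. A minimal sketch:

```python
from collections import defaultdict

def weighted_winner(answers, weights):
    """Tally weighted votes and return (winning_answer, score)."""
    scores = defaultdict(float)
    for answer, weight in zip(answers, weights):
        scores[answer] += weight
    return max(scores.items(), key=lambda kv: kv[1])
```

With the weights above, `weighted_winner(["24", "24", "26"], [0.5, 1.0, 0.9])` returns `("24", 1.5)`.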
See the Difference
Why complexity-weighted voting outperforms simple majority
Simple Majority (Self-Consistency)
Chain 1 (2 steps): answer "24". Chain 2 (3 steps): answer "24". Chain 3 (2 steps): answer "24". Chain 4 (5 steps): answer "26". Chain 5 (6 steps): answer "26".
Answer "24" wins 3-to-2. Three short chains outvote two thorough chains. Every chain counts equally regardless of reasoning depth.
Complexity-Weighted
Chain 1 (2 steps): 2 pts for "24". Chain 2 (3 steps): 3 pts for "24". Chain 3 (2 steps): 2 pts for "24". Chain 4 (5 steps): 5 pts for "26". Chain 5 (6 steps): 6 pts for "26".
Answer "24": 2 + 3 + 2 = 7 points. Answer "26": 5 + 6 = 11 points. Answer "26" wins with weighted scoring — the thorough chains carry more influence.
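The two tallies can be reproduced in a few lines, using the step counts and answers from the five chains above:

```python
from collections import Counter, defaultdict

chains = [(2, "24"), (3, "24"), (2, "24"), (5, "26"), (6, "26")]

# Simple majority: every chain counts once.
majority = Counter(ans for _, ans in chains).most_common(1)[0]

# Complexity-weighted: each chain votes with its step count.
scores = defaultdict(int)
for steps, ans in chains:
    scores[ans] += steps
weighted_top = max(scores, key=scores.get)

print(majority)      # ('24', 3): the short chains win the head count
print(weighted_top)  # 26: weighted 11 points vs 7 for "24"
```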
Complexity-Based Prompting in Action
See how weighting by complexity changes outcomes
"A store offers 25% off a $120 item, then applies a $10 coupon to the discounted price, and finally adds 8% sales tax. What is the final price?"
Short Chain (2 steps): "$120 minus 25% and $10 is $80. Tax makes it $84." — Answer: $84.00. Wrong — added only $4 of tax instead of 8% of $80, which is $6.40.
Medium Chain (3 steps): "$120 minus 25% is $90. Minus $10 is $80. Plus 8% tax is $86.40." — Answer: $86.40. Correct.
Long Chain (6 steps): "Step 1: Original price = $120. Step 2: 25% of $120 = $30. Step 3: Discounted price = $120 - $30 = $90. Step 4: After coupon = $90 - $10 = $80. Step 5: Sales tax = 8% of $80 = $6.40. Step 6: Final = $80 + $6.40 = $86.40." — Answer: $86.40. Correct, with every step verified.
Result: The long chain catches the tax calculation that the shortest chain botched. Complexity weighting ensures the thorough answer dominates.
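A quick check of the arithmetic the chains disagree on:

```python
price = 120.00
after_discount = price * (1 - 0.25)    # 25% off  -> 90.00
after_coupon = after_discount - 10     # coupon   -> 80.00
final = round(after_coupon * 1.08, 2)  # 8% tax   -> 86.40
print(final)  # 86.4
```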
"Five people sit in a row. Alice is not next to Bob. Charlie is at one end. Diana is between two people. Eric is next to Charlie. Who sits in the middle?"
Short Chain (3 steps): "Charlie is at an end, Eric is next to him. Diana is between two people, so she is in the middle." — Answer: Diana. Wrong — jumps to a conclusion without ever checking the Alice-Bob constraint.
Long Chain (7 steps): "Step 1: Charlie is at position 1 or 5; by symmetry, assume Charlie = 1. Step 2: Eric is next to Charlie, so Eric = 2. Step 3: Alice, Bob, and Diana fill positions 3, 4, and 5. Step 4: Diana must be between two people, so Diana is at position 3 or 4. Step 5: If Diana = 3, then Alice and Bob take positions 4 and 5, which are adjacent, violating 'Alice is not next to Bob.' Step 6: So Diana = 4, and Alice and Bob take positions 3 and 5, which are not adjacent. All constraints hold. Step 7: The middle seat (position 3) is therefore held by Alice or Bob; Diana cannot sit there." — Answer: Alice or Bob (the constraints do not determine which), and definitely not Diana, with every constraint checked.
Result: The short chain's confident answer fails the one constraint it never checked. The long chain works through every constraint, catches the violation, and rules Diana out of the middle. This is exactly the kind of oversight complexity weighting is designed to suppress.
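For small constraint puzzles like this one, a chain's conclusion can be verified by brute force over all 120 seatings. The `valid` helper below is a sketch that encodes the four stated constraints (seats are 0-indexed):

```python
from itertools import permutations

people = ["Alice", "Bob", "Charlie", "Diana", "Eric"]

def valid(seating):
    pos = {name: i for i, name in enumerate(seating)}
    return (
        abs(pos["Alice"] - pos["Bob"]) != 1          # Alice not next to Bob
        and pos["Charlie"] in (0, 4)                 # Charlie at an end
        and 1 <= pos["Diana"] <= 3                   # Diana between two people
        and abs(pos["Eric"] - pos["Charlie"]) == 1   # Eric next to Charlie
    )

# Who can occupy the middle seat (index 2) in a valid arrangement?
middles = {seating[2] for seating in permutations(people) if valid(seating)}
print(sorted(middles))
```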
"A farmer has 17 sheep. All but 9 die. How many sheep does the farmer have left?"
Short Chain (2 steps): "17 sheep, some die. 17 - 9 = 8 sheep left." — Answer: 8. Misread the problem — subtracted 9 instead of recognizing "all but 9" means 9 survive.
Medium Chain (4 steps): "Step 1: Start with 17 sheep. Step 2: 'All but 9 die' — need to parse this carefully. Step 3: 'All but 9' means every sheep except 9. So 9 survive. Step 4: The farmer has 9 sheep left." — Answer: 9. Correct.
Long Chain (6 steps): "Step 1: 17 sheep total. Step 2: 'All but 9 die' — two possible readings. Step 3: Reading A: 'all but 9' = 'every sheep except 9' = 9 survive. Step 4: Reading B: '9 die' = 17 - 9 = 8 survive. Step 5: Grammatically, 'all but X' means 'everything except X.' So Reading A is correct. Step 6: 9 sheep remain." — Answer: 9. Correct, with the ambiguity explicitly resolved.
Result: The short chain falls into the classic trap. The longer chains parse the language carefully, with the longest explicitly considering both interpretations before choosing. Complexity weighting suppresses the incorrect short-chain answer.
When to Use Complexity-Based Prompting
Best when thorough reasoning correlates with accuracy
Perfect For
When problems require many intermediate steps and shortcuts lead to errors — complexity weighting ensures thorough chains dominate.
When thorough exploration of different interpretations matters — longer chains tend to consider and resolve ambiguities explicitly.
When skipping calculation steps leads to arithmetic errors — chains that show every intermediate computation catch mistakes.
When you are sampling multiple outputs and need a principled selection method — step counting is easy to automate and requires no additional model calls.
Skip It When
When one-step answers are correct and more steps just add noise — complexity weighting can over-favor unnecessarily verbose chains.
When generating and scoring multiple chains is too slow — the technique requires sampling many outputs, which multiplies latency and cost.
When "complexity" of reasoning does not correlate with quality — creative writing, brainstorming, and subjective tasks do not benefit from step-count weighting.
Use Cases
Where complexity weighting delivers the most value
Automated Math Tutoring
Select the most thorough solution path for student explanations — longer chains show more work, making them better teaching tools.
Code Generation
Prefer code solutions with more explicit error handling and edge case consideration — complexity correlates with robustness.
Research Analysis
Weight literature review chains that consider more perspectives — thorough analyses that examine multiple viewpoints produce better summaries.
Diagnostic Reasoning
Favor diagnostic chains that rule out more possibilities before concluding — thorough differential diagnosis reduces misidentification.
Risk Assessment
Select risk analyses that enumerate more potential failure modes — chains that consider more scenarios provide safer recommendations.
Strategic Planning
Prefer strategy chains that consider more variables and contingencies — complex plans account for more what-if scenarios.
Where Complexity-Based Prompting Fits
An evolution of sampling and selection strategies
Generate chains with Auto-CoT for diverse demonstrations, sample with Self-Consistency, then re-rank with complexity weighting. This three-layer approach maximizes reasoning quality by combining diverse prompts, multiple samples, and principled selection.
Related Techniques
Explore the techniques that connect to complexity-based selection
Weigh Your Reasoning
Build complexity-aware prompts or explore more selection strategies in the Praxis Library.