Memory Systems

Memory of Thought

Instead of starting every problem from zero, Memory of Thought stores the model’s own successful reasoning chains and retrieves the most relevant ones as demonstrations for new problems — turning past performance into self-improving guidance.

Technique Context: 2023

Introduced: Memory of Thought (MoT) was proposed in 2023 by Li et al. The technique addresses a fundamental limitation of few-shot prompting: hand-crafted demonstrations are static and may not match the structure of the current problem. MoT introduces a self-improving cycle — the model first solves problems using chain-of-thought reasoning, stores successful reasoning chains in a memory bank, and then retrieves the most similar chains as dynamic demonstrations for new problems. This creates an experience-driven loop where the model’s own past successes guide its future reasoning.

Modern LLM Status: The core insight of MoT — using the model’s own successful reasoning as future demonstrations — has influenced the design of retrieval-augmented reasoning systems. Modern agents and multi-turn systems often maintain memory of past interactions. However, the explicit store-retrieve-apply pipeline remains valuable in production systems where you want to build a growing library of verified reasoning patterns that improve performance over time without retraining. MoT is most relevant for batch processing, automated pipelines, and domain-specific applications where a curated reasoning memory can accumulate.

The Core Insight

Your Best Teacher Is Your Own Past Success

Standard few-shot prompting relies on hand-picked examples that may not resemble the current problem. You carefully craft three demonstrations for math problems, but then face a logistics optimization question where those examples are structurally irrelevant. The demonstrations are static — frozen in time regardless of what the model has already proven it can solve.

Memory of Thought turns the model into its own example curator. After successfully solving a problem with chain-of-thought reasoning, MoT stores that complete reasoning chain — problem, steps, and answer — in a searchable memory bank. When a new problem arrives, MoT retrieves the most structurally similar past chains and uses them as few-shot demonstrations. The examples are always relevant because they come from problems the model has already solved correctly.

Think of it like a doctor who keeps a case journal. Each successfully diagnosed patient becomes a reference for future cases. When a new patient arrives with similar symptoms, the doctor reviews their own past cases to guide the current diagnosis — not a textbook’s generic examples, but their own proven reasoning.

Why Self-Generated Examples Beat Hand-Crafted Ones

Hand-crafted demonstrations are limited by the prompt engineer’s ability to anticipate problem structures. Self-generated memories are automatically diverse and structurally matched to the problems the model actually encounters. As the memory bank grows, the quality of retrieved demonstrations improves — creating a positive feedback loop where success breeds better future reasoning.

The Memory of Thought Process

Four stages from problem to growing reasoning memory

1. Pre-Think: Solve with Chain-of-Thought

Given a problem, the model first solves it using standard chain-of-thought reasoning. This produces a complete reasoning chain: the original question, each intermediate step, and the final answer. The model reasons through the problem as it normally would, generating a full trace of its thought process.

Example

Problem: “A store has 240 items. If 15% are returned, and half the returns are reshelved, how many items remain in stock?” → Chain: 240 × 0.15 = 36 returned → 36 / 2 = 18 reshelved → 240 - 36 + 18 = 222 items.
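The arithmetic in this chain can be traced in a few lines. The sketch below is purely illustrative (the variable names and the `chain` list are not from any MoT implementation); it just shows what a complete reasoning trace looks like as data:

```python
# Trace the pre-think chain for the store-inventory problem, keeping each
# intermediate step as text so the full trace can later be stored verbatim.
stock = 240
returned = int(stock * 0.15)              # 36 items returned
reshelved = returned // 2                 # 18 items reshelved
remaining = stock - returned + reshelved  # 240 - 36 + 18 = 222

chain = [
    f"{stock} x 0.15 = {returned} returned",
    f"{returned} / 2 = {reshelved} reshelved",
    f"{stock} - {returned} + {reshelved} = {remaining} items",
]
print(remaining)  # 222
```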

2. Store: Save Successful Chains to Memory

When a reasoning chain produces a verified correct answer, the complete chain is stored in the memory bank. Each entry includes the original problem, the full reasoning trace, and the final answer. The memory is indexed for retrieval — typically using embedding-based similarity so that structurally similar problems can be found quickly.

Example

Memory entry stored: {problem: “store inventory with returns and reshelving”, chain: [percentage calculation → fraction operation → net adjustment], answer: 222, tags: [arithmetic, multi-step, inventory]}.
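A memory entry and a store operation can be sketched in a few lines. This is a toy, assuming nothing about the original implementation: real systems index entries with embeddings, while here a lowercase token set stands in as the index:

```python
# Minimal memory bank: each entry keeps the problem, the reasoning chain,
# and the verified answer. The token set is a toy stand-in for an embedding.
memory_bank = []

def store(problem, chain, answer):
    entry = {
        "problem": problem,
        "chain": chain,
        "answer": answer,
        "tokens": set(problem.lower().split()),  # toy index for retrieval
    }
    memory_bank.append(entry)
    return entry

store(
    "A store has 240 items; 15% are returned, half the returns are reshelved.",
    ["240 x 0.15 = 36 returned", "36 / 2 = 18 reshelved", "240 - 36 + 18 = 222"],
    222,
)
print(len(memory_bank))  # 1
```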

3. Retrieve: Find Similar Past Reasoning

When a new problem arrives, MoT searches the memory bank for the most similar past problems. Similarity is measured by structural resemblance — not just surface keywords, but the type of reasoning required. The top-k most relevant chains are retrieved and formatted as few-shot demonstrations.

Example

New problem: “A factory produces 500 units. 20% fail inspection, and a third of failures are reworked successfully. How many units ship?” → Retrieved: the inventory problem above (same structure: percentage → fraction → net calculation).
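The retrieval step can be sketched with a simple similarity function. Real MoT implementations use embedding similarity; Jaccard overlap of word sets is a self-contained toy stand-in used here only to make the top-k idea concrete:

```python
# Top-k retrieval over stored problems using word-set overlap as a
# toy similarity measure (an embedding model would replace jaccard()).
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

memory = [
    "A store has 240 items. If 15% are returned, and half the returns are "
    "reshelved, how many items remain in stock?",
    "Five people sit in a row. A must be next to B. How many valid "
    "arrangements exist?",
]

def retrieve(query, memory, k=1):
    return sorted(memory, key=lambda m: jaccard(query, m), reverse=True)[:k]

query = ("A factory produces 500 units. 20% fail inspection, and a third "
         "of failures are reworked successfully. How many units ship?")
best = retrieve(query, memory)[0]
print(best[:22])  # "A store has 240 items."
```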

4. Apply: Solve with Retrieved Demonstrations

The retrieved reasoning chains are placed in the prompt as few-shot demonstrations, and the model solves the new problem guided by these structurally similar examples. The demonstrations show the model exactly how it successfully reasoned through similar problems before, priming it to follow the same logical structure for the current problem.

Example

Prompt includes the inventory chain as a demonstration, then presents the factory problem. The model follows the same pattern: 500 × 0.20 = 100 failures → 100 / 3 ≈ 33 reworked → 500 - 100 + 33 = 433 units ship.
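Assembling the final prompt is plain string formatting. A minimal sketch (the Q/A layout and field names are illustrative choices, not prescribed by MoT):

```python
# Build the few-shot prompt: retrieved chains become demonstrations,
# followed by the new problem for the model (not shown) to complete.
def build_prompt(demonstrations, new_problem):
    parts = []
    for demo in demonstrations:
        parts.append(
            f"Q: {demo['problem']}\nA: {demo['chain']}\nAnswer: {demo['answer']}"
        )
    parts.append(f"Q: {new_problem}\nA:")
    return "\n\n".join(parts)

demo = {
    "problem": "A store has 240 items; 15% are returned, half reshelved. "
               "How many remain?",
    "chain": "240 x 0.15 = 36 returned; 36 / 2 = 18 reshelved; "
             "240 - 36 + 18 = 222",
    "answer": 222,
}
prompt = build_prompt([demo], "A factory produces 500 units. 20% fail "
                              "inspection, and a third of failures are "
                              "reworked. How many units ship?")
```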

See the Difference

Why dynamic self-generated demonstrations outperform static examples

Static Few-Shot

Prompt

Here are some examples of solving math problems:
[Example 1: Basic addition]
[Example 2: Simple multiplication]

Now solve: “A shipping company charges $12/kg for the first 50kg and $8/kg after. An order weighs 120kg but gets a 15% bulk discount on the overage rate. What’s the total cost?”

Response

The total cost is $12 × 120 = $1,440. With 15% discount: $1,440 × 0.85 = $1,224.

Generic examples don’t match the tiered-pricing structure — model misapplies the discount
VS

Memory of Thought

Retrieved Demonstration

Past problem (from memory): “A utility charges $0.10/kWh for the first 500kWh, then $0.15/kWh. A customer uses 800kWh with a 10% efficiency credit on the higher tier.”
Past chain: First 500 × $0.10 = $50 → Overage: 300 × $0.15 = $45 → Discount on overage: $45 × 0.90 = $40.50 → Total: $50 + $40.50 = $90.50

New Problem Answer

First 50kg × $12 = $600 → Overage: 70kg × $8 = $560 → Bulk discount on overage: $560 × 0.85 = $476 → Total: $600 + $476 = $1,076.

Retrieved chain matches the tiered-pricing structure — model correctly separates tiers and applies discount only to overage
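The tiered-pricing pattern the retrieved chain demonstrates can be checked directly. A small sketch with illustrative parameter names, not part of the original article:

```python
# Tiered pricing with a discount applied only to the overage tier,
# mirroring the structure of the retrieved reasoning chain.
def tiered_cost(quantity, tier1_limit, rate1, rate2, overage_discount):
    tier1 = min(quantity, tier1_limit) * rate1
    overage = max(quantity - tier1_limit, 0) * rate2 * (1 - overage_discount)
    return tier1 + overage

# Shipping problem: 50kg at $12, then 70kg at $8 with 15% off the overage.
print(tiered_cost(120, 50, 12, 8, 0.15))  # 1076.0
```

The same function reproduces the utility demonstration: `tiered_cost(800, 500, 0.10, 0.15, 0.10)` gives $90.50.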


Memory of Thought in Action

See how retrieved reasoning chains improve performance on new problems

New Problem

“A bakery makes 300 loaves daily. 10% are samples, and 25% of the remaining are sold at a premium. Premium loaves cost $6 and regular loaves cost $4. What is the daily revenue?”

Retrieved Memory + Solution

Retrieved chain (from memory): A factory produces 400 units. 5% are quality samples. 30% of remaining sell at premium ($15), rest at standard ($10). → Samples: 400 × 0.05 = 20 → Sellable: 380 → Premium: 380 × 0.30 = 114 at $15 = $1,710 → Standard: 266 at $10 = $2,660 → Total: $4,370.

New solution (guided by memory):
Samples: 300 × 0.10 = 30 loaves (not sold)
Sellable: 300 - 30 = 270 loaves
Premium: 270 × 0.25 = 67.5, rounded to 68 whole loaves at $6 = $408
Regular: 270 - 68 = 202 loaves at $4 = $808
Daily revenue: $408 + $808 = $1,216
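The bakery arithmetic can be verified directly (the rounding of 67.5 loaves up to 68 is a judgment call the solution makes explicit):

```python
# Revenue check for the bakery problem, following the retrieved chain's
# structure: remove samples, split premium vs regular, sum revenue.
samples = int(300 * 0.10)            # 30 loaves given away as samples
sellable = 300 - samples             # 270 loaves available for sale
premium = round(sellable * 0.25)     # round(67.5) == 68 (ties go to even)
regular = sellable - premium         # 202 loaves
revenue = premium * 6 + regular * 4  # 408 + 808
print(revenue)  # 1216
```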

New Problem

“Five people (A-E) sit in a row. A must be next to B. C cannot be next to D. E must be at one end. How many valid arrangements exist?”

Retrieved Memory + Solution

Retrieved chain (from memory): Four people (W-Z) in a row. W next to X. Y not next to Z. → Strategy: treat W-X as a block → arrange the block + remaining → subtract invalid (Y next to Z). Found: 4 valid arrangements.

New solution (guided by memory):
Step 1: Fix E at one end (position 1 or 5) → 2 choices
Step 2: Treat A-B as a block → arrange block + C + D in the remaining 4 seats → 3! = 6 unit orderings × 2 internal orders (AB, BA) = 12 arrangements per end
Step 3: Subtract cases where C is next to D → 4 of the 6 unit orderings place C and D in adjacent seats, so 4 × 2 = 8 invalid per end
Step 4: Valid per end: 12 - 8 = 4 → across both ends: 4 × 2 = 8
Result: 8 valid arrangements
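The block-and-subtract strategy can be cross-checked by brute force, since there are only 5! = 120 orderings to test:

```python
from itertools import permutations

# Exhaustive check of the seating puzzle: A next to B, C not next to D,
# E at either end of the row of five.
def valid(order):
    pos = {person: i for i, person in enumerate(order)}
    return (abs(pos["A"] - pos["B"]) == 1      # A adjacent to B
            and abs(pos["C"] - pos["D"]) != 1  # C not adjacent to D
            and pos["E"] in (0, 4))            # E at an end

count = sum(valid(p) for p in permutations("ABCDE"))
print(count)  # 8
```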

New Problem

“A contract clause states: if Party A delivers late AND Party B has not waived the deadline, then Party B may terminate. Party A delivered 3 days late. Party B emailed saying ‘no rush, take your time’ before the deadline. Can Party B terminate?”

Retrieved Memory + Solution

Retrieved chain (from memory): Medical: “If patient has symptom X AND no contraindication Y, prescribe treatment Z. Patient has symptom X. Records show previous allergic reaction to Z.” → Check condition 1: symptom X present (yes) → Check condition 2: no contraindication Y → allergic reaction IS a contraindication → condition 2 fails → cannot prescribe Z.

New solution (guided by memory pattern):
Structure: IF condition1 AND condition2, THEN consequence
Check condition 1: Party A delivered late → Yes (3 days late)
Check condition 2: Party B has NOT waived the deadline → “No rush, take your time” likely constitutes a waiver of the deadline → condition 2 fails
Conclusion: Party B likely cannot terminate, because the email may constitute a waiver of the deadline requirement.
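The transferred pattern is just an IF condition1 AND condition2 check. A minimal sketch, where the hard part (deciding that the email counts as a waiver) is the model's judgment and is represented here as a plain boolean:

```python
# The clause's conditional structure made explicit: termination requires
# both a late delivery AND the absence of a waiver.
def may_terminate(delivered_late, deadline_waived):
    return delivered_late and not deadline_waived

# Party A was 3 days late; "no rush, take your time" is judged a waiver.
print(may_terminate(True, True))  # False: Party B likely cannot terminate
```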

When to Use Memory of Thought

Best for repetitive problem domains where past reasoning accumulates value

Perfect For

Batch Processing Pipelines

When you process hundreds of similar problems — each solved problem’s reasoning chain becomes a resource for the next, improving accuracy over the batch.

Domain-Specific Reasoning

Medical diagnosis, legal analysis, financial modeling — domains where past cases create a valuable library of proven reasoning patterns.

Progressive Skill Building

When problems increase in complexity over time — earlier solutions provide scaffolding for harder problems, creating a natural curriculum.

Reducing Prompt Engineering Effort

Instead of hand-crafting demonstrations for every problem type, let the system build its own example library from verified successes.

Skip It When

One-Off Questions

If you’re solving a single unique problem, there’s no memory to build from and no future problems to benefit — standard prompting works fine.

Completely Novel Domains

When the problem type has never been encountered before and no structurally similar chains exist in memory — MoT needs a bootstrap phase before it adds value.

Latency-Critical Applications

Memory retrieval adds overhead — searching the memory bank and formatting retrieved chains takes time. For real-time chat, the retrieval step may be too slow.

Use Cases

Where Memory of Thought delivers the most value

Medical Diagnosis Pipelines

Store successful diagnostic reasoning chains from confirmed cases, then retrieve similar symptom-pattern chains when new patients present — building an ever-growing clinical reasoning library.

Automated Grading Systems

Grade student work by retrieving past grading chains for similar submissions — ensuring consistent rubric application and reasoning across hundreds of papers.

Security Incident Classification

Store reasoning chains from analyzed security incidents, then retrieve similar patterns when new alerts arrive — accelerating triage with proven classification logic.

Research Literature Review

Build a memory of how you analyzed past papers — methodology assessment, finding extraction, quality scoring — and apply those patterns consistently across new papers.

Financial Report Analysis

Store reasoning chains from quarterly earnings analyses, then retrieve them when analyzing new reports from similar companies or sectors — maintaining analytical consistency.

Customer Ticket Resolution

Store successful resolution chains from closed tickets, then retrieve similar troubleshooting logic for new tickets — creating an AI-powered knowledge base that grows with every resolved case.

Where Memory of Thought Fits

MoT bridges static demonstrations and adaptive reasoning systems

Few-Shot Learning (Static Examples): hand-crafted demonstrations
Memory of Thought (Dynamic Retrieval): self-generated reasoning memories
Reflexion (Learning from Failure): iterating on unsuccessful attempts
Agent Memory (Persistent Systems): long-lived agents with experience

Memory Quality Matters

The value of MoT depends entirely on the quality of stored chains. Verify answers before storing them in memory — one incorrect chain stored as a “success” can propagate errors to every future problem that retrieves it. Consider pairing MoT with Self-Verification or Chain-of-Verification to validate chains before they enter the memory bank.
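The verification gate can be sketched as a callback in front of the store operation. This is an illustrative pattern, not a prescribed API; in practice the verifier might be Self-Verification, a unit test, or human review:

```python
# Gate the memory bank behind a verifier: a chain enters memory only if
# its answer passes the check, so errors cannot propagate to future problems.
memory_bank = []

def store_if_verified(entry, verifier):
    if verifier(entry):
        memory_bank.append(entry)
        return True
    return False

good = {"problem": "240 items, 15% returned, half reshelved", "answer": 222}
bad = {"problem": "same problem, flawed chain", "answer": 240}

is_correct = lambda e: e["answer"] == 222  # stand-in ground-truth check
store_if_verified(good, is_correct)  # stored
store_if_verified(bad, is_correct)   # rejected
print(len(memory_bank))  # 1
```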

Build Your Reasoning Memory

Start storing successful reasoning chains and let your AI system learn from its own experience with our prompt building tools.