What Is Reflection?
Reflection is a technique in which an AI agent evaluates its own output, reasoning, or plan — and then uses that evaluation to produce an improved version. Instead of treating the first response as final, the agent asks itself: "Is this correct? Is this complete? How could this be better?" — and iterates until quality criteria are met.
The technique was formalized in the "Reflexion" paper (Shinn et al., 2023), which showed that agents that reflect on their mistakes significantly outperform those that do not, particularly on tasks requiring planning and multi-step reasoning.
How Reflection Works
A basic reflection loop:
- Generate — The agent produces an initial response, plan, or action.
- Critique — The agent (or a separate critic model) evaluates the output against explicit criteria: accuracy, completeness, consistency, format, safety.
- Revise — The agent rewrites or refines its output based on the critique.
- Repeat or accept — If the revised output meets quality criteria, it is accepted. Otherwise, another critique-revise cycle begins.
This loop can run for a fixed number of iterations or until the agent's self-assessment reaches a threshold of confidence.
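The four steps above can be sketched as a small loop with pluggable generate, critique, and revise functions. This is a minimal illustration, not a production implementation: the stub functions below stand in for LLM calls, and the "critic" simply checks that the draft covers a set of required topics.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    passed: bool
    feedback: str

def reflection_loop(generate, critique, revise, max_iters=3):
    """Generate once, then alternate critique and revision until the
    output passes or the iteration budget is spent."""
    output = generate()
    for _ in range(max_iters):
        result = critique(output)
        if result.passed:
            break
        output = revise(output, result.feedback)
    return output

# Stubs standing in for model calls: the critic checks topic coverage.
REQUIRED = {"pricing", "customer segments"}

def generate():
    return {"pricing"}                    # first draft is incomplete

def critique(draft):
    missing = REQUIRED - draft
    return Critique(not missing, f"missing topics: {sorted(missing)}")

def revise(draft, feedback):
    return draft | REQUIRED               # a real agent would re-prompt here

final = reflection_loop(generate, critique, revise)
print(sorted(final))  # ['customer segments', 'pricing']
```

In practice the quality criteria live in the critique prompt, and `max_iters` caps cost: each extra cycle is at least one more critique call and one more revision call.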
Types of Reflection
Output Reflection
The agent reviews its final answer for errors, gaps, or quality issues before delivering it to the user. Most common and least expensive — typically one extra LLM call.
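The "one extra LLM call" pattern can be sketched as a single critique pass that only triggers a rewrite when issues are found. The `llm` function below is a canned stand-in for a real model call; the prompts are illustrative, not a prescribed format.

```python
def llm(prompt):
    # Stand-in for a real model; returns canned strings for the sketch.
    if prompt.startswith("Review"):
        return "Issue: the draft omits customer segments."
    return "Revised draft covering pricing and customer segments."

def reflect_on_output(draft):
    """One extra call: ask the model to critique the draft, and revise
    only if the critique reports issues."""
    critique = llm(
        "Review this draft for accuracy, completeness, and format. "
        f"Reply OK if there are no issues.\n\n{draft}"
    )
    if critique.strip() == "OK":
        return draft                      # accept the first draft as-is
    return llm(f"Rewrite the draft to address:\n{critique}\n\n{draft}")

print(reflect_on_output("Pricing summary for Acme Corp."))
```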
Step-Level Reflection
After each action in the agent loop, the agent evaluates whether the action achieved its intent. If not, it adapts the next step accordingly. Slower but more reliable for complex tasks.
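A sketch of the step-level pattern, with a verifier checked after every action and a bounded retry when a step misses its intent. The flaky fetch below is a contrived stand-in for a real tool call that fails on its first attempt.

```python
def run_with_step_reflection(steps, verify, max_retries=1):
    """Execute each step, evaluate whether it achieved its intent,
    and retry a failed step (bounded) before moving on."""
    results = []
    for step in steps:
        for attempt in range(max_retries + 1):
            outcome = step()
            if verify(outcome):
                break                     # step achieved its intent
            # A real agent would adapt the step here, not just retry.
        results.append(outcome)
    return results

# Stub action: fails on the first call, succeeds on the second.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    return "data" if calls["n"] > 1 else ""

results = run_with_step_reflection([flaky_fetch], verify=bool)
print(results, calls["n"])  # ['data'] 2
```

The cost profile matches the text: one verification per action at minimum, which is why this is slower but more reliable than reflecting only on the final output.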
Plan Reflection
Before execution begins, the agent reviews its generated plan for logical flaws, missing steps, or incorrect assumptions. Catches planning mistakes early.
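One cheap pre-execution check is coverage: does some step in the plan address every stated goal? The plan structure below (`action` / `achieves` fields) is an assumed representation for the sketch, not a standard schema.

```python
def review_plan(plan, goals):
    """Pre-execution reflection: return the goals that no step in the
    plan addresses, so the plan can be repaired before any action runs."""
    covered = {g for step in plan for g in step["achieves"]}
    return sorted(set(goals) - covered)

plan = [
    {"action": "scrape pricing page", "achieves": ["pricing"]},
    {"action": "summarize findings", "achieves": []},
]
missing = review_plan(plan, goals=["pricing", "customer segments"])
print(missing)  # ['customer segments']
```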
Cross-Agent Critique
In multi-agent systems, a dedicated critic agent reviews the outputs of worker agents. This separates the generation and critique responsibilities, often producing higher-quality evaluations.
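The separation of roles can be sketched as two objects: a worker that drafts and a critic that reviews against explicit requirements. Both are stubs here; in a real system each would wrap its own model call, often with different prompts or even different models.

```python
class Worker:
    """Generation role: produces a draft (stubbed model call)."""
    def draft(self, task):
        return f"Report on {task}: pricing only."

class Critic:
    """Critique role: reviews the worker's output against requirements."""
    def review(self, draft, requirements):
        missing = [r for r in requirements if r not in draft]
        return {"approved": not missing, "missing": missing}

worker, critic = Worker(), Critic()
draft = worker.draft("Acme Corp")
verdict = critic.review(draft, requirements=["pricing", "segments"])
print(verdict)  # {'approved': False, 'missing': ['segments']}
```

Because the critic never sees its own reasoning in the draft, it is less prone to the self-consistency bias that can make single-model self-critique too lenient.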
A Concrete Example
An agent tasked with extracting competitive intelligence from a website calls KnowledgeSDK's /v1/extract and receives structured data. In a reflection step, it might reason:
"I have pricing information and product names. However, the user asked for a comparison including customer segments. The extracted data does not include that. I should scrape the 'About' and 'Customers' pages as well before answering."
This self-critique triggers additional extraction calls rather than delivering an incomplete answer.
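That gap-then-fetch behavior can be sketched as follows. The `extract` function below is a stubbed stand-in for the KnowledgeSDK /v1/extract call described above (no real endpoint is contacted), and the URLs and field names are illustrative.

```python
def extract(url):
    # Stand-in for a KnowledgeSDK /v1/extract call, stubbed for the sketch.
    pages = {
        "https://example.com/pricing": {"pricing": "...", "products": "..."},
        "https://example.com/customers": {"customer_segments": "..."},
    }
    return pages.get(url, {})

def gather(required_fields, start_urls, follow_up_urls):
    data = {}
    for url in start_urls:
        data.update(extract(url))
    # Reflection step: compare what was extracted against what the user
    # asked for, and trigger more extraction instead of answering early.
    missing = [f for f in required_fields if f not in data]
    if missing:
        for url in follow_up_urls:
            data.update(extract(url))
    return data

data = gather(
    required_fields=["pricing", "customer_segments"],
    start_urls=["https://example.com/pricing"],
    follow_up_urls=["https://example.com/customers"],
)
print(sorted(data))  # ['customer_segments', 'pricing', 'products']
```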
Reflection vs. ReAct
ReAct interleaves reasoning and acting within a single loop — the model reasons before each action. Reflection adds an additional dimension: the model evaluates and revises after producing an output, potentially triggering new rounds of acting.
These approaches are complementary: a ReAct agent with reflection reasons before acting and reflects after producing results — a powerful combination for high-stakes research or analysis tasks.
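The combination can be sketched as an outer loop in which each acting round is followed by a reflection check that either accepts the result or sends the agent back for more actions. The stubs below are deterministic stand-ins: each round "gathers" one more source, and reflection accepts once two sources are collected.

```python
def react_with_reflection(act, reflect, max_rounds=3):
    """ReAct-style acting, followed by a reflection pass that can
    trigger another round of actions instead of accepting the result."""
    observations = []
    for _ in range(max_rounds):
        observations.append(act(observations))
        if reflect(observations) == "accept":
            break                         # reflection is satisfied
    return observations

act = lambda obs: f"source-{len(obs) + 1}"
reflect = lambda obs: "accept" if len(obs) >= 2 else "need more evidence"

print(react_with_reflection(act, reflect))  # ['source-1', 'source-2']
```

`max_rounds` matters here for the same reason as in the basic loop: without a budget, a never-satisfied reflection step would act indefinitely.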
When to Use Reflection
Reflection is most valuable when:
- The task has objectively verifiable criteria (code that runs, facts that can be checked).
- Quality is more important than speed.
- The task involves synthesis of complex information where first drafts are often incomplete.
- Errors in the output would be costly (customer-facing reports, automated decisions).
Reflection adds latency and token cost. For simple lookup tasks, it is unnecessary overhead. Apply it selectively to the steps in an agent pipeline where quality matters most.