ReIn: Conversational Error Recovery with Reasoning Inception (ICLR 2026)

Takyoung Kim^1,, Jinseok Nam², Chandrayee Basu², Xing Fan², Chengyuan Ma², Heng Ji¹, Gokhan Tur¹, Dilek Hakkani-Tür¹
_{¹University of Illinois Urbana-Champaign, ²Amazon
_{^Work done during an internship at Amazon}}

A self-contained implementation of Reasoning Inception (ReIn), a test-time intervention method that enables conversational agents to recover from contextually flawed interactions — without modifying the agent's parameters or system prompt.

Disclaimer: We provide a minimal implementation of the proposed method to facilitate future use, since this work builds on a benchmark with a complicated codebase.

Concept

When LLM-based agents interact with users through multi-turn tool-calling conversations, it is inevitable to encounter user-side errors. We assume two types of user-side errors in this work:

Ambiguous requests — the user's intent is unclear (e.g., "cancel it" could mean different things)
Unsupported requests — the user asks for something the system cannot do (e.g., creating a wishlist in a flight reservation scenario)

ReIn addresses these errors through a two-stage mechanism applied at test time:

┌─────────────────────────────────────────────────────────────┐
│                          ReIn                               │
│                                                             │
│  Stage 1: INCEPTION MODULE                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ External LLM analyzes the dialogue context:           │  │
│  │   - Are there error signals? (user frustration, etc.) │  │
│  │   - If YES → generate recovery reasoning              │  │
│  │   - If NO  → do nothing                               │  │
│  └───────────────────────────────────────────────────────┘  │
│                           │                                 │
│                     error detected?                         │
│                      /          \                           │
│                    YES           NO                         │
│                    /               \                        │
│  Stage 2: INJECT inception_block    proceed normally        │
│  ┌──────────────────────┐                                   │
│  │ Plant a "seed" of    │                                   │
│  │ recovery reasoning   │                                   │
│  │ into the agent's     │                                   │
│  │ initial context      │                                   │
│  └──────────────────────┘                                   │
│              │                                              │
│  Stage 3: AGENT ACTION LOOP                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ The task agent generates actions with the augmented   │  │
│  │ context. It may call tools or respond to the user.    │  │
│  │ The injected reasoning guides it toward recovery      │  │
│  │ actions (e.g., filing an ambiguity report, or         │  │
│  │ transferring to a human agent).                       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

The key innovation is that ReIn works without modifying the agent's system prompt or model weights. Instead, it injects a single reasoning block (via the think tool) into the agent's internal context, which steers the agent's subsequent behavior through the instruction hierarchy.

As discussed in Section 4.6, defining a recovery tool is a key to override instruction hierarchy, which is safer solution given that tool schema definition can only be done by service providers.

File Structure

├── src/
│   ├── agents.py          # LLM wrappers: TaskAgent, InceptionModule, UserSimulator
│   ├── environment.py     # Simulated retail environment: database, tools, schemas
│   ├── rein.py            # Core Algorithm 1: run_agent_turn() + run_conversation()
│   └── run.py             # CLI entry point
├── assets/
│   ├── errors.json        # Error taxonomy (paper Appendix D.1.1)
│   ├── plans.json         # Recovery plans (paper Appendix D.1.2)
│   ├── prompt_rein.md     # Inception module prompt (paper Appendix D.5)
│   ├── prompt_usersim.md  # User simulator prompt (paper Appendix D.6)
│   └── prompt_system_retail.md  # Task agent system prompt (paper Appendix D.7.2)
├── scenarios.json         # 4 test scenarios
├── run_all.sh             # Runs all scenarios in ReIn and baseline modes
└── pyproject.toml         # uv project configuration

Reading order: Start with src/rein.py — it implements the core algorithm and is the paper's main contribution. Then src/agents.py for the LLM actors, and src/environment.py for the simulated tool environment.

Setup

Requires uv.

# Install dependencies into a local .venv
uv sync

# Set your API key (for OpenAI)
export OPENAI_API_KEY="sk-..."

Works with any OpenAI-compatible API endpoint (OpenAI, vLLM, etc.).

Usage

# Run all scenarios with ReIn enabled
uv run python src/run.py

# Run baseline (no ReIn) for comparison
uv run python src/run.py --no-rein

# Run a single scenario with verbose output
uv run python src/run.py --scenario ambiguous_anaphora_02 --verbose

# Use different models
uv run python src/run.py --agent-model gpt-4o --inception-model gpt-4o-mini

# Use a local model via vLLM (requires --enable-auto-tool-choice --tool-call-parser hermes)
uv run python src/run.py --base-url http://localhost:8000/v1 --agent-model Qwen/Qwen3-32B

Shell script

# Run all scenarios (ReIn + baseline)
bash run_all.sh

# Override model or server via environment variables
BASE_URL=http://localhost:8000/v1 AGENT_MODEL=Qwen/Qwen3-32B bash run_all.sh
VERBOSE=--verbose bash run_all.sh

vLLM setup

vLLM requires tool-calling flags to be enabled at startup:

vllm serve Qwen/Qwen3-32B \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --tensor-parallel-size 4

Without --enable-auto-tool-choice and --tool-call-parser, the server will reject requests that use tools.

Scenarios

Each scenario provides an initial 3-turn context {u₁, a₁, u₂} where the agent has already made an error. ReIn intervenes at the next turn to guide recovery.

Results below are averaged over 5 runs at temperature 0.7 using Qwen/Qwen3-32B on vLLM.

ID	Error Type	Subtype	Description	ReIn (Pass@5)	Baseline (Pass@5)
`ambiguous_anaphora_02`	Ambiguous	Anaphora	"the item" is ambiguous — order has both a mouse and a lamp	4/5 (80%)	0/5 (0%)
`ambiguous_multiple_interpretation_02`	Ambiguous	Multiple interpretation	"update my order" — agent picks shipping address, user meant payment method	4/5 (80%)	1/5 (20%)
`unsupported_action_02`	Unsupported	Action	User asks to set up automatic monthly reorders	5/5 (100%)	5/5 (100%)
`unsupported_parameter_02`	Unsupported	Parameter	User wants store credit as refund method (not supported)	4/5 (80%)	1/5 (20%)
Average				85%	35%

How it maps to the paper

Paper concept	Code location
Algorithm 1	`src/rein.py:run_agent_turn()`
Inception module F	`src/agents.py:InceptionModule`
Task agent π_c	`src/agents.py:TaskAgent`
think[ρ_t] injection	`src/rein.py:run_agent_turn()` lines 60-99
Recovery tools (ambiguity_report, transfer_to_human_agents)	`src/environment.py:RETAIL_TOOL_SCHEMAS`
Error definitions (Φ)	`assets/errors.json`
Recovery plans	`assets/plans.json`
Inception prompt (S')	`assets/prompt_rein.md`
User simulator prompt	`assets/prompt_usersim.md`
Task agent system prompt	`assets/prompt_system_retail.md`
Multi-turn conversation loop	`src/rein.py:run_conversation()`

Evaluation

Pass/fail is determined by whether the expected recovery tool was called at any point in the conversation:

Error type	Expected recovery tool	Rationale
Ambiguous	`ambiguity_report`	Logs the misunderstanding to the internal system (paper §3.2.2)
Unsupported	`transfer_to_human_agents`	Escalates requests beyond the agent's capability (paper §3.2.2)

This matches the paper's Pass@1 criterion: the task is considered complete if the agent invokes the correct recovery tool at least once during the session.

Citation

@inproceedings{kim2026rein,
    title={ReIn: Conversational Error Recovery with Reasoning Inception},
    author={Takyoung Kim and Jinseok Nam and Chandrayee Basu and Xing Fan and Chengyuan Ma and Heng Ji and Gokhan Tur and Dilek Hakkani-T{\"u}r},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=4J3kkHI6m5}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
src		src
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
run_all.sh		run_all.sh
scenarios.json		scenarios.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ReIn: Conversational Error Recovery with Reasoning Inception (ICLR 2026)

Takyoung Kim^1,, Jinseok Nam², Chandrayee Basu², Xing Fan², Chengyuan Ma², Heng Ji¹, Gokhan Tur¹, Dilek Hakkani-Tür¹
_{¹University of Illinois Urbana-Champaign, ²Amazon
_{^Work done during an internship at Amazon}}

Concept

File Structure

Setup

Usage

Shell script

vLLM setup

Scenarios

How it maps to the paper

Evaluation

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ReIn: Conversational Error Recovery with Reasoning Inception (ICLR 2026)

Takyoung Kim1,*, Jinseok Nam2, Chandrayee Basu2, Xing Fan2, Chengyuan Ma2, Heng Ji1, Gokhan Tur1, Dilek Hakkani-Tür1 1University of Illinois Urbana-Champaign, 2Amazon *Work done during an internship at Amazon

Concept

File Structure

Setup

Usage

Shell script

vLLM setup

Scenarios

How it maps to the paper

Evaluation

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Takyoung Kim^1,, Jinseok Nam², Chandrayee Basu², Xing Fan², Chengyuan Ma², Heng Ji¹, Gokhan Tur¹, Dilek Hakkani-Tür¹
_{¹University of Illinois Urbana-Champaign, ²Amazon
_{^Work done during an internship at Amazon}}

Packages