Skip to content

youngerous/rein

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ReIn: Conversational Error Recovery with Reasoning Inception (ICLR 2026)

Paper Paper

Takyoung Kim1,*, Jinseok Nam2, Chandrayee Basu2, Xing Fan2, Chengyuan Ma2, Heng Ji1, Gokhan Tur1, Dilek Hakkani-Tür1
1University of Illinois Urbana-Champaign, 2Amazon
*Work done during an internship at Amazon

A self-contained implementation of Reasoning Inception (ReIn), a test-time intervention method that enables conversational agents to recover from contextually flawed interactions — without modifying the agent's parameters or system prompt.

Disclaimer: We provide a minimal implementation of the proposed method to facilitate future use, since this work builds on a benchmark with a complicated codebase.

Concept

When LLM-based agents interact with users through multi-turn tool-calling conversations, it is inevitable to encounter user-side errors. We assume two types of user-side errors in this work:

  1. Ambiguous requests — the user's intent is unclear (e.g., "cancel it" could mean different things)
  2. Unsupported requests — the user asks for something the system cannot do (e.g., creating a wishlist in a flight reservation scenario)

ReIn addresses these errors through a two-stage mechanism applied at test time:

┌─────────────────────────────────────────────────────────────┐
│                          ReIn                               │
│                                                             │
│  Stage 1: INCEPTION MODULE                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ External LLM analyzes the dialogue context:           │  │
│  │   - Are there error signals? (user frustration, etc.) │  │
│  │   - If YES → generate recovery reasoning              │  │
│  │   - If NO  → do nothing                               │  │
│  └───────────────────────────────────────────────────────┘  │
│                           │                                 │
│                     error detected?                         │
│                      /          \                           │
│                    YES           NO                         │
│                    /               \                        │
│  Stage 2: INJECT inception_block    proceed normally        │
│  ┌──────────────────────┐                                   │
│  │ Plant a "seed" of    │                                   │
│  │ recovery reasoning   │                                   │
│  │ into the agent's     │                                   │
│  │ initial context      │                                   │
│  └──────────────────────┘                                   │
│              │                                              │
│  Stage 3: AGENT ACTION LOOP                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ The task agent generates actions with the augmented   │  │
│  │ context. It may call tools or respond to the user.    │  │
│  │ The injected reasoning guides it toward recovery      │  │
│  │ actions (e.g., filing an ambiguity report, or         │  │
│  │ transferring to a human agent).                       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

The key innovation is that ReIn works without modifying the agent's system prompt or model weights. Instead, it injects a single reasoning block (via the think tool) into the agent's internal context, which steers the agent's subsequent behavior through the instruction hierarchy.

As discussed in Section 4.6, defining a recovery tool is a key to override instruction hierarchy, which is safer solution given that tool schema definition can only be done by service providers.

File Structure

├── src/
│   ├── agents.py          # LLM wrappers: TaskAgent, InceptionModule, UserSimulator
│   ├── environment.py     # Simulated retail environment: database, tools, schemas
│   ├── rein.py            # Core Algorithm 1: run_agent_turn() + run_conversation()
│   └── run.py             # CLI entry point
├── assets/
│   ├── errors.json        # Error taxonomy (paper Appendix D.1.1)
│   ├── plans.json         # Recovery plans (paper Appendix D.1.2)
│   ├── prompt_rein.md     # Inception module prompt (paper Appendix D.5)
│   ├── prompt_usersim.md  # User simulator prompt (paper Appendix D.6)
│   └── prompt_system_retail.md  # Task agent system prompt (paper Appendix D.7.2)
├── scenarios.json         # 4 test scenarios
├── run_all.sh             # Runs all scenarios in ReIn and baseline modes
└── pyproject.toml         # uv project configuration

Reading order: Start with src/rein.py — it implements the core algorithm and is the paper's main contribution. Then src/agents.py for the LLM actors, and src/environment.py for the simulated tool environment.

Setup

Requires uv.

# Install dependencies into a local .venv
uv sync

# Set your API key (for OpenAI)
export OPENAI_API_KEY="sk-..."

Works with any OpenAI-compatible API endpoint (OpenAI, vLLM, etc.).

Usage

# Run all scenarios with ReIn enabled
uv run python src/run.py

# Run baseline (no ReIn) for comparison
uv run python src/run.py --no-rein

# Run a single scenario with verbose output
uv run python src/run.py --scenario ambiguous_anaphora_02 --verbose

# Use different models
uv run python src/run.py --agent-model gpt-4o --inception-model gpt-4o-mini

# Use a local model via vLLM (requires --enable-auto-tool-choice --tool-call-parser hermes)
uv run python src/run.py --base-url http://localhost:8000/v1 --agent-model Qwen/Qwen3-32B

Shell script

# Run all scenarios (ReIn + baseline)
bash run_all.sh

# Override model or server via environment variables
BASE_URL=http://localhost:8000/v1 AGENT_MODEL=Qwen/Qwen3-32B bash run_all.sh
VERBOSE=--verbose bash run_all.sh

vLLM setup

vLLM requires tool-calling flags to be enabled at startup:

vllm serve Qwen/Qwen3-32B \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --tensor-parallel-size 4

Without --enable-auto-tool-choice and --tool-call-parser, the server will reject requests that use tools.

Scenarios

Each scenario provides an initial 3-turn context {u₁, a₁, u₂} where the agent has already made an error. ReIn intervenes at the next turn to guide recovery.

Results below are averaged over 5 runs at temperature 0.7 using Qwen/Qwen3-32B on vLLM.

ID Error Type Subtype Description ReIn (Pass@5) Baseline (Pass@5)
ambiguous_anaphora_02 Ambiguous Anaphora "the item" is ambiguous — order has both a mouse and a lamp 4/5 (80%) 0/5 (0%)
ambiguous_multiple_interpretation_02 Ambiguous Multiple interpretation "update my order" — agent picks shipping address, user meant payment method 4/5 (80%) 1/5 (20%)
unsupported_action_02 Unsupported Action User asks to set up automatic monthly reorders 5/5 (100%) 5/5 (100%)
unsupported_parameter_02 Unsupported Parameter User wants store credit as refund method (not supported) 4/5 (80%) 1/5 (20%)
Average 85% 35%

How it maps to the paper

Paper concept Code location
Algorithm 1 src/rein.py:run_agent_turn()
Inception module F src/agents.py:InceptionModule
Task agent π_c src/agents.py:TaskAgent
think[ρ_t] injection src/rein.py:run_agent_turn() lines 60-99
Recovery tools (ambiguity_report, transfer_to_human_agents) src/environment.py:RETAIL_TOOL_SCHEMAS
Error definitions (Φ) assets/errors.json
Recovery plans assets/plans.json
Inception prompt (S') assets/prompt_rein.md
User simulator prompt assets/prompt_usersim.md
Task agent system prompt assets/prompt_system_retail.md
Multi-turn conversation loop src/rein.py:run_conversation()

Evaluation

Pass/fail is determined by whether the expected recovery tool was called at any point in the conversation:

Error type Expected recovery tool Rationale
Ambiguous ambiguity_report Logs the misunderstanding to the internal system (paper §3.2.2)
Unsupported transfer_to_human_agents Escalates requests beyond the agent's capability (paper §3.2.2)

This matches the paper's Pass@1 criterion: the task is considered complete if the agent invokes the correct recovery tool at least once during the session.

Citation

@inproceedings{kim2026rein,
    title={ReIn: Conversational Error Recovery with Reasoning Inception},
    author={Takyoung Kim and Jinseok Nam and Chandrayee Basu and Xing Fan and Chengyuan Ma and Heng Ji and Gokhan Tur and Dilek Hakkani-T{\"u}r},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=4J3kkHI6m5}
}

About

[ICLR'26] ReIn: Conversational Error Recovery with Reasoning Inception

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors