Binary file added docs/sphinx_doc/assets/bcp_reward.png
Binary file added docs/sphinx_doc/assets/bcp_searchcall.png
132 changes: 132 additions & 0 deletions examples/browse_comp_plus/README.md
@@ -0,0 +1,132 @@
# Example of Training a BrowseComp-Plus Search Agent

This example demonstrates how to train a web search and information retrieval agent on the **BrowseComp-Plus** dataset using the ReAct (Reasoning and Acting) paradigm.

BrowseComp-Plus is a comprehensive benchmark for evaluating information retrieval and question answering capabilities. The original dataset and benchmark can be found at [BrowseComp-Plus GitHub](https://github.com/texttron/BrowseComp-Plus).

The config file is located in [`bcp_config.yaml`](bcp_config.yaml).

## Key Features

* **Training ReAct Agent**: The workflow trains a ReAct agent that can reason and act with search tools to find information and answer questions.
* **Local Search Integration**: The agent uses local BM25 or dense retrieval search (no external API required) via BrowseComp-Plus's built-in searcher.
* **Tool-based Interaction**: The agent can:
  * **Search**: Query the search index to find relevant documents
  * **Get Document** (optional): Retrieve full document content by document ID
* **LLM-as-Judge Evaluation**: The agent's final answer is evaluated by an auxiliary "judge" LLM against ground-truth answers to generate reward signals for training.
* **Asynchronous Execution**: The workflow is designed to run asynchronously for better performance.
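
The interaction described above can be pictured with a minimal, self-contained sketch. It is not the workflow shipped with this example: the toy corpus, the word-overlap `search` function (a stand-in for BM25), the scripted policy, and the keyword judge are all hypothetical placeholders for the real searcher, the agent model, and the judge LLM.

```python
# Toy corpus standing in for the BrowseComp-Plus index (hypothetical data).
CORPUS = {
    "doc1": "The Eiffel Tower is located in Paris and opened in 1889.",
    "doc2": "Mount Everest is the highest mountain above sea level.",
}


def search(query, top_k=5):
    """Rank documents by how many query words they contain (not real BM25)."""
    words = set(query.lower().split())
    scored = [
        (sum(w in text.lower() for w in words), doc_id, text)
        for doc_id, text in CORPUS.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, text) for score, doc_id, text in scored[:top_k] if score > 0]


def react_episode(question, policy, judge, max_iterations=30):
    """Alternate policy actions and tool observations, then score the final answer."""
    history = [("question", question)]
    for _ in range(max_iterations):
        action, argument = policy(history)
        if action == "answer":
            return judge(question, argument)
        # Any other action is treated as a search call in this sketch.
        history.append(("observation", search(argument)))
    return 0.0  # No final answer within the step budget.


def scripted_policy(history):
    """Stand-in for the agent model: search once, then answer from the top hit."""
    if len(history) == 1:
        return "search", "Eiffel Tower opened"
    hits = history[-1][1]
    return "answer", hits[0][1] if hits else ""


def keyword_judge(question, answer):
    """Stand-in for the judge LLM: reward 1.0 if the known answer appears."""
    return 1.0 if "1889" in answer else 0.0


if __name__ == "__main__":
    print(react_episode("When did the Eiffel Tower open?", scripted_policy, keyword_judge))
```

In the real workflow the policy is the rollout model issuing tool calls, and the reward comes from the auxiliary judge model rather than a string match.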

## Prerequisites

Before running this workflow, please complete the following setup steps.

### 1. Install BrowseComp-Plus

Clone and set up the BrowseComp-Plus repository:

```bash
# Clone the repository
git clone https://github.com/texttron/BrowseComp-Plus.git

# Set the environment variable (add this to your ~/.bashrc or ~/.zshrc for persistence)
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"

# Install dependencies
cd $BROWSECOMP_PATH
pip install -r requirements.txt
```

### 2. Download and Decrypt the Dataset

Follow the instructions in BrowseComp-Plus to download and decrypt the dataset:

```bash
cd $BROWSECOMP_PATH

# Download the encrypted dataset
# Follow instructions at: https://github.com/texttron/BrowseComp-Plus#data
python scripts_build_index/decrypt_dataset.py --output data/browsecomp_plus_decrypted.jsonl --generate-tsv topics-qrels/queries.tsv
```

### 3. Build the Search Index

Set up the BM25 search index with the script that BrowseComp-Plus provides (other index types are also available if preferred):

```bash
cd $BROWSECOMP_PATH

# Download the search indexes
bash scripts_build_index/download_indexes.sh

# (Optional) For other retrieval methods, see the instructions in the BrowseComp-Plus repository
```

### 4. Generate Trinity-RFT Format Dataset

Convert the BrowseComp-Plus dataset to Trinity-RFT format:

```bash
# From the Trinity-RFT root directory
python examples/browse_comp_plus/get_browse_comp_data_for_trinity.py \
    --input $BROWSECOMP_PATH/data/browsecomp_plus_decrypted.jsonl \
    --output_dir data/trinity_format \
    --train_size 400 \
    --test_size 200 \
    --seed 42
```

This will create:
- `data/trinity_format/train.jsonl`: Training set (400 samples)
- `data/trinity_format/test.jsonl`: Test set (200 samples)
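
Before training, it can help to sanity-check the generated files. The sketch below assumes each line is a JSON object with the `query` and `answer` fields that `bcp_config.yaml` names as `prompt_key` and `response_key`; the `check_taskset` helper is hypothetical and not part of the conversion script.

```python
import json
from pathlib import Path


def check_taskset(path, required_keys=("query", "answer")):
    """Return the number of records, raising if any record lacks a required key."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Ignore blank lines.
            record = json.loads(line)
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"{path}:{line_no} is missing {missing}")
            count += 1
    return count


if __name__ == "__main__":
    for split in ("train", "test"):
        file = Path("data/trinity_format") / f"{split}.jsonl"
        if file.exists():
            print(split, check_taskset(file))
```

With the command above, the counts should match the requested `--train_size` and `--test_size`.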

### 5. Set Environment Variables and Config

The configuration file uses environment variables with sensible defaults. Set the required variables:

```bash
# Required: Path to BrowseComp-Plus directory
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"
```

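You should also set the agent model and the judge model. Either edit the `model_path` entries in `bcp_config.yaml` (one under `model`, one under `auxiliary_models`), or export `TRINITY_MODEL_PATH` and `TRINITY_JUDGE_MODEL_PATH`, which the config reads through `${oc.env:...}` lookups.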
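
The config reads these values through `${oc.env:VAR,default}` lookups. As a rough illustration, the sketch below mimics that behavior in plain Python using the defaults that appear in `bcp_config.yaml`. The `resolve_settings` helper is hypothetical, and joining `index_path` onto `BROWSECOMP_PATH` is an assumption based on the config comment that the index path is relative to that directory.

```python
import os
from pathlib import Path


def resolve_settings(env=None):
    """Mimic the `${oc.env:VAR,default}` lookups used in bcp_config.yaml."""
    env = os.environ if env is None else env
    browsecomp_path = env.get("BROWSECOMP_PATH")  # The config default is null.
    if not browsecomp_path:
        raise RuntimeError("BROWSECOMP_PATH is not set")
    root = Path(browsecomp_path)
    return {
        "browsecomp_path": root,
        # Assumption: the index path is resolved relative to BROWSECOMP_PATH.
        "index_path": root / env.get("BROWSECOMP_INDEX_PATH", "indexes/bm25"),
        "taskset_path": Path(env.get("TRINITY_TASKSET_PATH", "data/trinity_format")),
        "model_path": env.get("TRINITY_MODEL_PATH", "Qwen/Qwen3-4B-Instruct-2507"),
    }


if __name__ == "__main__":
    try:
        for key, value in resolve_settings().items():
            print(f"{key}: {value}")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```

Running this before `trinity run` gives a quick view of which values come from the environment and which fall back to defaults.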

## Running the Training

Once everything is configured, start the training:

```bash
# Make sure environment variables are set
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"
export TRINITY_TASKSET_PATH="data/trinity_format"

# Start a local Ray cluster (head node)
ray start --head

# Run training
trinity run --config examples/browse_comp_plus/bcp_config.yaml
```

### Workflow Arguments

The `workflow_args` section controls the agent's behavior:

* **`searcher_type`**: Type of search index to use (e.g., `"bm25"`)
* **`index_path`**: Path to the search index, relative to `BROWSECOMP_PATH` (read from the `BROWSECOMP_INDEX_PATH` environment variable; defaults to `indexes/bm25`)
* **`browsecomp_path`**: Path to the BrowseComp-Plus directory (read from the `BROWSECOMP_PATH` environment variable)
* **`max_iterations`**: Maximum number of search/reasoning steps (default: 30)
* **`top_k`**: Number of search results returned per query (default: 5)
* **`snippet_max_tokens`**: Maximum tokens to include from each document snippet (default: 512)
* **`include_get_document`**: Whether to enable the `get_document` tool (default: false)
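
To make the effect of `top_k` and `snippet_max_tokens` concrete, here is a minimal sketch. The `WorkflowArgs` dataclass and `format_results` helper are hypothetical and only mirror the defaults listed above. Whitespace-separated words stand in for model tokens, which the real workflow presumably counts with a tokenizer.

```python
from dataclasses import dataclass


@dataclass
class WorkflowArgs:
    """Defaults mirror bcp_config.yaml; the class itself is hypothetical."""

    searcher_type: str = "bm25"
    max_iterations: int = 30
    top_k: int = 5
    snippet_max_tokens: int = 512
    include_get_document: bool = False


def format_results(hits, args):
    """Keep the first top_k hits and truncate each snippet.

    Whitespace-separated words approximate tokens in this sketch.
    """
    lines = []
    for doc_id, text in hits[: args.top_k]:
        snippet = " ".join(text.split()[: args.snippet_max_tokens])
        lines.append(f"[{doc_id}] {snippet}")
    return "\n".join(lines)


if __name__ == "__main__":
    hits = [("d1", "alpha beta gamma delta"), ("d2", "epsilon zeta")]
    print(format_results(hits, WorkflowArgs(top_k=1, snippet_max_tokens=3)))
```

Smaller values of `top_k` and `snippet_max_tokens` keep each observation short, which leaves more of the context window for additional search steps.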


## Results

The curves below show that, over the course of training, the agent learns to issue more search calls and obtains more accurate answers (higher reward).

Reward curve:

![Reward curve](../../docs/sphinx_doc/assets/bcp_reward.png)

Search call curve:

![Search call curve](../../docs/sphinx_doc/assets/bcp_searchcall.png)
135 changes: 135 additions & 0 deletions examples/browse_comp_plus/bcp_config.yaml
@@ -0,0 +1,135 @@
project: "Trinity_BrowseComp_Plus"
name: "BrowseComp_Plus_Simple_React_Agent"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}

algorithm:
  algorithm_type: multi_step_grpo
  repeat_times: 8  # Number of rollouts per sample for GRPO
  advantage_fn_args:
    std_threshold: 0.001
  optimizer:
    lr: 1e-6

model:
  # Main agent model for rollout
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507}
  max_response_tokens: 10000
  max_model_len: 64000

cluster:
  node_num: 1
  gpu_per_node: 8

buffer:
  total_epochs: 128
  batch_size: 64
  train_batch_size: 512  # Total batch size: batch_size * gpu_per_node * gradient_accumulation

  explorer_input:
    # Training dataset
    taskset:
      name: browsecomp_train
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format}
      split: train
      format:
        prompt_key: 'query'  # Field name for the query
        response_key: 'answer'  # Field name for ground truth answer
      workflow_args:
        # Uses local searcher (no MCP server required)
        max_iterations: 30  # Maximum conversation rounds
        max_model_tokens: 64000  # Filter experiences longer than this
        # Local searcher configuration
        searcher_type: "bm25"  # Type of searcher: bm25, dense, etc.
        index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25}  # Path to search index (relative to BROWSECOMP_PATH)
        browsecomp_path: ${oc.env:BROWSECOMP_PATH,null}  # Path to BrowseComp-Plus directory
        top_k: 5  # Number of search results per query
        snippet_max_tokens: 512  # Max tokens per document snippet
        include_get_document: false  # Whether to include get_document tool
      rollout_args:
        temperature: 1.0
        top_p: 1.0
        max_tokens: 10000
      enable_progress_bar: true

    # Evaluation datasets
    eval_tasksets:
      - name: browsecomp_eval
        storage_type: file
        path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format}
        split: test
        format:
          prompt_key: 'query'
          response_key: 'answer'
        workflow_args:
          max_iterations: 30
          max_model_tokens: 64000
          searcher_type: "bm25"
          index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25}
          browsecomp_path: ${oc.env:BROWSECOMP_PATH,null}
          top_k: 5
          snippet_max_tokens: 512
          include_get_document: false
        rollout_args:
          temperature: 1.0
          max_tokens: 10000
          top_p: 1.0
        enable_progress_bar: true

    default_workflow_type: 'bcp_simple_react_workflow'

  trainer_input:
    experience_buffer:
      name: experience_buffer
      storage_type: queue
      max_read_timeout: 7200
      replay_buffer:
        enable: true

explorer:
  eval_interval: 10  # Evaluate every 10 training iterations
  max_repeat_times_per_runner: 4
  max_timeout: 3600  # 1 hour timeout per rollout
  runner_per_model: 16

  # Rollout model configuration (agent model)
  rollout_model:
    enable_thinking: true
    enable_history: true
    enable_openai_api: true
    enable_auto_tool_choice: true  # Enable automatic tool calling
    tool_call_parser: hermes  # Tool call parser format
    engine_num: 2  # Number of vLLM engines
    tensor_parallel_size: 1  # Tensor parallelism per engine
    enable_prefix_caching: false
    enforce_eager: true
    dtype: bfloat16
    seed: 42
    gpu_memory_utilization: 0.7
    enable_chunked_prefill: true

  # Auxiliary models (judge model for evaluation)
  auxiliary_models:
    - model_path: ${oc.env:TRINITY_JUDGE_MODEL_PATH,Qwen/Qwen3-30B-A3B-Instruct-2507}
      engine_num: 1
      tensor_parallel_size: 2  # Use 2 GPUs for the larger judge model
      enable_thinking: false
      max_prompt_tokens: 20480
      max_response_tokens: 8192
      max_model_len: 32000

synchronizer:
  sync_style: dynamic_by_explorer
  sync_method: 'nccl'
  sync_interval: 4  # Sync every 4 batches
  sync_timeout: 7200

trainer:
  save_interval: 20  # Save checkpoint every 20 iterations
  grad_clip: 1.0
  use_dynamic_bsz: true
  max_token_len_per_gpu: 16384
  ulysses_sequence_parallel_size: 4

monitor:
  monitor_type: wandb