
EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence



TL;DR

  • EASI is a unified evaluation suite for Spatial Intelligence in multimodal LLMs.
  • EASI supports two evaluation backends: VLMEvalKit and lmms-eval.
  • After installation, you can run all EASI-8 benchmarks with a single command:

Using VLMEvalKit backend (default):

python scripts/submissions/run_easi_eval.py \
  --model sensenova/SenseNova-SI-1.3-InternVL3-8B \
  --nproc 4

Using lmms-eval backend:

python scripts/submissions/run_easi_eval.py \
  --backend lmms-eval \
  --model internvl2 \
  --model-args "pretrained=sensenova/SenseNova-SI-1.3-InternVL3-8B" \
  --nproc 4

Under the hood, EASI wraps VLMEvalKit and lmms-eval with a unified CLI. See the respective repos for advanced usage and adding custom models.

Overview

EASI is a unified evaluation suite for Spatial Intelligence. It benchmarks state-of-the-art proprietary and open-source multimodal LLMs across a growing set of spatial benchmarks.

  • Comprehensive Support: EASI (v0.2.1) currently supports 23 Spatial Intelligence models and 27 spatial benchmarks.
  • Dual Backends:
    • VLMEvalKit: Rich model zoo with built-in judging capabilities.
    • lmms-eval: Lightweight, accelerate-based distributed evaluation.

Full details are available at 👉 Supported Models & Benchmarks. EASI also provides transparent 👉 Benchmark Verification against official scores.

🗓️ News

🌟 [2026-02-09] EASI v0.2.1 is released. Major updates include:

  • Expanded benchmark support: Added ERIQ and OSI-Bench.
  • Bug fixes: Fixed VLMEvalKit evaluation issues on MuirBench.
  • Benchmark verification: Added more lmms-eval benchmark verification entries.

🌟 [2026-01-16] EASI v0.2.0 is released. Major updates include:

  • New Backend Support: Integrated lmms-eval alongside VLMEvalKit, offering flexible evaluation options.
  • Expanded benchmark support: Added DSR-Bench.

For the full release history and detailed changelog, please see 👉 Changelog.

🛠️ QuickStart

Installation

Option 1: Local environment (Recommended)

The setup script installs both evaluation backends (VLMEvalKit and lmms-eval) with pinned dependencies:

git clone --recursive https://github.com/EvolvingLMMs-Lab/EASI.git
cd EASI
bash scripts/setup.sh
source .venv/bin/activate

This creates a Python 3.11 virtual environment with both backends, flash-attn, and all required dependencies. See scripts/setup.sh for details.

Option 2: Docker-based environment

bash dockerfiles/EASI/build_runtime_docker.sh

docker run --gpus all -it --rm \
  -v /path/to/your/data:/mnt/data \
  --name easi-runtime \
  VLMEvalKit_EASI:latest \
  /bin/bash

Evaluation

EASI provides a unified evaluation script that supports both VLMEvalKit and lmms-eval backends. The script handles dataset preparation, evaluation, result collection, and optional leaderboard submission.


Using the Unified Evaluation Script (Recommended)

VLMEvalKit backend (default):

# Run EASI-8 core benchmarks on 4 GPUs
python scripts/submissions/run_easi_eval.py \
  --model sensenova/SenseNova-SI-1.3-InternVL3-8B \
  --nproc 4

lmms-eval backend:

# Run EASI-8 core benchmarks on 4 GPUs
python scripts/submissions/run_easi_eval.py \
  --backend lmms-eval \
  --model internvl2 \
  --model-args "pretrained=sensenova/SenseNova-SI-1.3-InternVL3-8B" \
  --nproc 4

With automated submission:

python scripts/submissions/run_easi_eval.py \
  --backend lmms-eval \
  --model internvl2 \
  --model-args "pretrained=sensenova/SenseNova-SI-1.3-InternVL3-8B" \
  --nproc 4 \
  --submit \
  --submission-configs '{
    "modelName": "sensenova/SenseNova-SI-1.3-InternVL3-8B",
    "modelType": "instruction",
    "precision": "bfloat16"
  }'
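Passing a JSON object through the shell, as in the command above, is easy to get wrong with quoting. A minimal Python sketch (not part of EASI; it only rebuilds the exact command shown above) that serializes the config with `json.dumps` and assembles the full argument list:

```python
import json
import shlex

# Submission metadata, mirroring the fields in the README example above.
submission = {
    "modelName": "sensenova/SenseNova-SI-1.3-InternVL3-8B",
    "modelType": "instruction",
    "precision": "bfloat16",
}

# Build the same command line as above; json.dumps guarantees valid JSON
# regardless of shell quoting rules.
cmd = [
    "python", "scripts/submissions/run_easi_eval.py",
    "--backend", "lmms-eval",
    "--model", "internvl2",
    "--model-args", "pretrained=sensenova/SenseNova-SI-1.3-InternVL3-8B",
    "--nproc", "4",
    "--submit",
    "--submission-configs", json.dumps(submission),
]
print(shlex.join(cmd))
```

Replace the final `print` with `subprocess.run(cmd, check=True)` to actually launch the evaluation.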

More options:

# Run specific benchmarks only
python scripts/submissions/run_easi_eval.py \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --benchmarks vsi_bench,blink,sitebench

# Include extra benchmarks (MMSI-Video, OmniSpatial, SPAR-Bench, VSI-Debiased)
python scripts/submissions/run_easi_eval.py \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --nproc 8 --include-extra

# Force re-evaluation (ignore previous results)
python scripts/submissions/run_easi_eval.py \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --nproc 8 --rerun

# lmms-eval OOM recovery: complete failed benchmarks in single-GPU mode
python scripts/submissions/run_easi_eval.py \
  --backend lmms-eval \
  --model qwen3_vl \
  --model-args "pretrained=Qwen/Qwen3-VL-8B-Instruct,attn_implementation=flash_attention_2" \
  --no-accelerate

Full CLI options and submission config details at 👉 Submission Guide.
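Each invocation above evaluates a single model. To compare several checkpoints, a small driver loop can generate one command per model; this is a hedged sketch (the two model names are just examples from this README, and the loop only prints the commands rather than launching them):

```python
import shlex

# Example checkpoints taken from the commands above; swap in any
# supported model names you want to compare.
models = [
    "Qwen/Qwen2.5-VL-7B-Instruct",
    "sensenova/SenseNova-SI-1.3-InternVL3-8B",
]

# One run_easi_eval.py invocation per model, using the default
# VLMEvalKit backend on 4 GPUs.
commands = [
    ["python", "scripts/submissions/run_easi_eval.py",
     "--model", model, "--nproc", "4"]
    for model in models
]

for cmd in commands:
    # Replace print with subprocess.run(cmd, check=True) to launch for real.
    print(shlex.join(cmd))
```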


Using Backends Directly

For advanced usage or custom model integration, you can also call the backends directly:

VLMEvalKit:

cd VLMEvalKit/
python run.py --data MindCubeBench_tiny_raw_qa \
              --model SenseNova-SI-1.3-InternVL3-8B \
              --verbose --reuse --judge extract_matching

lmms-eval:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --num_processes=4 -m lmms_eval \
    --model internvl2 \
    --model_args=pretrained=sensenova/SenseNova-SI-1.3-InternVL3-8B \
    --tasks vsibench_multiimage \
    --batch_size 1 --log_samples --output_path ./logs/
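In the direct invocation above, `--num_processes` must match the number of GPUs listed in `CUDA_VISIBLE_DEVICES`. A small helper (hypothetical, not part of EASI or lmms-eval) can derive one from the other so the two never drift apart:

```python
import os

def nproc_from_cuda_visible_devices(default: int = 1) -> int:
    """Count the GPU ids in CUDA_VISIBLE_DEVICES so accelerate's
    --num_processes always matches the device list."""
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not devices.strip():
        return default
    return len([d for d in devices.split(",") if d.strip()])

# Matches the 4-GPU example above.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
print(nproc_from_cuda_visible_devices())  # 4 ids listed, so 4 processes
```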

For more details, refer to the VLMEvalKit documentation and lmms-eval documentation.


Configuration

  • Supported Models & Benchmarks: Summarized in Supported Models & Benchmarks.
  • VLMEvalKit Models: Defined in vlmeval/config.py. Verify inference with vlmutil check {MODEL_NAME}.
  • lmms-eval Models: Supports various model types (qwen2_5_vl, llava, internvl2, etc.). See the lmms-eval models directory.
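As the examples in this README show, an lmms-eval model is specified in two parts: a backend model type (e.g. `internvl2`) and a comma-separated `key=value` string for `--model_args`. A minimal sketch that assembles the string from a dict, so the parts stay in one place (the keys mirror the examples above; valid keys vary per model type):

```python
# Model type and args taken from the lmms-eval examples in this README.
model_type = "internvl2"
model_args = {
    "pretrained": "sensenova/SenseNova-SI-1.3-InternVL3-8B",
    "attn_implementation": "flash_attention_2",
}

# lmms-eval expects --model_args as "k1=v1,k2=v2,...".
model_args_str = ",".join(f"{k}={v}" for k, v in model_args.items())
print(model_type, model_args_str)
```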

Submission

You can submit your evaluation results at 👉 EASI Leaderboard Submission.

Full details and file format examples are available at 👉 Submission Guide.

🤝 Contribution

EASI is an open and evolving evaluation suite. We warmly welcome community contributions, including:

  • New spatial benchmarks
  • New model baselines
  • Evaluation tools

If you are interested in contributing, or have questions about integration, please contact us at 📧 [email protected]

🖊️ Citation

@article{easi2025,
  title={Holistic Evaluation of Multimodal LLMs on Spatial Intelligence},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}