AI-native query and agent tooling for the papercopilot/paperlists conference-paper corpus.
This repo is the dedicated home for the agent layer that started in
papercopilot/paperlists#29.
It keeps the data corpus and the agent/query product separate:
| Surface | Directory | Use it when |
|---|---|---|
| FastAPI query service | `query-api/` | You want HTTPS or localhost access to the corpus |
| MCP server | `mcp-server/` | You want Claude Code, Cursor, Codex, Claude Desktop, or another MCP host to query papers |
| Cross-tool Skill + CLI | `skill/` | You want a portable markdown skill and stdlib-only command-line client |
The core verbs are research-evolution oriented, not just keyword search:
- `topic_trend`: yearly topic volume and citation-weighted volume
- `topic_evolution`: per-year/per-window keywords, venues, and landmark papers
- `compare_periods`: emerged/faded/sustained terms, authors, and affiliations
- `author_trajectory`: papers by author across years
- `field_landscape`: single-year field snapshot
- `corpus_manifest`: corpus freshness/provenance contract for the data pipeline
Use the hosted demo only for evaluation:

    export PAPERLISTS_API_URL=https://api-production-18d3.up.railway.app
    python3 skill/scripts/paperlists.py coverage
    python3 skill/scripts/paperlists.py corpus_manifest  # confirm api.version/build identity
    python3 skill/scripts/paperlists.py topic_evolution q="LLM reasoning" year_from=2024 year_to=2025 conferences=iclr,nips,icml,acl,emnlp match_mode=token_and

For longitudinal claims, require `corpus_manifest.api.version >= 0.2.0`. For
deployment canaries that must prove the endpoint is current HEAD, require a
known `corpus_manifest.api.git_sha`; version alone only rejects pre-0.2 demos.
Older demos used token-AND query semantics without `match_mode`,
`query_expression`, `venue_diff`, or query-noise metadata.
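
The version and build-identity rule above can be sketched as a small gate over the manifest payload. This is a minimal illustration, not code from this repo: it assumes the manifest is a JSON object with `api.version` and `api.git_sha` fields (as named above) and that the version is a purely numeric dotted string such as `0.2.0`. The helper names `parse_version` and `check_manifest` are hypothetical.

```python
def parse_version(text):
    """Turn a numeric dotted version such as '0.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_manifest(manifest, min_version="0.2.0", expected_git_sha=None):
    """Return a list of problems; an empty list means the endpoint passes.

    Assumed manifest shape: {"api": {"version": "...", "git_sha": "..."}}.
    """
    api = manifest.get("api", {})
    problems = []

    # Version gate: rejects pre-0.2 demos, nothing more.
    version = api.get("version")
    if version is None or parse_version(version) < parse_version(min_version):
        problems.append(f"api.version {version!r} is older than {min_version}")

    # Optional canary gate: only an exact git_sha match proves a specific build.
    if expected_git_sha is not None and api.get("git_sha") != expected_git_sha:
        problems.append(
            f"api.git_sha {api.get('git_sha')!r} does not match {expected_git_sha!r}"
        )
    return problems
```

Comparing integer tuples rather than raw strings keeps `0.10.0` ordered after `0.2.0`. Pre-release suffixes are not handled in this sketch.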
For a local API:

    cd query-api
    uv run python -m paperlists_api.indexer /path/to/paperlists ./papers.db
    PAPERLISTS_DB=$PWD/papers.db uv run uvicorn paperlists_api.main:app --reload

Then visit http://127.0.0.1:8000/docs.
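
Once the local server is running, it can also be probed from Python with the standard library alone. The sketch below assumes the endpoints respond with JSON and reuses `PAPERLISTS_API_URL` as the base-URL override, falling back to the local uvicorn address. The function names are hypothetical and are not part of the bundled CLI.

```python
import json
import os
import urllib.request


def api_base_url():
    """Use PAPERLISTS_API_URL when set, else the local uvicorn address."""
    return os.environ.get("PAPERLISTS_API_URL", "http://127.0.0.1:8000").rstrip("/")


def endpoint_url(path, base=None):
    """Join the base URL and an endpoint path with exactly one slash."""
    base = (base or api_base_url()).rstrip("/")
    return f"{base}/{path.lstrip('/')}"


def fetch_json(path, base=None, timeout=10.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(path, base), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json("/healthz")` or `fetch_json("/v1/corpus_manifest")` would query the service started above; both calls need the server to be reachable.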
The root Dockerfile fetches the upstream paperlists JSON archive during build
and bakes a sqlite FTS5 index into the runtime image. This avoids committing
or uploading the raw data.
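
To make the idea of an FTS5 index concrete, here is a minimal, self-contained sketch using Python's `sqlite3` module. The table name, columns, and sample rows are hypothetical; the real indexer's schema is not described in this README. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3


def build_index(papers):
    """Create an in-memory FTS5 table from (title, abstract, venue, year) rows."""
    conn = sqlite3.connect(":memory:")
    # venue and year are stored but not tokenized for full-text search.
    conn.execute(
        "CREATE VIRTUAL TABLE papers "
        "USING fts5(title, abstract, venue UNINDEXED, year UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO papers (title, abstract, venue, year) VALUES (?, ?, ?, ?)",
        papers,
    )
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return titles matching an FTS5 query, best-ranked first."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE papers MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


sample = [
    ("Chain-of-thought reasoning in LLM agents", "We study reasoning.", "iclr", 2024),
    ("LLM pretraining at scale", "Data and compute.", "nips", 2024),
    ("Graph reasoning networks", "No language models here.", "icml", 2025),
]
conn = build_index(sample)
matches = search(conn, "llm reasoning")
```

In FTS5, space-separated terms are implicitly ANDed, so `matches` contains only the first title, the one row with both tokens. That mirrors token-AND matching in spirit, though the service's actual query handling may differ.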
Railway can deploy from the repo root:

    railway up

Runtime knobs:

- `WEB_CONCURRENCY=4`
- `PAPERLISTS_RATE_PER_MIN=60`
- `PAPERLISTS_RATE_BURST=20`
- `PAPERLISTS_TRUST_PROXY=auto`
- `PAPERLISTS_DB=/app/papers.db`

`PAPERLISTS_GIT_SHA`, `PAPERLISTS_GIT_BRANCH`, `PAPERLISTS_DEPLOYMENT_ID`, and `PAPERLISTS_ENVIRONMENT` are exposed in `/`, `/healthz`, and `/v1/corpus_manifest`; the Dockerfile maps Railway's Git build args into these fields when Railway provides them.
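
As an illustration of how such knobs are typically consumed, the sketch below reads them from the environment into a typed settings object. It treats the values listed above as fallback defaults, which is an assumption; the `RuntimeSettings` class and `load_settings` function are hypothetical and do not describe the service's actual configuration code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeSettings:
    web_concurrency: int
    rate_per_min: int
    rate_burst: int
    trust_proxy: str
    db_path: str


def load_settings(env=None):
    """Read the runtime knobs from a mapping (defaults to os.environ).

    Fallback values mirror the ones listed above; treating them as
    defaults is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return RuntimeSettings(
        web_concurrency=int(env.get("WEB_CONCURRENCY", "4")),
        rate_per_min=int(env.get("PAPERLISTS_RATE_PER_MIN", "60")),
        rate_burst=int(env.get("PAPERLISTS_RATE_BURST", "20")),
        trust_proxy=env.get("PAPERLISTS_TRUST_PROXY", "auto"),
        db_path=env.get("PAPERLISTS_DB", "/app/papers.db"),
    )
```

Passing an explicit mapping, for example `load_settings({"PAPERLISTS_RATE_PER_MIN": "120"})`, makes the parsing easy to exercise without touching the real environment.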

    cd query-api
    uv run --extra dev pytest -q
    uv run python -m compileall paperlists_api ../mcp-server/paperlists_mcp ../skill/scripts/paperlists.py

The API currently indexes 237,735 papers across 31 venues in the hosted demo.
Local papers.db files are generated artifacts and must not be committed.