Most-read articles on English Wikipedia — daily rankings from July 1, 2015 to yesterday.
Live site: https://statswiki.info/
Bluesky: https://bsky.app/profile/statswiki.bsky.social
Preprint (Wikirace) — From Events to Encyclopedic Attention
MIT license — fork for another language or project → ADAPT.md.
| Component | Details |
|---|---|
| Data | Wikimedia Pageviews API → Parquet → static JSON |
| Site | Vue 3 SPA on GitHub Pages (no runtime API calls) |
| Updates | Daily cron + manual backfill for history |
| Enrichment | Wikidata QID, label, description, image |
| Rankings | Top 50 per day, month, year, all-time |
Three live panels (top 50 each), with fallback to the latest available period when yesterday / this month are not yet ingested:
- Yesterday (or latest day)
- This month (or latest month)
- This year
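
The fallback rule above can be sketched as follows. This is a minimal illustration, not the site's actual code: `resolve_period` is a hypothetical helper, and the period keys are assumed to be ISO-style strings.

```python
from typing import Optional, Sequence


def resolve_period(wanted: str, available: Sequence[str]) -> Optional[str]:
    """Return `wanted` if it is ingested, otherwise the latest earlier period."""
    if wanted in available:
        return wanted
    # ISO-style keys ("2026-05-31", "2026-05") sort chronologically as strings.
    earlier = [p for p in available if p < wanted]
    return max(earlier) if earlier else None


# Yesterday (2026-05-31) is not ingested yet, so the panel shows the latest day.
ingested_days = ["2026-05-28", "2026-05-29", "2026-05-30"]
latest = resolve_period("2026-05-31", ingested_days)
```

The same helper would work for months if the keys are `YYYY-MM` strings.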
| View | URL | Content |
|---|---|---|
| Day | `/2026/05/31` | Top 50 that day |
| Month | `/2026/05` | Top 50 aggregated over the month |
| Year | `/2026` | Top 50 aggregated over the year |
| All time | `/alltime` | Top 50 since July 2015 |
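
As a quick illustration of the URL scheme in the table, the sketch below derives the four paths from a date. `period_paths` is a hypothetical helper and not part of the repo. The real routing lives in the Vue app.

```python
from datetime import date


def period_paths(d: date) -> dict:
    """Build the day / month / year / all-time paths for a given date."""
    return {
        "day": f"/{d:%Y/%m/%d}",
        "month": f"/{d:%Y/%m}",
        "year": f"/{d:%Y}",
        "alltime": "/alltime",
    }


paths = period_paths(date(2026, 5, 31))
```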
Browse via Year / Month / Day dropdowns in the header (no date in the page title).
Click a Wikidata QID in any table → /q/Q22686 with monthly / yearly view charts, total views, peak period.
Each row: rank, Wikipedia link, QID, description, thumbnail (links to Wikimedia Commons), view count.
Compare daily Wikipedia pageviews for a group of articles over any date range. Methodology in the Wikirace preprint above.
| View | URL | Content |
|---|---|---|
| Builder | `/wikirace` | Search catalog, pick articles, set dates |
| Race | `/wikirace/Q1+Q2/YYYY-MM-DD/YYYY-MM-DD` | Chart, Race% table, shareable link |
| Help | `/wikirace/help` | Public guide (from `docs/wikirace-help.md`) |
Race% = one article’s views as a % of the group total (area under the curve). Data is fetched live from the Wikimedia Pageviews API.
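
A minimal sketch of the Race% definition, assuming each article's daily views over the chosen date range are already available as a list. `race_percent` and the sample numbers are illustrative only. Summing the daily values stands in for the area under the curve.

```python
def race_percent(series: dict) -> dict:
    """Map each QID to its share (in %) of the group's total views."""
    totals = {qid: sum(views) for qid, views in series.items()}
    group_total = sum(totals.values())
    if group_total == 0:
        # No views at all in the range: report 0% for everyone.
        return {qid: 0.0 for qid in totals}
    return {qid: 100.0 * total / group_total for qid, total in totals.items()}


# Hypothetical daily views for two articles over three days.
shares = race_percent({"Q1": [10, 20, 30], "Q2": [20, 10, 10]})
```

Here `Q1` totals 60 views and `Q2` totals 40, so their shares are 60% and 40% of the group.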
Docs: docs/wikirace.md (maintainer README) · docs/wikirace-help.md (public help → npm run build:help)
Wikimedia Pageviews API        one HTTP request per day
        │
        ▼
data/pageviews/                Parquet (date, article, views, rank)
data/articles.parquet          Wikidata catalog
        │
        ▼  aggregate + merge by QID
web/public/data/               static JSON (top 50 per period)
        │
        ▼
Vue 3 SPA                      GitHub Pages CDN
Day → month → year: months and years are sums of daily rows, never fetched separately. See consolidation below.
Redirects: old article titles that share a Wikidata item have views merged before ranking.
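
The two rules above (periods are sums of daily rows, and titles sharing a Wikidata item are merged before ranking) can be sketched in plain Python. This is an illustration under assumptions: `top_by_qid` is a hypothetical name, the real pipeline works on Parquet, and keeping unmapped titles in their own bucket is a guess.

```python
from collections import defaultdict


def top_by_qid(rows, qid_of, month, limit=50):
    """Sum daily views for one month, merged by QID, and return the top entries.

    rows   -- iterable of (date "YYYY-MM-DD", article, views)
    qid_of -- mapping article title -> Wikidata QID
    month  -- "YYYY-MM" prefix selecting the days to aggregate
    """
    totals = defaultdict(int)
    for day, article, views in rows:
        if day.startswith(month):
            # Assumption: titles without a QID keep their own bucket.
            totals[qid_of.get(article, article)] += views
    # Highest views first; ties broken by key for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


rows = [
    ("2026-05-01", "Old_Title", 100),
    ("2026-05-02", "New_Title", 150),
    ("2026-05-02", "Other", 200),
    ("2026-06-01", "Other", 999),  # outside May, ignored
]
qid_of = {"Old_Title": "Q1", "New_Title": "Q1", "Other": "Q2"}
ranking = top_by_qid(rows, qid_of, "2026-05")
```

Merging first matters: ranked separately, `Old_Title` and `New_Title` would each fall below `Other`, while their shared item ranks first.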
# Pipeline
cd pipeline && python3 -m venv .venv && source .venv/bin/activate
pip install -e .
sw-fetch --date 2026-05-01 # one day
sw-backfill --year 2026 # full year
sw-daily # yesterday + export
sw-export-qids # QID time-series JSON
# Frontend
cd web && npm ci && npm run dev
# → http://localhost:5173/

Custom domain: statswiki.info. It is configured in three places: DNS at the registrar, web/public/CNAME, and Settings → Pages → Custom domain on thepriben/StatsWiki.
- Settings → Pages → Source: GitHub Actions (one-time).
- Push to `main`: Deploy Pages runs when `web/` or `data/` changes.
- Backfill and daily workflows commit data, then deploy in the same run.
| Workflow | Trigger | Role |
|---|---|---|
| Deploy Pages | Push or manual | Build Vue → publish |
| Daily update | 08:00 & 14:00 UTC or manual | Yesterday → daily top 5 + period posts → commit → deploy |
| Backfill | Manual (pick year) | One year of history |
| Backfill sequence | Manual | 2025 → 2016 in one job |
- Current year first — homepage needs recent data.
- Backfill sequence (or year-by-year) down to 2015 (July 1 for 2015).
- Leave Daily update enabled.
~5–10 minutes per year on GitHub Actions.
Wikimedia publishes top/day pageviews roughly 24 hours after UTC midnight. The workflow runs twice:
| Run | UTC | Purpose |
|---|---|---|
| Primary | 08:00 | Fetch yesterday, enrich, export |
| Retry | 14:00 | Same pipeline if morning data was not ready |
If the data is not available yet, the fetch retries up to 3 times per run (with backoff), then the job exits without committing or deploying. The 14:00 run tries again automatically.
If yesterday is already in the database (e.g. after a successful morning run), the fetch is skipped but enrich/export still run — useful if Wikidata mapping changed.
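
The skip and retry behaviour described above might look roughly like this. It is a sketch under assumptions: `NotReadyError`, `fetch_with_retry`, the exponential delays and the return strings are all hypothetical, and the actual `sw-daily` implementation may differ.

```python
import time


class NotReadyError(Exception):
    """Raised by the fetch callable when the day's data is not published yet."""


def fetch_with_retry(fetch, day, already_ingested,
                     attempts=3, base_delay=1.0, sleep=time.sleep):
    """Fetch one day, skipping known days and backing off between tries."""
    if already_ingested(day):
        return "skipped"  # enrich/export can still run afterwards
    for attempt in range(attempts):
        try:
            fetch(day)
            return "fetched"
        except NotReadyError:
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    return "not_ready"  # caller exits without commit or deploy
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.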
After each successful daily run:
| Trigger | When | Post |
|---|---|---|
| Day | Every run | Top 5 for yesterday |
| Week | Yesterday was Sunday | Top 5 for Mon–Sun (e.g. Mon 25 May – Sun 31 May 2026) |
| Month | Yesterday was the last day of the month | Top 5 for that month |
| Year | Yesterday was 31 December | Top 5 for that year |
Manual dry-run: sw-period-posts --dry-run --date YYYY-MM-DD --force
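
The trigger table can be expressed as a small date check. `due_posts` is a hypothetical helper written for illustration. It does not claim to mirror the internals of `sw-period-posts`.

```python
from datetime import date, timedelta


def due_posts(yesterday: date) -> list:
    """Return which posts are due after a successful run for `yesterday`."""
    due = ["day"]  # posted on every run
    if yesterday.weekday() == 6:  # Monday is 0, Sunday is 6
        due.append("week")
    if (yesterday + timedelta(days=1)).day == 1:  # last day of the month
        due.append("month")
    if (yesterday.month, yesterday.day) == (12, 31):
        due.append("year")
    return due


# 31 May 2026 is both a Sunday and the last day of the month.
posts = due_posts(date(2026, 5, 31))
```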
StatsWiki/
├── web/ # Vue 3 frontend
│ ├── src/
│ │ ├── App.vue # routing, header, home
│ │ ├── QidPage.vue # article stats + chart
│ │ ├── RankingTable.vue
│ │ ├── wikirace/ # Wikirace feature
│ │ └── lib.js
│ ├── public/wikirace/ # groups.json, catalog.json, help.json
│ └── public/data/ # generated JSON (+ q/Q*.json)
├── docs/
│ ├── wikirace.md # Wikirace maintainer README
│ └── wikirace-help.md # Wikirace public help (English)
├── data/ # Parquet source of truth
│ ├── pageviews/year=Y/month=M/
│ ├── articles.parquet
│ └── manifest.json
├── pipeline/src/statswiki/ # Python ETL
└── .github/workflows/
| Command | Purpose |
|---|---|
| `sw-fetch --date YYYY-MM-DD` | Ingest one day |
| `sw-backfill --year YYYY` | Ingest year + Wikidata top 1000 + export |
| `sw-daily` | Yesterday + enrich + export recent |
| `sw-enrich --top 500` | Re-enrich top articles by total views |
| `sw-enrich --refresh-shadows 100` | Retry unresolved QIDs |
| `sw-export --recent` | Rebuild yesterday / month / year / alltime JSON |
| `sw-export --year YYYY` | Export all periods for one year |
| `sw-export-qids` | Export `data/q/Q*.json` time series for charts |
| `sw-wikirace-catalog` | Export `web/public/wikirace/catalog.json` for autocomplete |
| `sw-period-posts` | Post week/month/year top 5 to X and Bluesky when due |

| npm (in `web/`) | Purpose |
|---|---|
| `npm run build:help` | `docs/wikirace-help.md` → `web/public/wikirace/help.json` |
All ingest is idempotent — existing days are skipped.
| Column | Description |
|---|---|
| `date` | Day |
| `article` | Title with underscores (as in API) |
| `views` | View count |
| `rank` | Position in daily top ~1000 |
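
To show how one API response could map onto these four columns, here is a hedged sketch. The URL pattern and the `items[0].articles` shape follow the public Wikimedia Pageviews "top" endpoint. The `all-access` setting, the zero-padded date strings, the helper names and the sample numbers are assumptions, and no network call is made.

```python
from datetime import date

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(day: date, project: str = "en.wikipedia",
            access: str = "all-access") -> str:
    """Build the per-day 'top' URL (one request per day)."""
    return f"{BASE}/{project}/{access}/{day:%Y/%m/%d}"


def to_rows(payload: dict) -> list:
    """Flatten one 'top' response into (date, article, views, rank) rows."""
    item = payload["items"][0]
    # Assumes year/month/day arrive as zero-padded strings.
    day = f"{item['year']}-{item['month']}-{item['day']}"
    return [
        {"date": day, "article": a["article"],
         "views": a["views"], "rank": a["rank"]}
        for a in item["articles"]
    ]


# Hand-written sample in the shape of a real response (values are made up).
sample = {"items": [{
    "project": "en.wikipedia", "access": "all-access",
    "year": "2026", "month": "05", "day": "01",
    "articles": [{"article": "Main_Page", "views": 1000, "rank": 1}],
}]}
rows = to_rows(sample)
url = top_url(date(2026, 5, 1))
```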
| Column | Description |
|---|---|
| `article` | Pageview title |
| `qid` | Wikidata QID (e.g. Q22686) |
| `resolved_title` | Canonical title after Wikipedia redirects |
| `label`, `description`, `image` | From Wikidata |
| `updated_at` | Last enrichment |
Each file has period, lines (array of ranked articles), and optionally nav (sub-links on year/month views).
| Field | Description |
|---|---|
| `rank` | 1–50 |
| `title` | Wikipedia title (`Article_Name`) |
| `label` | Display name from Wikidata |
| `description` | Short Wikidata description |
| `views` | View count for the period |
| `qid` | Wikidata ID (e.g. Q12345) |
| `image` | Commons thumbnail URL |
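
A small sketch of how one such file could be assembled from already-ranked entries. The field names come from the table above. `build_export`, the `"2026-05"` period string, the `nav` value and the sample entry are placeholders, not the exporter's real output.

```python
import json


def build_export(period, ranked, nav=None, limit=50):
    """Wrap ranked entries into the {period, lines, nav?} file structure."""
    lines = [
        {"rank": position, **entry}
        for position, entry in enumerate(ranked[:limit], start=1)
    ]
    out = {"period": period, "lines": lines}
    if nav:
        out["nav"] = nav  # only present on year/month views
    return out


# Placeholder entry using the example values from the field table.
entry = {
    "title": "Article_Name",
    "label": "Article name",
    "description": "placeholder description",
    "views": 12345,
    "qid": "Q12345",
    "image": None,
}
doc = build_export("2026-05", [entry], nav=["2026-05-01"])
text = json.dumps(doc, ensure_ascii=False, indent=2)
```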
manifest.json — start, end, updated, language.
1 API call / day → Parquet row per (date, article)
│
├─ SUM(days in month) → month/YYYY/MM.json
├─ SUM(days in year) → year/YYYY.json
└─ SUM(all days) → alltime.json
Batched enrichment (50 titles / request):
- QID: Wikipedia `pageprops`, follows redirects
- Fallbacks: Wikidata search + opensearch
- Entity: label, description, image (P18 / P154)
- Export: merge views by QID before top-50 ranking
Manual overrides for edge cases live in filters.py. Shadow QIDs (Q_en_…), the placeholders used while a QID is unresolved, are retried for high-traffic articles.
Modules: wikidata.py, mapping.py, qid_export.py.
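
The batching step can be illustrated with the standard MediaWiki Action API parameters for reading `wikibase_item` from `pageprops`. This sketch only builds the request parameters and sends nothing. `batches` and `pageprops_params` are hypothetical names, not functions from `wikidata.py` or `mapping.py`.

```python
def batches(titles, size=50):
    """Yield consecutive chunks of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def pageprops_params(batch):
    """Query parameters asking Wikipedia for the QID of each title."""
    return {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "ppprop": "wikibase_item",   # the linked Wikidata QID
        "redirects": 1,              # resolve redirects to the canonical title
        "titles": "|".join(batch),   # multiple titles are pipe-separated
    }


titles = [f"Article_{i}" for i in range(120)]  # made-up titles
requests_needed = [pageprops_params(b) for b in batches(titles)]
```

With 120 titles this yields three requests (50, 50 and 20 titles), which would be sent to `https://en.wikipedia.org/w/api.php`.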
This repo tracks English Wikipedia only. To run StatsWiki for French, German, Japanese, etc.:
→ ADAPT.md — step-by-step fork guide (config, Pages URL, Wikidata language, backfill).
Multi-language in a single site is not implemented. One fork per language is the intended model. Pull requests to this repo are not accepted — fork under MIT and maintain your own copy.
Code: MIT
Data (Wikipedia / Wikidata content shown on the site): Wikimedia Terms of Use, Wikidata CC0 (Commons images retain their own licenses).