This project builds city, district, or state level change layers from OlmoEarth embeddings and Sentinel-2 annual composites, adds WorldPop population change, then exports:
- `overlay.geojson` for map overlays
- `summary.json` for downstream apps or newsroom pipelines
- `report.md` for a quick written brief
- `ui/` static files for an interactive Leaflet overlay
The generator is designed around India by default. State and district lookups use geoBoundaries, and city lookups use OpenStreetMap geocoding with an optional --state hint for disambiguation.
uv sync

uv run python scripts/generate_change_data.py \
--country IND \
--state "Uttar Pradesh" \
--district "Gautam Buddha Nagar" \
--output-dir outputs/noida \
--base-year 2025 \
--periods 1 5 10

uv run python scripts/generate_change_data.py \
--country IND \
--state "Karnataka" \
--city "Bengaluru" \
--output-dir outputs/bengaluru \
--base-year 2025 \
--periods 1 5 10

If you want one script to run directly from Colab, use:
python scripts/colab_generate_data.py \
--state "Uttar Pradesh" \
--district "Gautam Buddha Nagar" \
--output-dir /content/outputs/noida \
--periods 1 5 \
--enable-historical-imagery \
--max-tiles 1 \
--zip-output

Notes:
- it bootstraps missing Python packages automatically unless you pass `--skip-install`
- it can use `--device auto` to pick GPU on Colab when available
- historical imagery export is opt-in in this Colab script; pass `--enable-historical-imagery` to generate preview PNGs + manifest
- with `--zip-output`, it creates a downloadable archive next to the output folder
Useful runtime controls:
- `--max-tiles 1` for a quick pilot run only
- `--tile-size-m 1280` or `2560` to control area per tile
- `--resolution-m 20` for a much faster coarse run (default is 10)
- `--model tiny` for the best CPU tradeoff
- `--workers 4` (or higher) for parallel tile processing on CPU
- `--display-aggregation 4` to keep the UI responsive
- `--skip-population` if you want the fastest possible run and do not need WorldPop
- `--skip-pollution` if you want to skip the Sentinel-2 aerosol pollution proxy
- `--skip-wards` if you want to skip OSM ward aggregation and keep only cell overlays
- `--skip-change-rasters` to skip writing per-tile change rasters and reduce I/O
- `--no-save-composites` to skip writing per-tile yearly composite GeoTIFFs
- `--enable-historical-imagery` to export historical imagery preview PNGs and `historical_imagery.json`
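To see how `--tile-size-m` and `--resolution-m` interact, the arithmetic can be sketched as follows. This is a rough estimate only: it assumes square tiles, ignores tiles that only partly overlap the boundary, and uses a hypothetical 1,500 km² target area. The function names are illustrative and are not part of this project's scripts.

```python
import math


def tile_pixels(tile_size_m: float, resolution_m: float) -> int:
    """Pixels along one edge of a square tile."""
    return int(round(tile_size_m / resolution_m))


def estimate_tile_count(area_km2: float, tile_size_m: float) -> int:
    """Rough lower bound: target area divided by tile area, rounded up."""
    tile_area_km2 = (tile_size_m / 1000.0) ** 2
    return math.ceil(area_km2 / tile_area_km2)


# Hypothetical target area of 1,500 km^2 (not a measured district area).
for tile_size_m in (1280, 2560):
    for resolution_m in (10, 20):
        edge = tile_pixels(tile_size_m, resolution_m)
        tiles = estimate_tile_count(1500.0, tile_size_m)
        print(
            f"tile={tile_size_m} m, res={resolution_m} m: "
            f"{edge}x{edge} px per tile, at least {tiles} tiles"
        )
```

Doubling the resolution from 10 m to 20 m quarters the pixels per tile, while doubling the tile size roughly quarters the number of tiles.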
Reruns also reuse per-tile processed caches when inputs are unchanged:
- `outputs/.../years/<year>/<tile_id>_processed_display.npz`
- `outputs/.../years/<year>/<tile_id>_processed_display.meta.json`
Browser previews for the historical map toggle are also exported per year:
- `outputs/.../historical_imagery/<year>/<tile_id>.png`
- `outputs/.../historical_imagery.json`
Cache effectiveness is reported in summary.json under metadata fields:
- `tile_display_cache_hits`
- `tile_display_cache_misses`
- `tile_display_cache_hit_rate`
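A minimal sketch for checking these fields after a rerun. It assumes the three fields sit under a top-level `metadata` key in `summary.json`; the exact nesting and whether the reported rate is a fraction or a percentage are not confirmed here, so the helper recomputes the rate from hits and misses. The sample dictionary is hypothetical data, not real pipeline output.

```python
import json
from pathlib import Path


def cache_stats(summary: dict) -> dict:
    """Collect the cache counters and recompute the hit rate as a fraction."""
    meta = summary.get("metadata", {})  # assumed location of the fields
    hits = meta.get("tile_display_cache_hits", 0)
    misses = meta.get("tile_display_cache_misses", 0)
    total = hits + misses
    return {
        "hits": hits,
        "misses": misses,
        "reported_rate": meta.get("tile_display_cache_hit_rate"),
        "computed_rate": hits / total if total else None,
    }


def load_summary(path: str) -> dict:
    """Read a summary.json file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical sample standing in for a real summary.json.
sample = {
    "metadata": {
        "tile_display_cache_hits": 30,
        "tile_display_cache_misses": 10,
        "tile_display_cache_hit_rate": 0.75,
    }
}
print(cache_stats(sample))
```

For a real run, pass `load_summary("outputs/noida/summary.json")` to `cache_stats` instead of the sample.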
For a boundary-shaped final map, do not use --max-tiles. That flag intentionally processes only the top overlap tiles, which is useful for fast smoke tests but not for a complete city/district/state overlay.
For full-quality runs on CPU, start by adding `--workers` before reducing quality options.
If you are seeing multi-hour runtimes (for example, 1,000+ tiles), run in stages:
- Fast scan (quickly identify hotspots):
uv run python scripts/generate_change_data.py \
--country IND \
--state "Uttar Pradesh" \
--district "Gautam Buddha Nagar" \
--output-dir outputs/noida-fast \
--base-year 2025 \
--periods 1 5 \
--model nano \
--resolution-m 20 \
--skip-population \
--skip-pollution \
--skip-wards \
--skip-change-rasters

- Balanced run (better quality with still lower runtime):
uv run python scripts/generate_change_data.py \
--country IND \
--state "Uttar Pradesh" \
--district "Gautam Buddha Nagar" \
--output-dir outputs/noida-balanced \
--base-year 2025 \
--periods 1 5 \
--model tiny \
--resolution-m 20 \
--skip-wards

- Final publication run (full quality):
uv run python scripts/generate_change_data.py \
--country IND \
--state "Uttar Pradesh" \
--district "Gautam Buddha Nagar" \
--output-dir outputs/noida-final \
--base-year 2025 \
--periods 1 5 10 \
--model tiny \
--workers 6 \
--resolution-m 10

Serve the output directory so the browser can fetch `summary.json`, `overlay.geojson`, and the optional `ward_overlay.geojson`:
cd outputs/noida
python -m http.server 8000

Then open http://localhost:8000/ui/.
Do not open ui/index.html directly with file://. That can break local fetch(...) calls, and OpenStreetMap's tile usage policy expects requests to include a valid HTTP Referer, which direct file opens do not provide.
The UI includes:
- basemap selector for OSM, light basemap, imagery, or no basemap
- historical-image selector for the timeline-matched annual Sentinel-2 composite or the base year composite
- analysis-unit switcher for cell overlays and ward overlays when ward boundaries are available
- metric selector for embedding shift, vegetation, water, urbanization, bare soil, pollution proxy, and population delta
- period slider for 1y / 5y / 10y comparisons
- color-scaled overlays on a Leaflet base map
- hotspot cards sourced from the generated summary
If you want to suppress the basemap intentionally, open:
http://localhost:8000/ui/?basemap=none
uv run python scripts/run_india_news_scan.py \
--config config/target.json \
--output-dir outputs/india-news-scan \
--max-tiles 1

That script keeps the run intentionally small and is meant for newsroom scouting or rapid prototyping. For a full city or district analysis, run `generate_change_data.py` without `--max-tiles`.
For each requested year snapshot, the pipeline:
- Resolves the target state or district boundary from geoBoundaries, or the target city boundary from OpenStreetMap geocoding when `--city` is used.
- Tiles the area in the local UTM CRS.
- Downloads a cloud-median Sentinel-2 L2A annual composite from Microsoft Planetary Computer.
- Computes OlmoEarth embeddings locally with the open `olmoearth-pretrain` model.
- Derives interpretable spectral deltas: `NDVI`, `MNDWI`, `NDBI`, and `BSI`.
- Derives a pollution proxy from Sentinel-2 L2A `AOT` (aerosol optical thickness), which is useful for aerosol loading changes but is not direct PM2.5 or gas concentration.
- Downloads WorldPop 1km constrained population rasters for supported years (`2015-2030`) and allocates those counts into the display cells.
- Optionally resolves OSM ward-like administrative polygons for the selected district or city and aggregates cell metrics into ward polygons.
- Writes overlay cells with per-period properties like `embedding_change_5y`, `vegetation_delta_5y`, `water_delta_5y`, `urban_delta_5y`, `pollution_delta_5y`, and `population_delta_5y`, plus a `ward_overlay.geojson` when ward boundaries are available.
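For readers unfamiliar with the four spectral indices, here is a per-pixel sketch using their commonly published definitions. The band roles follow the usual Sentinel-2 mapping (B02 blue, B03 green, B04 red, B08 NIR, B11 SWIR1). Whether this project uses exactly these bands, and the "later minus earlier" sign convention for deltas, are assumptions. The function names and reflectance values are illustrative only.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(blue, green, red, nir, swir1) -> dict:
    """Standard index definitions from surface reflectance values."""
    return {
        "ndvi": normalized_difference(nir, red),      # vegetation
        "mndwi": normalized_difference(green, swir1), # open water
        "ndbi": normalized_difference(swir1, nir),    # built-up surfaces
        "bsi": normalized_difference(swir1 + red, nir + blue),  # bare soil
    }


def index_deltas(earlier: dict, later: dict) -> dict:
    """Assumed convention: delta = later year minus earlier year."""
    return {name: later[name] - earlier[name] for name in earlier}


# Hypothetical reflectances for one pixel in two annual composites.
year_a = spectral_indices(blue=0.05, green=0.08, red=0.10, nir=0.30, swir1=0.20)
year_b = spectral_indices(blue=0.08, green=0.10, red=0.15, nir=0.20, swir1=0.30)
print(index_deltas(year_a, year_b))
```

In this toy pixel, NDVI falls while NDBI rises, which is the pattern one would expect when vegetation is replaced by built-up surfaces.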
Note: the map uses embedding L2 shift as the main OlmoEarth change score because annual OlmoEarth vectors can stay nearly parallel across time, which makes cosine distance too flat for an interactive overlay.
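The difference between the two scores is easiest to see on a toy case. The sketch below uses two made-up 2-D vectors that point in the same direction but differ in length. It is not the project's implementation and the vectors are not real OlmoEarth embeddings.

```python
import math


def l2_shift(a, b) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b) -> float:
    """1 minus cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors: same direction, different magnitude.
earlier = [3.0, 4.0]
later = [6.0, 8.0]

print("cosine distance:", cosine_distance(earlier, later))
print("L2 shift:", l2_shift(earlier, later))
```

Because the vectors are parallel, cosine distance reports no change at all, while the L2 shift still registers the change in magnitude.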
- Administrative boundaries: geoBoundaries, the geoBoundaries Global Database of Political Administrative Boundaries, is used for state and district boundaries. geoBoundaries requests web attribution and distributes data under CC BY 4.0.
- Satellite imagery: annual composites are built from Copernicus Sentinel-2 L2A imagery accessed via Microsoft Planetary Computer. Sentinel-2 data remains subject to the Copernicus Sentinel Data Terms and Conditions.
- Pollution proxy: the `pollution_delta_*` metric is derived from Sentinel-2 L2A `AOT` (aerosol optical thickness) as documented in the official Copernicus Sentinel-2 L2A documentation. It is an aerosol-loading proxy, not direct PM2.5, NO2, or regulatory air-quality data.
- Population: population overlays come from WorldPop annual gridded population products. WorldPop's datasets are available under CC BY 4.0.
- Ward overlays and OSM-derived layers: ward polygons are resolved from OpenStreetMap administrative data on a best-effort basis. OSM data is licensed under the ODbL 1.0.
- Basemaps in the UI: when using the built-in map baselayers, preserve the attribution shown in the UI for OpenStreetMap, CARTO, and Esri.
- Model attribution: embeddings are generated with OlmoEarth / `olmoearth_pretrain` from Ai2.
- The default long-baseline `10y` request can reach back into `2015`, where Sentinel-2 coverage is not as complete as later years.
- Annual median composites suppress seasonal noise well, but they also smooth short-lived events.
- The static UI is intended for exploration; for publication you may want to add labels, annotation layers, and editorial notes.
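
The first limitation above follows from simple year arithmetic, which can be checked before a long run. This sketch assumes each period is compared against `base_year - period` (consistent with a 2025 base year and a `10y` period reaching 2015, but not confirmed against the scripts). The WorldPop range is the 2015-2030 window stated earlier in this README. The function names are hypothetical.

```python
def comparison_years(base_year: int, periods: list) -> dict:
    """Map each period label to its assumed comparison year."""
    return {f"{p}y": base_year - p for p in periods}


def outside_population_range(years: dict, first: int = 2015, last: int = 2030) -> list:
    """Period labels whose comparison year falls outside the WorldPop window."""
    return [label for label, year in years.items() if not first <= year <= last]


years = comparison_years(2025, [1, 5, 10])
print(years)
print("outside WorldPop range:", outside_population_range(years))

# An earlier base year pushes the 10y comparison before 2015.
print(outside_population_range(comparison_years(2020, [1, 5, 10])))
```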