This monorepo contains the following components:
- Data Parser - Parses CML data and metadata CSV files from SFTP uploads into the database
- Database - TimescaleDB for storing time series data and metadata
- Data Processor - (Stub implementation) Placeholder for future data analysis and processing logic
- Webserver - Main user-facing web application with interactive visualizations
- Grafana - Real-time dashboards for CML data visualization
- MNO Data Source Simulator - Simulates real-time CML data from MNO sources via SFTP
- SFTP Receiver - Receives uploaded CML data files
The webserver provides an intuitive interface with four main pages:
- Landing Page (/) - System overview with data statistics and processing status
- Real-Time Data (/realtime) - Interactive CML network map with Grafana-embedded time series plots
- Archive (/archive) - Long-term archive statistics and data distribution analysis
- Data Uploads (/data-uploads) - File upload interface for CML data files
Prerequisites:
- Docker and Docker Compose
- Git
1. Clone the repository:
   git clone https://github.com/OpenSenseAction/GMDI_prototype.git
   cd GMDI_prototype
2. Generate SSH keys for SFTP server:
   cd ssh_keys
   ./generate_ssh_keys.sh
   cd ..
3. Build and run the containers:
   docker compose up -d
4. Access the services:
- Webserver (Main UI): http://localhost:5000
- Grafana Dashboards: http://localhost:3000
- Database: localhost:5432
- SFTP Server: localhost:2222
Note: The processor service (port 5002) is currently a minimal stub implementation.
- MNO Simulator → generates CML data from NetCDF files and uploads via SFTP to SFTP Receiver
- Parser → watches SFTP upload directory and processes CSV files (both metadata and data)
- Parser → validates and writes parsed data to Database (TimescaleDB)
- Webserver → serves UI and provides API access to database
- Grafana → visualizes real-time data from database with embedded dashboards
On docker compose up the archive_generator service automatically generates
a 1-day archive at 10-second resolution from the 3-month OpenMRG NetCDF file
and the archive_loader service bulk-loads it into the database.
Defaults (overridable via environment variables):
- 728 CML sublinks (364 unique CML IDs) covering Berlin area
- ~6.3M data rows at 10-second intervals over 1 day
- Generates in ~15 s, loads in ~15 s
The NetCDF source file (openMRG_cmls_20150827_3months.nc, ~193 MB) is
gitignored. If it is not present in parser/example_data/, it is downloaded
automatically at startup from NETCDF_FILE_URL.
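
The download-if-absent behaviour can be sketched as follows. This is a minimal illustration, not the repository's actual startup code: the helper name `ensure_netcdf` and the `.part` temporary suffix are assumptions made for the example.

```python
from pathlib import Path
from urllib.request import urlretrieve


# Hypothetical helper; the repository's real download logic may differ.
def ensure_netcdf(target: Path, url: str, fetch=urlretrieve) -> bool:
    """Download `url` to `target` unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated file that looks complete.
    partial = target.with_name(target.name + ".part")
    fetch(url, str(partial))
    partial.replace(target)
    return True
```

A caller would pass the path under parser/example_data/ and the value of NETCDF_FILE_URL. The `fetch` parameter exists only so the download step can be swapped out in tests.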
# Longer archive or different resolution via environment variables:
ARCHIVE_DAYS=7 ARCHIVE_INTERVAL_SECONDS=60 docker compose up -d

| Variable | Default | Description |
|---|---|---|
| ARCHIVE_DAYS | 1 | Days of history to generate |
| ARCHIVE_INTERVAL_SECONDS | 10 | Time step in seconds |
| NETCDF_FILE_URL | KIT download link | URL to fetch the NetCDF file if absent |
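
To judge how large an archive a given setting produces, the row count can be estimated from the number of sublinks and the number of time steps. The sketch below assumes every sublink reports at every time step with no gaps; `estimate_rows` is an illustrative name, not part of the repository.

```python
SECONDS_PER_DAY = 86_400


def estimate_rows(sublinks: int, days: int, interval_seconds: int) -> int:
    """Rows = sublinks x time steps, assuming no gaps in the series."""
    steps = days * SECONDS_PER_DAY // interval_seconds
    return sublinks * steps


# Defaults: 728 sublinks, 1 day, 10-second steps
print(estimate_rows(728, 1, 10))   # 6289920, i.e. the ~6.3M rows quoted above

# ARCHIVE_DAYS=7 ARCHIVE_INTERVAL_SECONDS=60
print(estimate_rows(728, 7, 60))   # 7338240
```

A week at 60-second resolution is therefore only slightly larger than the default one-day archive at 10-second resolution.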
docker compose down -v # Remove volumes
docker compose up -d # Regenerate and reload from scratch

For a full 3-month archive at native 10-second resolution (~579M rows):
docker compose run --rm -e DB_HOST=database parser \
python /app/parser/parse_netcdf_archive.py

Use ARCHIVE_MAX_DAYS to limit the time window (default: 7 days,
0 = no limit). Requires at least 4 GB RAM for the full dataset.
The webserver supports multiple storage backends for received files:
- Local filesystem (default) - For development and testing
- MinIO - S3-compatible object storage (optional)
- AWS S3 - Production object storage (configure via environment variables)
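
Backend selection from environment variables might look like the sketch below. The variable names STORAGE_BACKEND, STORAGE_S3_BUCKET and STORAGE_S3_ENDPOINT and the value `minio` come from this README. The values `local` and `s3`, the `StorageConfig` class and the validation rules are assumptions for illustration, not the webserver's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageConfig:
    backend: str
    bucket: Optional[str] = None
    endpoint: Optional[str] = None


def load_storage_config(env: Mapping[str, str] = os.environ) -> StorageConfig:
    """Build a storage configuration from environment variables."""
    # "local" as the default value is an assumption for this sketch.
    backend = env.get("STORAGE_BACKEND", "local").lower()
    if backend == "local":
        return StorageConfig(backend="local")
    if backend in ("minio", "s3"):
        bucket = env.get("STORAGE_S3_BUCKET")
        if not bucket:
            raise ValueError("STORAGE_S3_BUCKET is required for object storage")
        endpoint = env.get("STORAGE_S3_ENDPOINT")
        # MinIO has no default public endpoint, so one must be given.
        if backend == "minio" and not endpoint:
            raise ValueError("STORAGE_S3_ENDPOINT is required for MinIO")
        return StorageConfig(backend=backend, bucket=bucket, endpoint=endpoint)
    raise ValueError(f"unknown STORAGE_BACKEND: {backend!r}")
```

Failing at startup on a missing bucket or endpoint surfaces a misconfigured override file immediately, rather than at the first upload.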
To use MinIO, uncomment the minio service in docker-compose.yml and set:
environment:
- STORAGE_BACKEND=minio
- STORAGE_S3_BUCKET=cml-data
- STORAGE_S3_ENDPOINT=http://minio:9000

Each tenant has:
- a PostgreSQL login role whose name equals the user_id stored in the data tables
- a Grafana organisation (org) with a dedicated datasource connecting as that role
- a Flask login account in webserver/configs/users.json
Row-Level Security on cml_metadata and cml_stats, plus the
cml_data_1h_secure security-barrier view, ensure each DB role only reads its
own data without any application-level filtering.
The canonical deployment pattern is a separate git repo that includes this
repo as a git submodule and overrides configuration with a
docker-compose.override.yml.
Create a SQL migration file (e.g. migrations/008_add_acme.sql):
-- Idempotent: safe to re-run
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'acme') THEN
CREATE ROLE acme LOGIN PASSWORD 'change-me-in-production';
END IF;
END
$$;
GRANT USAGE ON SCHEMA public TO acme;
GRANT SELECT, INSERT, UPDATE ON cml_data TO acme;
GRANT SELECT, INSERT, UPDATE ON cml_metadata TO acme;
GRANT SELECT, INSERT, UPDATE ON cml_stats TO acme;
GRANT EXECUTE ON FUNCTION update_cml_stats(TEXT, TEXT) TO acme;
GRANT SELECT ON cml_data_secure TO acme;
GRANT SELECT ON cml_data_1h_secure TO acme;
GRANT acme TO webserver_role;

Apply it to the running database:
docker compose exec -T database psql -U myuser -d mydatabase \
< migrations/008_add_acme.sql

No new RLS policies are needed; the generic WHERE user_id = current_user
policies cover every role automatically.
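
Since the migration differs between tenants only in the role name, a deployment repo could generate it rather than copy it by hand. The following is a sketch under that assumption. `render_tenant_migration` is a hypothetical helper, and the table, view and function names are copied from the migration shown above.

```python
import re

# Table and view names are taken from the migration shown above.
TABLES = ("cml_data", "cml_metadata", "cml_stats")
SECURE_VIEWS = ("cml_data_secure", "cml_data_1h_secure")
_ROLE_RE = re.compile(r"[a-z_][a-z0-9_]*")


def render_tenant_migration(role: str, password: str = "change-me-in-production") -> str:
    """Return idempotent SQL that creates `role` and grants tenant access."""
    # Only plain lowercase identifiers are accepted, so the role name can be
    # interpolated into the SQL text without quoting.
    if not _ROLE_RE.fullmatch(role):
        raise ValueError(f"invalid role name: {role!r}")
    if "$$" in password:
        raise ValueError("password must not contain '$$' (it would end the DO block)")
    quoted_password = password.replace("'", "''")  # escape single quotes for SQL
    lines = [
        "-- Idempotent: safe to re-run",
        "DO $$",
        "BEGIN",
        f"  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '{role}') THEN",
        f"    CREATE ROLE {role} LOGIN PASSWORD '{quoted_password}';",
        "  END IF;",
        "END",
        "$$;",
        f"GRANT USAGE ON SCHEMA public TO {role};",
    ]
    lines += [f"GRANT SELECT, INSERT, UPDATE ON {t} TO {role};" for t in TABLES]
    lines.append(f"GRANT EXECUTE ON FUNCTION update_cml_stats(TEXT, TEXT) TO {role};")
    lines += [f"GRANT SELECT ON {v} TO {role};" for v in SECURE_VIEWS]
    lines.append(f"GRANT {role} TO webserver_role;")
    return "\n".join(lines) + "\n"
```

Writing the result of `render_tenant_migration("acme")` to a file reproduces the statements of the hand-written migration. The default password is the same placeholder used above and should be replaced before any production use.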
In your deployment repo, either override ORGS / USERS in
grafana/init_grafana.py or mount a patched copy of the file. Configuring the
tenant lists through environment variables would be simpler, but
init_grafana.py does not support that yet. Until it does, the easiest
override is to replace the script with a deployment-repo copy that appends
the new tenant:
# deployment-repo/grafana/init_grafana.py (copy of the upstream file + additions)
ORGS = [
{"id": 1, "name": "demo_openmrg"},
{"id": 2, "name": "demo_orange_cameroun"},
{"id": 3, "name": "acme"}, # ← new tenant
]
USERS = [
{"login": "demo_openmrg", "org_id": 1, "role": "Viewer"},
{"login": "demo_orange_cameroun", "org_id": 2, "role": "Viewer"},
{"login": "acme", "org_id": 3, "role": "Viewer"}, # ← new
]

And add the datasource + dashboard copy call in __main__:
create_datasource_for_org(
org_id=3,
name="PostgreSQL",
uid="ds_acme",
user="acme",
password="change-me-in-production",
)
copy_dashboards_to_org(target_org_id=3, source_org_id=1)

Mount the patched script via docker-compose.override.yml:
services:
init_grafana:
volumes:
- ./grafana/init_grafana.py:/app/init_grafana.py:ro

The deployment repo should provide its own webserver/configs/users.json
(already live-mounted, no rebuild needed):
{
"demo_openmrg": { "password_hash": "scrypt:...", "display_name": "OpenMRG Demo", "grafana_org_id": 1 },
"demo_orange_cameroun":{ "password_hash": "scrypt:...", "display_name": "Orange Cameroun Demo","grafana_org_id": 2 },
"acme": { "password_hash": "scrypt:...", "display_name": "Acme Corp", "grafana_org_id": 3 }
}

Generate a password hash with:
docker compose run --rm webserver python3 -c \
"from werkzeug.security import generate_password_hash; print(generate_password_hash('your-password'))"

The sftp_receiver reads authorised keys from ssh_keys/authorized_keys and
per-user key directories. Add the new tenant's public key there via the
deployment repo's volume mounts in docker-compose.override.yml.
# Apply the DB migration (only needed once per database volume lifetime)
docker compose exec -T database psql -U myuser -d mydatabase \
< migrations/008_add_acme.sql
# Restart so init_grafana re-runs bootstrap (creates org 3, datasource, copies dashboards)
docker compose restart init_grafana
# Or on a fresh stack: docker compose up -d

init_grafana is idempotent, so re-running it on an existing stack is safe.
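
If init_grafana.py gained env-driven tenant lists, the ORGS and USERS structures shown earlier could be derived from a single variable instead of a patched script. The sketch below is purely illustrative: the variable name GRAFANA_TENANTS and both helper functions are assumptions and do not exist in the upstream script.

```python
import os
from typing import Mapping, Sequence


def build_tenants(names: Sequence[str], role: str = "Viewer"):
    """Derive ORGS and USERS lists; org ids follow list order, starting at 1."""
    orgs = [{"id": i, "name": n} for i, n in enumerate(names, start=1)]
    users = [{"login": n, "org_id": i, "role": role} for i, n in enumerate(names, start=1)]
    return orgs, users


def tenants_from_env(
    env: Mapping[str, str] = os.environ,
    default: Sequence[str] = ("demo_openmrg", "demo_orange_cameroun"),
):
    # GRAFANA_TENANTS is a hypothetical comma-separated list of tenant names.
    raw = env.get("GRAFANA_TENANTS", "")
    names = [n.strip() for n in raw.split(",") if n.strip()] or list(default)
    if len(set(names)) != len(names):
        raise ValueError("duplicate tenant name in GRAFANA_TENANTS")
    return build_tenants(names)
```

Because org ids are positional in this sketch, new tenants would have to be appended to the end of the list; reordering existing names would reassign their org ids.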