feat(backup): portable backup bundle + idempotent restore (#149)#175
feat(backup): portable backup bundle + idempotent restore (#149)#175LeoLin990405 wants to merge 1 commit into
Conversation
…#149) First increment of the local-first backup/restore/sync workflow (hoangsonww#149): a versioned, portable backup bundle and a validated, idempotent restore — to move dashboard state between machines or recover a fresh install. Zero new deps. Sync targets, scheduling, WS-streaming, and MCP/CLI affordances are deferred to follow-up PRs (noted on the issue). Backend (/api/backup): - export — a portable bundle: a manifest (format, schema_version, app version, created_at, per-table counts) + data for sessions/agents/events/token_usage/ model_pricing/dashboard_runs. Distinct from the raw /api/settings/export dump (no manifest/restore) and the infra db-backup.sh (full-file copy). - validate — checks manifest + schema_version compatibility (forward-compatible: same/older accepted, newer rejected) + structure. No mutation. - dry-run — per-table preview of what would change (new vs already-present, pricing conflicts, invalid rows) with zero mutation. - restore — single-transaction merge. Append-only tables INSERT OR IGNORE by PK (idempotent: reapplying inserts nothing; local rows never overwritten). model_pricing honors a conflict strategy (keep_local default | use_incoming, which only rewrites rows that actually differ). Column-intersection inserts (PRAGMA table_info) tolerate cross-version schema drift; all parameterized; table names from a fixed allowlist only. - The three POST routes parse the bundle with a route-scoped 64mb JSON limit so whole-database bundles aren't rejected by the global 1mb limit. - Rows missing a complete primary key are reported + skipped (never written as a NULL-keyed row, never duplicated on re-apply), and dry-run mirrors restore. Frontend: - Settings → Backup & Restore: Download backup, and Restore-from-file with a dry-run preview (manifest + per-table summary), a pricing conflict-strategy choice, and a confirm gate before the (idempotent) merge. Full en/zh/vi i18n. Docs: docs/BACKUP.md (bundle format, compatibility policy, restore semantics, recovery procedure, and how it differs from raw export + db-backup.sh) + README API table. Tests: server/__tests__/backup.test.js (18) — export manifest/counts, validate accept/reject (garbage manifest, wrong format, future schema_version), dry-run accuracy + read-only, restore insert + idempotency, append-only rows never overwritten, model_pricing keep_local vs use_incoming, column-intersection of unknown columns, transaction atomicity, the >1mb body path, PK-less rows skipped, and use_incoming update-count parity with dry-run. Regenerated the Settings screen-snapshot baseline for the new section.
There was a problem hiding this comment.
Code Review
This pull request introduces a portable, versioned Backup & Restore feature that allows users to export and transactionally restore dashboard state. Feedback on the implementation highlights several critical issues: a missing import of CheckCircle in the frontend, a performance anti-pattern of preparing SQL statements inside a loop, and a side-effect of using INSERT OR REPLACE that resets unmapped columns to default values. Additionally, reviewers noted a potential race condition in the asynchronous file analysis, a potential SQL injection vulnerability in tableColumns due to direct string interpolation, and a potential crash if backup rows are malformed.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| History, | ||
| ChevronLeft, | ||
| ChevronRight, | ||
| Archive, | ||
| Download, | ||
| Upload, | ||
| Loader2, | ||
| } from "lucide-react"; |
There was a problem hiding this comment.
The CheckCircle component is used on line 733 but does not appear to be imported from lucide-react in the import block. This will cause a compilation or runtime error. Please ensure CheckCircle is added to the lucide-react imports.
| History, | |
| ChevronLeft, | |
| ChevronRight, | |
| Archive, | |
| Download, | |
| Upload, | |
| Loader2, | |
| } from "lucide-react"; | |
| History, | |
| ChevronLeft, | |
| ChevronRight, | |
| Archive, | |
| Download, | |
| Upload, | |
| Loader2, | |
| CheckCircle, | |
| } from "lucide-react"; |
| const run = db.transaction(() => { | ||
| const applied = {}; | ||
| let totalInserted = 0; | ||
|
|
||
| // Append-only tables — INSERT OR IGNORE by PK. | ||
| for (const table of Object.keys(APPEND_ONLY)) { | ||
| const tableCols = colsByTable[table]; | ||
| const pk = APPEND_ONLY[table]; | ||
| const rows = rowsOf(bundle, table); | ||
| let inserted = 0; | ||
| let invalid = 0; | ||
| for (const row of rows) { | ||
| // Skip PK-incomplete rows so we never write a NULL-keyed row or one that | ||
| // would duplicate on re-apply (keeps restore idempotent + matches the | ||
| // dry-run preview, which counts these as `invalid`). | ||
| if (!hasCompletePk(row, pk)) { | ||
| invalid++; | ||
| continue; | ||
| } | ||
| const built = buildInsert({ table, tableCols, row, conflict: "ignore" }); | ||
| if (!built) continue; | ||
| const info = db.prepare(built.sql).run(...built.columns.map((c) => row[c])); | ||
| inserted += info.changes; // 0 when the PK already existed (ignored) | ||
| } | ||
| applied[table] = { inserted, skipped: rows.length - inserted - invalid, invalid }; | ||
| totalInserted += inserted; | ||
| } |
There was a problem hiding this comment.
Preparing a SQL statement inside a loop (db.prepare(built.sql)) is a performance anti-pattern in SQLite. For large backup bundles with thousands of rows, this compiles the SQL statement repeatedly, causing significant CPU overhead and slowing down the restore process.
Consider caching the prepared statements locally within applyRestore to compile each unique SQL statement exactly once.
const stmtCache = new Map();
const getStmt = (sql) => {
let stmt = stmtCache.get(sql);
if (!stmt) {
stmt = db.prepare(sql);
stmtCache.set(sql, stmt);
}
return stmt;
};
const run = db.transaction(() => {
const applied = {};
let totalInserted = 0;
// Append-only tables — INSERT OR IGNORE by PK.
for (const table of Object.keys(APPEND_ONLY)) {
const tableCols = colsByTable[table];
const pk = APPEND_ONLY[table];
const rows = rowsOf(bundle, table);
let inserted = 0;
let invalid = 0;
for (const row of rows) {
// Skip PK-incomplete rows so we never write a NULL-keyed row or one that
// would duplicate on re-apply (keeps restore idempotent + matches the
// dry-run preview, which counts these as `invalid`).
if (!hasCompletePk(row, pk)) {
invalid++;
continue;
}
const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
if (!built) continue;
const info = getStmt(built.sql).run(...built.columns.map((c) => row[c]));
inserted += info.changes; // 0 when the PK already existed (ignored)
}| const local = getLocal.get(row[pk[0]]); | ||
| if (!local) { | ||
| const built = buildInsert({ table, tableCols, row, conflict: "ignore" }); | ||
| if (built) { | ||
| db.prepare(built.sql).run(...built.columns.map((c) => row[c])); | ||
| inserted++; | ||
| } | ||
| } else if (strategy === "use_incoming" && rowDiffers(row, local, tableCols, pk)) { | ||
| const built = buildInsert({ table, tableCols, row, conflict: "replace" }); | ||
| if (built) { | ||
| db.prepare(built.sql).run(...built.columns.map((c) => row[c])); | ||
| updated++; | ||
| } | ||
| } |
There was a problem hiding this comment.
Using INSERT OR REPLACE (via conflict: "replace") for updating existing rows in model_pricing has a side effect in SQLite: it deletes the conflicting row and inserts a new one. This means any columns present in the live database schema but missing from the backup bundle (e.g., due to cross-version schema drift or restoring an older backup on a newer app version) will be reset to their default values or NULL.
To preserve these columns and adhere to the column-intersection compatibility policy, use an UPDATE statement instead of INSERT OR REPLACE when updating existing rows.
const local = getLocal.get(row[pk[0]]);
if (!local) {
const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
if (built) {
getStmt(built.sql).run(...built.columns.map((c) => row[c]));
inserted++;
}
} else if (strategy === "use_incoming" && rowDiffers(row, local, tableCols, pk)) {
const tableColSet = new Set(tableCols);
const pkSet = new Set(pk);
const updateCols = tableCols.filter(
(c) => tableColSet.has(c) && !pkSet.has(c) && Object.prototype.hasOwnProperty.call(row, c)
);
if (updateCols.length > 0) {
const setClause = updateCols.map((c) => `${c} = ?`).join(", ");
const whereClause = pk.map((c) => `${c} = ?`).join(" AND ");
const sql = `UPDATE ${table} SET ${setClause} WHERE ${whereClause}`;
const params = [...updateCols.map((c) => row[c]), ...pk.map((c) => row[c])];
getStmt(sql).run(...params);
updated++;
}
}| const analyze = useCallback( | ||
| async (parsed: BackupBundle) => { | ||
| setAnalyzing(true); | ||
| setError(null); | ||
| setValidation(null); | ||
| setDryRun(null); | ||
| setResult(null); | ||
| try { | ||
| const validateRes = await api.backup.validate(parsed); | ||
| setValidation(validateRes); | ||
| if (validateRes.compatible) { | ||
| const dryRunRes = await api.backup.dryRun(parsed, "keep_local"); | ||
| setDryRun(dryRunRes); | ||
| } | ||
| } catch (err) { | ||
| setError(err instanceof Error ? err.message : t("backup.restoreFailed")); | ||
| } finally { | ||
| setAnalyzing(false); | ||
| } | ||
| }, | ||
| [t] | ||
| ); |
There was a problem hiding this comment.
There is a potential race condition in the asynchronous analyze function. If a user selects a backup file, and then quickly selects another file or changes the pricing strategy before the first analysis completes, the state (validation, dryRun, error) can be updated with stale results from the first file.
Consider using a reference to track the active analysis ID or an AbortController to ignore or cancel stale in-flight requests.
| function tableColumns(db, table) { | ||
| // PRAGMA table_info can't be parameterized; table names come only from the | ||
| // fixed BACKUP_TABLES allow-list, never user input, so interpolation is safe. | ||
| return db | ||
| .prepare(`PRAGMA table_info(${table})`) | ||
| .all() | ||
| .map((c) => c.name); | ||
| } |
There was a problem hiding this comment.
The tableColumns function interpolates the table parameter directly into the SQL query (PRAGMA table_info(${table})). While currently called only with safe values from BACKUP_TABLES, this function is exported and could be reused in the future with user-controlled input, leading to SQL injection.
To enforce defense-in-depth, validate that the table argument is present in BACKUP_TABLES before executing the query.
function tableColumns(db, table) {
if (!BACKUP_TABLES.includes(table)) {
throw new Error(`Invalid table: ${table}`);
}
// PRAGMA table_info can't be parameterized; table names come only from the
// fixed BACKUP_TABLES allow-list, never user input, so interpolation is safe.
return db
.prepare(`PRAGMA table_info(${table})`)
.all()
.map((c) => c.name);
}| function hasCompletePk(row, pk) { | ||
| return pk.every((c) => row[c] != null); | ||
| } |
There was a problem hiding this comment.
If any row in the backup data is null or not an object (which can happen with malformed or corrupted JSON files), accessing row[c] will throw a TypeError and crash the restore/dry-run process.
To make the input validation more robust, guard hasCompletePk to ensure row is a non-null object before checking its properties.
| function hasCompletePk(row, pk) { | |
| return pk.every((c) => row[c] != null); | |
| } | |
| function hasCompletePk(row, pk) { | |
| return row && typeof row === "object" && pk.every((c) => row[c] != null); | |
| } |
Summary
First increment of the local-first backup/restore/sync workflow (issue #149): a portable, versioned backup bundle + a validated, idempotent restore — to move dashboard state between machines or recover a fresh install from database loss. Zero new dependencies.
As discussed on the issue, I'm landing #149 in focused PRs. This PR = the backup + restore core. Sync targets (folder/network), scheduling, WebSocket streaming progress for very large bundles, and MCP/CLI affordances are deferred to follow-ups.
Changes
Backend (
/api/backup)manifest(format,schema_version, app version,created_at, per-tablecounts) +dataforsessions / agents / events / token_usage / model_pricing / dashboard_runs. Distinct from the raw/api/settings/exportdump (no manifest/restore) and the infradb-backup.sh(full-file copy) — docs/BACKUP.md explains the difference.schema_versioncompatibility (forward-compatible: same/older accepted, newer rejected) + structure. No mutation.INSERT OR IGNOREby PK → idempotent (reapplying the same bundle inserts nothing; existing local rows are never overwritten).model_pricinghonors a conflict strategy:keep_local(default) oruse_incoming(only rewrites rows that actually differ). Column-intersection inserts (PRAGMA table_info) tolerate cross-version schema drift; everything parameterized; table names from a fixed allowlist only.Frontend
Docs:
docs/BACKUP.md(bundle format, compatibility policy, restore semantics, recovery procedure, difference from raw export +db-backup.sh) + README API table.Type of Change
How to Test
npm run test:server— all pass (363), incl.server/__tests__/backup.test.js(18).cd client && npm run build— cleantsc -b && vite build.curl -O localhost:4820/api/backup/export, thenPOSTit to/api/backup/dry-runand/api/backup/restore?pricing_strategy=keep_local(see docs/BACKUP.md).Acceptance criteria (this PR)
Checklist
npm test) — see notenpm run format:check)