Skip to content

feat(backup): portable backup bundle + idempotent restore (#149)#175

Open
LeoLin990405 wants to merge 1 commit into
hoangsonww:masterfrom
LeoLin990405:feat/backup-restore
Open

feat(backup): portable backup bundle + idempotent restore (#149)#175
LeoLin990405 wants to merge 1 commit into
hoangsonww:masterfrom
LeoLin990405:feat/backup-restore

Conversation

@LeoLin990405

Copy link
Copy Markdown
Contributor

Summary

First increment of the local-first backup/restore/sync workflow (issue #149): a portable, versioned backup bundle + a validated, idempotent restore — to move dashboard state between machines or recover a fresh install from database loss. Zero new dependencies.

As discussed on the issue, I'm landing #149 in focused PRs. This PR = the backup + restore core. Sync targets (folder/network), scheduling, WebSocket streaming progress for very large bundles, and MCP/CLI affordances are deferred to follow-ups.

Changes

Backend (/api/backup)

  • export — a portable bundle: a manifest (format, schema_version, app version, created_at, per-table counts) + data for sessions / agents / events / token_usage / model_pricing / dashboard_runs. Distinct from the raw /api/settings/export dump (no manifest/restore) and the infra db-backup.sh (full-file copy) — docs/BACKUP.md explains the difference.
  • validate — manifest + schema_version compatibility (forward-compatible: same/older accepted, newer rejected) + structure. No mutation.
  • dry-run — per-table preview of what would change (new vs already-present, pricing conflicts, invalid rows), zero mutation.
  • restore — single-transaction merge. Append-only tables INSERT OR IGNORE by PK → idempotent (reapplying the same bundle inserts nothing; existing local rows are never overwritten). model_pricing honors a conflict strategy: keep_local (default) or use_incoming (only rewrites rows that actually differ). Column-intersection inserts (PRAGMA table_info) tolerate cross-version schema drift; everything parameterized; table names from a fixed allowlist only.
  • The three POST routes use a route-scoped 64mb JSON limit so whole-database bundles aren't rejected by the global 1mb limit. Rows missing a complete PK are reported + skipped (never written NULL-keyed, never duplicated), and dry-run mirrors restore exactly.

Frontend

  • Settings → Backup & Restore: Download backup, and Restore-from-file with a dry-run preview (manifest + per-table summary), a pricing conflict-strategy choice, and a confirm gate before the (idempotent) merge. Full en/zh/vi i18n.

Docs: docs/BACKUP.md (bundle format, compatibility policy, restore semantics, recovery procedure, difference from raw export + db-backup.sh) + README API table.

Type of Change

  • Bug fix
  • New feature (non-breaking change that adds functionality)
  • Breaking change
  • Refactor
  • Documentation update
  • Infrastructure / CI / DevOps
  • Dependency update

How to Test

  1. npm run test:server — all pass (363), incl. server/__tests__/backup.test.js (18).
  2. cd client && npm run build — clean tsc -b && vite build.
  3. Manual: Settings → Backup & Restore → Download backup → on another install (or after Clear Data) → Restore from backup → review the dry-run preview → confirm. Re-run the same file → it inserts nothing (idempotent).
  4. API: curl -O localhost:4820/api/backup/export, then POST it to /api/backup/dry-run and /api/backup/restore?pricing_strategy=keep_local (see docs/BACKUP.md).

Acceptance criteria (this PR)

  • Create a portable backup bundle from Settings (sessions, agents, events, token usage, model pricing, dashboard runs)
  • Upload/select a bundle and see a dry-run summary before any mutation
  • Restore is idempotent — reapplying the same bundle never duplicates
  • Restore preserves active local data unless you pick a conflict strategy for the mutable pricing table
  • Failed/incompatible restores surface actionable errors (4xx + issues; no partial write — transactional)
  • Server tests for schema versioning, corrupt/partial manifests, duplicate records, conflict handling
  • Docs explain the difference from raw JSON export + recovery procedure
  • Deferred: WebSocket streaming progress for very large bundles (restore is transactional/fast for typical sizes; full streaming + chunking is a follow-up)

Checklist

  • Read the contributing guidelines
  • Code follows the project's standards
  • Added tests proving the feature works
  • All new and existing tests pass (npm test) — see note
  • Code formatted (npm run format:check)
  • Updated documentation (docs/BACKUP.md + README)

A bundle intentionally contains full session data (event payloads, metadata) — it's a database-recovery artifact, like the existing /export; treat the file like the DB. No env/file config or secrets are included, and this PR has no external sync, so the issue's "no secrets in synced bundles" non-goal isn't engaged yet.

Test note: npm run test:server 363/363; cd client && npm run build clean. I regenerated the Settings screen-snapshot baseline for the new section and added api.backup to that test's API mock. npm run test:client otherwise shows only the pre-existing Dashboard/Tabby failures on Node 26 (jsdom 27 localStorage) — unrelated; CI runs Node 22 where the client suite passes.

…#149)

First increment of the local-first backup/restore/sync workflow (hoangsonww#149): a
versioned, portable backup bundle and a validated, idempotent restore — to move
dashboard state between machines or recover a fresh install. Zero new deps. Sync
targets, scheduling, WS-streaming, and MCP/CLI affordances are deferred to
follow-up PRs (noted on the issue).

Backend (/api/backup):
- export — a portable bundle: a manifest (format, schema_version, app version,
  created_at, per-table counts) + data for sessions/agents/events/token_usage/
  model_pricing/dashboard_runs. Distinct from the raw /api/settings/export dump
  (no manifest/restore) and the infra db-backup.sh (full-file copy).
- validate — checks manifest + schema_version compatibility (forward-compatible:
  same/older accepted, newer rejected) + structure. No mutation.
- dry-run — per-table preview of what would change (new vs already-present,
  pricing conflicts, invalid rows) with zero mutation.
- restore — single-transaction merge. Append-only tables INSERT OR IGNORE by PK
  (idempotent: reapplying inserts nothing; local rows never overwritten).
  model_pricing honors a conflict strategy (keep_local default | use_incoming,
  which only rewrites rows that actually differ). Column-intersection inserts
  (PRAGMA table_info) tolerate cross-version schema drift; all parameterized;
  table names from a fixed allowlist only.
- The three POST routes parse the bundle with a route-scoped 64mb JSON limit so
  whole-database bundles aren't rejected by the global 1mb limit.
- Rows missing a complete primary key are reported + skipped (never written as a
  NULL-keyed row, never duplicated on re-apply), and dry-run mirrors restore.

Frontend:
- Settings → Backup & Restore: Download backup, and Restore-from-file with a
  dry-run preview (manifest + per-table summary), a pricing conflict-strategy
  choice, and a confirm gate before the (idempotent) merge. Full en/zh/vi i18n.

Docs: docs/BACKUP.md (bundle format, compatibility policy, restore semantics,
recovery procedure, and how it differs from raw export + db-backup.sh) + README
API table.

Tests: server/__tests__/backup.test.js (18) — export manifest/counts, validate
accept/reject (garbage manifest, wrong format, future schema_version), dry-run
accuracy + read-only, restore insert + idempotency, append-only rows never
overwritten, model_pricing keep_local vs use_incoming, column-intersection of
unknown columns, transaction atomicity, the >1mb body path, PK-less rows
skipped, and use_incoming update-count parity with dry-run. Regenerated the
Settings screen-snapshot baseline for the new section.
@LeoLin990405 LeoLin990405 requested a review from hoangsonww as a code owner June 18, 2026 05:13

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a portable, versioned Backup & Restore feature that allows users to export and transactionally restore dashboard state. Feedback on the implementation highlights several critical issues: a missing import of CheckCircle in the frontend, a performance anti-pattern of preparing SQL statements inside a loop, and a side-effect of using INSERT OR REPLACE that resets unmapped columns to default values. Additionally, reviewers noted a potential race condition in the asynchronous file analysis, a potential SQL injection vulnerability in tableColumns due to direct string interpolation, and a potential crash if backup rows are malformed.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 50 to 57
History,
ChevronLeft,
ChevronRight,
Archive,
Download,
Upload,
Loader2,
} from "lucide-react";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The CheckCircle component is used on line 733 but does not appear to be imported from lucide-react in the import block. This will cause a compilation or runtime error. Please ensure CheckCircle is added to the lucide-react imports.

Suggested change
History,
ChevronLeft,
ChevronRight,
Archive,
Download,
Upload,
Loader2,
} from "lucide-react";
History,
ChevronLeft,
ChevronRight,
Archive,
Download,
Upload,
Loader2,
CheckCircle,
} from "lucide-react";

Comment thread server/lib/backup.js
Comment on lines +382 to +408
const run = db.transaction(() => {
const applied = {};
let totalInserted = 0;

// Append-only tables — INSERT OR IGNORE by PK.
for (const table of Object.keys(APPEND_ONLY)) {
const tableCols = colsByTable[table];
const pk = APPEND_ONLY[table];
const rows = rowsOf(bundle, table);
let inserted = 0;
let invalid = 0;
for (const row of rows) {
// Skip PK-incomplete rows so we never write a NULL-keyed row or one that
// would duplicate on re-apply (keeps restore idempotent + matches the
// dry-run preview, which counts these as `invalid`).
if (!hasCompletePk(row, pk)) {
invalid++;
continue;
}
const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
if (!built) continue;
const info = db.prepare(built.sql).run(...built.columns.map((c) => row[c]));
inserted += info.changes; // 0 when the PK already existed (ignored)
}
applied[table] = { inserted, skipped: rows.length - inserted - invalid, invalid };
totalInserted += inserted;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Preparing a SQL statement inside a loop (db.prepare(built.sql)) is a performance anti-pattern in SQLite. For large backup bundles with thousands of rows, this compiles the SQL statement repeatedly, causing significant CPU overhead and slowing down the restore process.

Consider caching the prepared statements locally within applyRestore to compile each unique SQL statement exactly once.

  const stmtCache = new Map();
  const getStmt = (sql) => {
    let stmt = stmtCache.get(sql);
    if (!stmt) {
      stmt = db.prepare(sql);
      stmtCache.set(sql, stmt);
    }
    return stmt;
  };

  const run = db.transaction(() => {
    const applied = {};
    let totalInserted = 0;

    // Append-only tables — INSERT OR IGNORE by PK.
    for (const table of Object.keys(APPEND_ONLY)) {
      const tableCols = colsByTable[table];
      const pk = APPEND_ONLY[table];
      const rows = rowsOf(bundle, table);
      let inserted = 0;
      let invalid = 0;
      for (const row of rows) {
        // Skip PK-incomplete rows so we never write a NULL-keyed row or one that
        // would duplicate on re-apply (keeps restore idempotent + matches the
        // dry-run preview, which counts these as `invalid`).
        if (!hasCompletePk(row, pk)) {
          invalid++;
          continue;
        }
        const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
        if (!built) continue;
        const info = getStmt(built.sql).run(...built.columns.map((c) => row[c]));
        inserted += info.changes; // 0 when the PK already existed (ignored)
      }

Comment thread server/lib/backup.js
Comment on lines +428 to +441
const local = getLocal.get(row[pk[0]]);
if (!local) {
const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
if (built) {
db.prepare(built.sql).run(...built.columns.map((c) => row[c]));
inserted++;
}
} else if (strategy === "use_incoming" && rowDiffers(row, local, tableCols, pk)) {
const built = buildInsert({ table, tableCols, row, conflict: "replace" });
if (built) {
db.prepare(built.sql).run(...built.columns.map((c) => row[c]));
updated++;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using INSERT OR REPLACE (via conflict: "replace") for updating existing rows in model_pricing has a side effect in SQLite: it deletes the conflicting row and inserts a new one. This means any columns present in the live database schema but missing from the backup bundle (e.g., due to cross-version schema drift or restoring an older backup on a newer app version) will be reset to their default values or NULL.

To preserve these columns and adhere to the column-intersection compatibility policy, use an UPDATE statement instead of INSERT OR REPLACE when updating existing rows.

        const local = getLocal.get(row[pk[0]]);
        if (!local) {
          const built = buildInsert({ table, tableCols, row, conflict: "ignore" });
          if (built) {
            getStmt(built.sql).run(...built.columns.map((c) => row[c]));
            inserted++;
          }
        } else if (strategy === "use_incoming" && rowDiffers(row, local, tableCols, pk)) {
          const tableColSet = new Set(tableCols);
          const pkSet = new Set(pk);
          const updateCols = tableCols.filter(
            (c) => tableColSet.has(c) && !pkSet.has(c) && Object.prototype.hasOwnProperty.call(row, c)
          );
          if (updateCols.length > 0) {
            const setClause = updateCols.map((c) => `${c} = ?`).join(", ");
            const whereClause = pk.map((c) => `${c} = ?`).join(" AND ");
            const sql = `UPDATE ${table} SET ${setClause} WHERE ${whereClause}`;
            const params = [...updateCols.map((c) => row[c]), ...pk.map((c) => row[c])];
            getStmt(sql).run(...params);
            updated++;
          }
        }

Comment on lines +406 to +427
const analyze = useCallback(
async (parsed: BackupBundle) => {
setAnalyzing(true);
setError(null);
setValidation(null);
setDryRun(null);
setResult(null);
try {
const validateRes = await api.backup.validate(parsed);
setValidation(validateRes);
if (validateRes.compatible) {
const dryRunRes = await api.backup.dryRun(parsed, "keep_local");
setDryRun(dryRunRes);
}
} catch (err) {
setError(err instanceof Error ? err.message : t("backup.restoreFailed"));
} finally {
setAnalyzing(false);
}
},
[t]
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a potential race condition in the asynchronous analyze function. If a user selects a backup file, and then quickly selects another file or changes the pricing strategy before the first analysis completes, the state (validation, dryRun, error) can be updated with stale results from the first file.

Consider using a reference to track the active analysis ID or an AbortController to ignore or cancel stale in-flight requests.

Comment thread server/lib/backup.js
Comment on lines +78 to +85
function tableColumns(db, table) {
// PRAGMA table_info can't be parameterized; table names come only from the
// fixed BACKUP_TABLES allow-list, never user input, so interpolation is safe.
return db
.prepare(`PRAGMA table_info(${table})`)
.all()
.map((c) => c.name);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The tableColumns function interpolates the table parameter directly into the SQL query (PRAGMA table_info(${table})). While currently called only with safe values from BACKUP_TABLES, this function is exported and could be reused in the future with user-controlled input, leading to SQL injection.

To enforce defense-in-depth, validate that the table argument is present in BACKUP_TABLES before executing the query.

function tableColumns(db, table) {
  if (!BACKUP_TABLES.includes(table)) {
    throw new Error(`Invalid table: ${table}`);
  }
  // PRAGMA table_info can't be parameterized; table names come only from the
  // fixed BACKUP_TABLES allow-list, never user input, so interpolation is safe.
  return db
    .prepare(`PRAGMA table_info(${table})`)
    .all()
    .map((c) => c.name);
}

Comment thread server/lib/backup.js
Comment on lines +247 to +249
function hasCompletePk(row, pk) {
return pk.every((c) => row[c] != null);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If any row in the backup data is null or not an object (which can happen with malformed or corrupted JSON files), accessing row[c] will throw a TypeError and crash the restore/dry-run process.

To make the input validation more robust, guard hasCompletePk to ensure row is a non-null object before checking its properties.

Suggested change
function hasCompletePk(row, pk) {
return pk.every((c) => row[c] != null);
}
function hasCompletePk(row, pk) {
return row && typeof row === "object" && pk.every((c) => row[c] != null);
}

@hoangsonww hoangsonww added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers question Further information is requested labels Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed question Further information is requested

Projects

Development

Successfully merging this pull request may close these issues.

2 participants