feat(upgrade): console-driven apply/dismiss + CLI + auto-mode (0.2.1)#44
Conversation
…e 0.2.1 PR-2 of the auto-upgrade design — the web console "Upgrade now" button now works. - upgrade.installer: detect editable / pipx / uv-tool / pip / system; refuse editable + system-managed installs (pipx/uv detected via marker files so a custom PIPX_HOME/UV_TOOL_DIR still classifies correctly). - upgrade.helper: detached worker (start_new_session / DETACHED_PROCESS) that stops the service, runs the matching upgrade command, restarts, and verifies /health — writing every transition to upgrade_state.json. Windows-safe signal handling; a failed restart is reported as a failed upgrade, never left down. - REST: POST /api/v1/admin/upgrade/apply + /dismiss, and /admin/shutdown. - Console banner: button enabled when auto-upgradable, confirm modal, progress tracking across the restart, reload on success; Skip now persists server-side. - hebb upgrade CLI (--check/--apply/--status) for headless hosts. - Scheduler: wire auto_upgrade_mode="auto" to trigger the helper (was a no-op); upgrade_state.reconcile_stale clears an abandoned in_progress flag (keyed on the helper PID) at boot + before each /apply. Tests: 51 upgrade unit+integration tests; mypy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds the v0.2.1 auto-upgrade flow: install-method detection, a detached helper process, new admin endpoints, scheduler triggering, CLI support, frontend upgrade interactions, stale-state reconciliation, and version/documentation updates. ChangesAuto-upgrade feature (v0.2.1)
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server
participant State
participant Helper
Client->>Server: POST /api/v1/admin/upgrade/apply
Server->>State: reconcile_stale() + validate upgrade
Server->>State: set upgrade_in_progress=True
Server->>Helper: spawn_detached(...)
Helper-->>Server: helper PID or error
Server->>State: record helper PID or roll back
Server-->>Client: response payload
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the second phase of the auto-upgrade system, introducing a detached upgrade helper, new admin endpoints (/apply, /dismiss, /shutdown), a hebb upgrade CLI command, and frontend progress tracking. Review feedback focuses on preventing UnicodeDecodeError during subprocess execution on non-UTF-8 systems, adding a safety check for home_dir in the CLI, expanding system Python detection to cover Homebrew/MacPorts on macOS, fixing a logging bug where a cleared PID is referenced, and a suggestion to optimize state updates in the /apply endpoint.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| proc = subprocess.run( | ||
| argv, | ||
| capture_output=True, | ||
| text=True, | ||
| env=env, | ||
| timeout=INSTALL_TIMEOUT_SECONDS, | ||
| ) |
There was a problem hiding this comment.
Add errors="replace" to subprocess.run when capturing output as text. On platforms with non-UTF-8 system encodings (such as Windows with non-English locales), pip or uv output containing non-ASCII characters can trigger a UnicodeDecodeError and crash the upgrade helper.
| proc = subprocess.run( | |
| argv, | |
| capture_output=True, | |
| text=True, | |
| env=env, | |
| timeout=INSTALL_TIMEOUT_SECONDS, | |
| ) | |
| proc = subprocess.run( | |
| argv, | |
| capture_output=True, | |
| text=True, | |
| errors="replace", | |
| env=env, | |
| timeout=INSTALL_TIMEOUT_SECONDS, | |
| ) |
| settings = load_settings() | ||
| cmd = build_command() |
There was a problem hiding this comment.
| proc = subprocess.run( | ||
| [sys.executable, "-c", "import hebb; print(hebb.__version__)"], | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=30, | ||
| ) |
There was a problem hiding this comment.
Add errors="replace" to subprocess.run when capturing version output to prevent potential UnicodeDecodeError on non-UTF-8 systems.
| proc = subprocess.run( | |
| [sys.executable, "-c", "import hebb; print(hebb.__version__)"], | |
| capture_output=True, | |
| text=True, | |
| timeout=30, | |
| ) | |
| proc = subprocess.run( | |
| [sys.executable, "-c", "import hebb; print(hebb.__version__)"], | |
| capture_output=True, | |
| text=True, | |
| errors="replace", | |
| timeout=30, | |
| ) |
| system_roots = ( | ||
| "/usr", | ||
| "/library/frameworks", | ||
| "/system", | ||
| "c:/program files", # also matches "c:/program files (x86)" | ||
| "c:/programdata", | ||
| "c:/windows", | ||
| ) |
There was a problem hiding this comment.
Add /opt/homebrew and /opt/local to system_roots to correctly identify Homebrew and MacPorts global Python environments on macOS. This prevents attempting to upgrade packages globally in these environments, which can fail or corrupt system-managed packages.
| system_roots = ( | |
| "/usr", | |
| "/library/frameworks", | |
| "/system", | |
| "c:/program files", # also matches "c:/program files (x86)" | |
| "c:/programdata", | |
| "c:/windows", | |
| ) | |
| system_roots = ( | |
| "/usr", | |
| "/library/frameworks", | |
| "/system", | |
| "/opt/homebrew", | |
| "/opt/local", | |
| "c:/program files", # also matches "c:/program files (x86)" | |
| "c:/programdata", | |
| "c:/windows", | |
| ) |
| state.upgrade_in_progress = False | ||
| state.upgrade_helper_pid = None | ||
| if state.last_upgrade and state.last_upgrade.status == "in_progress": | ||
| state.last_upgrade.status = "failed" | ||
| state.last_upgrade.finished_at = datetime.now(timezone.utc).isoformat() | ||
| note = "interrupted — reset on restart" | ||
| state.last_upgrade.log_tail = ((state.last_upgrade.log_tail or "") + "\n" + note).strip() | ||
| save(home_dir, state) | ||
| logger.info("Reset stale upgrade_in_progress flag (pid=%s)", state.upgrade_helper_pid) |
There was a problem hiding this comment.
The logging statement at the end of reconcile_stale references state.upgrade_helper_pid after it has already been set to None on line 160. This results in logging pid=None instead of the actual stale PID. Store the PID in a local variable before clearing it.
| state.upgrade_in_progress = False | |
| state.upgrade_helper_pid = None | |
| if state.last_upgrade and state.last_upgrade.status == "in_progress": | |
| state.last_upgrade.status = "failed" | |
| state.last_upgrade.finished_at = datetime.now(timezone.utc).isoformat() | |
| note = "interrupted — reset on restart" | |
| state.last_upgrade.log_tail = ((state.last_upgrade.log_tail or "") + "\n" + note).strip() | |
| save(home_dir, state) | |
| logger.info("Reset stale upgrade_in_progress flag (pid=%s)", state.upgrade_helper_pid) | |
| old_pid = state.upgrade_helper_pid | |
| state.upgrade_in_progress = False | |
| state.upgrade_helper_pid = None | |
| if state.last_upgrade and state.last_upgrade.status == "in_progress": | |
| state.last_upgrade.status = "failed" | |
| state.last_upgrade.finished_at = datetime.now(timezone.utc).isoformat() | |
| note = "interrupted — reset on restart" | |
| state.last_upgrade.log_tail = ((state.last_upgrade.log_tail or "") + "\n" + note).strip() | |
| save(home_dir, state) | |
| logger.info("Reset stale upgrade_in_progress flag (pid=%s)", old_pid) |
| # Close the double-click window before spawning. This handler is async with | ||
| # no await before the spawn, so the event loop serializes concurrent /apply | ||
| # calls — the second one observes upgrade_in_progress and is refused above. | ||
| upgrade_state.update(settings.home_dir, upgrade_in_progress=True) | ||
| try: | ||
| pid = _spawn_helper(settings, cmd.method) | ||
| except Exception as exc: | ||
| # Spawn failed — roll back the in-progress flag so the banner recovers. | ||
| upgrade_state.update(settings.home_dir, upgrade_in_progress=False) | ||
| logger.exception("Failed to spawn upgrade helper") | ||
| raise HTTPException(status_code=500, detail=f"Failed to start upgrade: {exc}") | ||
|
|
||
| upgrade_state.update(settings.home_dir, upgrade_helper_pid=pid) |
There was a problem hiding this comment.
Since apply_upgrade is defined as async def but contains no await expressions before spawning the helper, the entire handler executes synchronously on the main event loop thread. This guarantees serialization of concurrent requests without yielding control. Therefore, the first synchronous disk write upgrade_state.update(..., upgrade_in_progress=True) is redundant and can be removed, saving unnecessary disk I/O and simplifying the code.
try:
pid = _spawn_helper(settings, cmd.method)
except Exception as exc:
logger.exception("Failed to spawn upgrade helper")
raise HTTPException(status_code=500, detail=f"Failed to start upgrade: {exc}")
upgrade_state.update(settings.home_dir, upgrade_in_progress=True, upgrade_helper_pid=pid)There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/hebb/server/app.py (1)
48-49: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
lifespanstill needs the repo-standard API docstring.This public function has a summary docstring, but it is still missing the required
Args,Returns, andRaisessections.As per coding guidelines,
src/**/*.py:All public APIs in Python MUST have docstrings with Args, Returns, and Raises sections.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hebb/server/app.py` around lines 48 - 49, The public async function lifespan is missing the repo-standard API docstring sections required for public Python APIs. Update its docstring to include the standard Args, Returns, and Raises sections, describing the app parameter, the async iterator return, and any startup/shutdown exceptions that may be raised, while keeping the existing summary intact.Source: Coding guidelines
🧹 Nitpick comments (2)
src/hebb/upgrade/state.py (1)
123-137: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd the missing
Raisessection to this public API docstring.
reconcile_stale()is public and already hasArgs/Returns, but the repo guideline requiresRaisestoo. As per coding guidelines,**/*.py: "Include docstring with Args, Returns, and Raises sections for all public APIs".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hebb/upgrade/state.py` around lines 123 - 137, The public API docstring for reconcile_stale is missing the required Raises section. Update the reconcile_stale(home_dir, max_age_seconds) docstring to include a Raises block alongside the existing Args and Returns sections, documenting any exceptions it can propagate or stating that it raises none if applicable, following the repo guideline for public Python APIs.Source: Coding guidelines
src/hebb/upgrade/installer.py (1)
138-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
Raisessections to the public API docstrings.
detect_method()andbuild_command()haveArgs/Returns, but the repo guideline requiresRaiseson public APIs as well. As per coding guidelines,**/*.py: "Include docstring with Args, Returns, and Raises sections for all public APIs".Also applies to: 156-165
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hebb/upgrade/installer.py` around lines 138 - 145, Add a Raises section to the public API docstrings for detect_method() and build_command() in installer.py so they comply with the repo guideline requiring Args, Returns, and Raises on all public APIs. Update each docstring to include the exception(s) these functions can emit, keeping the existing structure and placing the new Raises section alongside the current Returns documentation.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/hebb/cli/commands/upgrade.py`:
- Around line 137-153: The `check` flow in `upgrade.py` is catching all
exceptions around `httpx.post(...)` and `resp.raise_for_status()`, which
incorrectly treats daemon HTTP errors like 409/5xx as if the daemon were
unreachable. Narrow the fallback in the `check` branch to only
transport/connection failures from `httpx.RequestError`, and handle HTTP status
failures separately so the daemon’s response is surfaced instead of silently
falling back to `run_check`.
- Around line 155-167: The upgrade CLI flow is re-raising helper exit codes with
SystemExit, but recoverable failures should use click.ClickException instead. In
the upgrade command path around the apply flow, update the logic that calls
_apply_via_daemon and _apply_via_helper so failures surface as ClickException
while successful exits just return normally. Keep the change localized to the
upgrade command and its helper-driven apply branches.
In `@src/hebb/scheduler/manager.py`:
- Around line 402-413: The auto-upgrade flow in manager.py updates upgrade_state
only after spawn_detached succeeds, which leaves a window where a concurrent
/apply can start another helper; update upgrade_in_progress=True before calling
spawn_detached in the manager’s upgrade path to match the behavior in
routers/upgrade.py, and keep the existing failure handling so the flag is rolled
back if spawning fails.
In `@src/hebb/server/routers/admin.py`:
- Around line 275-283: The shutdown_service route docstring is out of sync with
the actual no-service behavior and is missing the required API documentation
sections. Update the shutdown_service docstring to describe the implemented flow
accurately, especially the no-installed-service branch handled later in the
function, and add complete Returns and Raises sections (and Args if applicable)
to satisfy the public Python API docstring guidelines. Keep the wording aligned
with the shutdown_service implementation so tests and callers are not misled.
In `@src/hebb/server/routers/upgrade.py`:
- Around line 73-80: The public route handlers apply_upgrade and dismiss_upgrade
need docstrings updated to match the repo standard. Add the required Args,
Returns, and Raises sections to each function’s docstring, keeping the existing
behavior and descriptive summary intact. Use the function names apply_upgrade
and dismiss_upgrade to locate the two route handlers and ensure both public APIs
in this module are documented consistently.
In `@src/hebb/static/js/components/upgrade-banner.js`:
- Around line 73-77: The disabled tooltip in the upgrade banner is being
injected into HTML via applyAttrs, which can break markup when
refusalTooltip(state) contains quotes. Update the upgrade-banner rendering logic
to set the disabled state and title through DOM properties/attributes instead of
string interpolation, and apply the same fix wherever applyAttrs is used for the
editable-install button path.
In `@src/hebb/upgrade/helper.py`:
- Around line 161-175: The health wait in _wait_health currently treats any 200
response from /health as success, which can let the upgrade proceed even if the
daemon is still the old version. Update _wait_health in helper.py to parse the
version returned by the health endpoint and require it to match the expected
upgraded version before returning True; then adjust the callers that persist
upgrade success so they only set status="success" and advance current_version
when the version check passes.
- Around line 178-193: The public helper entry points are missing repo-standard
API documentation: update run_upgrade, spawn_detached, and main in
src/hebb/upgrade/helper.py so each has a docstring with Args, Returns, and
Raises sections, and add a new docstring for main since it currently has none.
Make the docstrings consistent with the existing public API style by describing
the parameters, the integer return value, and any exceptions these helpers may
raise.
In `@src/hebb/upgrade/installer.py`:
- Around line 127-135: The system-managed prefix check in the installer
currently misses Homebrew-style locations, so interpreter detection can
incorrectly treat package-manager installs as regular pip environments. Update
the prefix list used by the system-managed helper in the installer logic to
include common global package-manager paths such as Homebrew prefixes, and make
sure detect_method() and build_command() both respect that expanded check so
self-upgrade is refused for those interpreters.
In `@src/hebb/upgrade/state.py`:
- Around line 142-154: The stale-state check in the upgrade reconciliation flow
is clearing active upgrades when `started is None` even after a live helper PID
was confirmed. Update the logic in the stale-detection path in state.py so the
`started is None` fallback only marks stale when there is no live
`upgrade_helper_pid`, and preserves `upgrade_in_progress` while the helper
process is still alive. Use the `state.upgrade_helper_pid`, `_pid_alive`, and
`state.last_upgrade.started_at` checks to keep the scheduler-managed upgrade
lock intact.
In `@tests/unit/upgrade/test_helper.py`:
- Around line 82-83: The Windows-specific assertion in the upgrade helper test
still dereferences helper.signal.SIGKILL directly, which can raise an
AttributeError on the very platform being guarded. Update the assertion in the
test around helper.signal.SIGTERM and helper.signal.SIGKILL to first check
whether SIGKILL exists on helper.signal, and only assert its absence when it is
available. Keep the fix localized to the test case in test_helper.py so the
runtime constraint matches the signal availability.
In `@tests/unit/upgrade/test_installer.py`:
- Around line 21-35: The detection tests for installer method classification are
still dependent on the real environment because only sys.executable is patched
while _classify_executable() now also checks sys.prefix. Update test_detect_pipx
and test_detect_uv_tool in the installer test module to stub sys.prefix as well,
or mock _classify_executable directly, so detect_method() and build_command()
are isolated from the runner’s actual uv/pipx setup.
---
Outside diff comments:
In `@src/hebb/server/app.py`:
- Around line 48-49: The public async function lifespan is missing the
repo-standard API docstring sections required for public Python APIs. Update its
docstring to include the standard Args, Returns, and Raises sections, describing
the app parameter, the async iterator return, and any startup/shutdown
exceptions that may be raised, while keeping the existing summary intact.
---
Nitpick comments:
In `@src/hebb/upgrade/installer.py`:
- Around line 138-145: Add a Raises section to the public API docstrings for
detect_method() and build_command() in installer.py so they comply with the repo
guideline requiring Args, Returns, and Raises on all public APIs. Update each
docstring to include the exception(s) these functions can emit, keeping the
existing structure and placing the new Raises section alongside the current
Returns documentation.
In `@src/hebb/upgrade/state.py`:
- Around line 123-137: The public API docstring for reconcile_stale is missing
the required Raises section. Update the reconcile_stale(home_dir,
max_age_seconds) docstring to include a Raises block alongside the existing Args
and Returns sections, documenting any exceptions it can propagate or stating
that it raises none if applicable, following the repo guideline for public
Python APIs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1895aa79-51fa-418f-b3f3-db097fd3b830
📒 Files selected for processing (23)
.claude-plugin/plugin.json.release-please-manifest.jsonCHANGELOG.mdpyproject.tomlreports/design/auto-upgrade.mdsrc/hebb/__init__.pysrc/hebb/cli/commands/upgrade.pysrc/hebb/cli/main.pysrc/hebb/scheduler/manager.pysrc/hebb/server/app.pysrc/hebb/server/routers/admin.pysrc/hebb/server/routers/upgrade.pysrc/hebb/static/css/style.csssrc/hebb/static/js/api.jssrc/hebb/static/js/components/upgrade-banner.jssrc/hebb/static/js/i18n.jssrc/hebb/upgrade/helper.pysrc/hebb/upgrade/installer.pysrc/hebb/upgrade/state.pytests/integration/upgrade/test_router.pytests/unit/upgrade/test_helper.pytests/unit/upgrade/test_installer.pytests/unit/upgrade/test_state.py
| if check: | ||
| try: | ||
| resp = httpx.post(f"{url}/api/v1/admin/upgrade/check", timeout=15) | ||
| resp.raise_for_status() | ||
| _print_status(resp.json()) | ||
| except Exception: | ||
| # Daemon down — check directly. | ||
| import asyncio | ||
|
|
||
| from hebb.config.loader import load_settings | ||
| from hebb.upgrade.checker import run_check | ||
|
|
||
| settings = load_settings() | ||
| assert settings.home_dir is not None | ||
| checked = asyncio.run(run_check(settings.home_dir)) | ||
| _print_status(checked.model_dump(mode="json")) | ||
| return |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Broad except Exception conflates a down daemon with a real HTTP error.
resp.raise_for_status() turns a deliberate 409 (auto_upgrade_mode=off) or any 5xx from a running daemon into the "Daemon down — check directly" fallback, so --check silently runs a direct PyPI check that bypasses the daemon's disabled-checks policy. Consider catching only connection/transport errors (httpx.RequestError) for the fallback and surfacing HTTP status errors distinctly.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 138-138: Request-controlled URL passed to httpx; validate against an allowlist to prevent SSRF.
Context: httpx.post(f"{url}/api/v1/admin/upgrade/check", timeout=15)
Note: [CWE-918] Server-Side Request Forgery (SSRF).
(avoid-ssrf)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/cli/commands/upgrade.py` around lines 137 - 153, The `check` flow in
`upgrade.py` is catching all exceptions around `httpx.post(...)` and
`resp.raise_for_status()`, which incorrectly treats daemon HTTP errors like
409/5xx as if the daemon were unreachable. Narrow the fallback in the `check`
branch to only transport/connection failures from `httpx.RequestError`, and
handle HTTP status failures separately so the daemon’s response is surfaced
instead of silently falling back to `run_check`.
| if apply_: | ||
| state = _daemon_state(url) | ||
| raise SystemExit(_apply_via_daemon(url) if state is not None else _apply_via_helper()) | ||
|
|
||
| # Default: show status, prompt to apply when an upgrade is available. | ||
| state = _daemon_state(url) | ||
| if state is None: | ||
| console.print("[yellow]Daemon not reachable.[/yellow] Run 'hebb upgrade --check' to check directly.") | ||
| return | ||
| _print_status(state) | ||
| if state.get("available") and state.get("auto_upgradable", False): | ||
| if click.confirm("Upgrade now?", default=False): | ||
| raise SystemExit(_apply_via_daemon(url)) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Use click.ClickException for CLI failure exits instead of raise SystemExit.
The apply helpers already print the error and return an exit code, which is then re-raised as SystemExit(...). Per project convention, recoverable CLI failures should surface via click.ClickException (which prints and sets exit code 1); reserve SystemExit(1) for unrecoverable cases. Success (exit 0) can simply return.
As per coding guidelines: "Keep CLI exits via click.ClickException; reserve SystemExit(1) for unrecoverable failures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/cli/commands/upgrade.py` around lines 155 - 167, The upgrade CLI
flow is re-raising helper exit codes with SystemExit, but recoverable failures
should use click.ClickException instead. In the upgrade command path around the
apply flow, update the logic that calls _apply_via_daemon and _apply_via_helper
so failures surface as ClickException while successful exits just return
normally. Keep the change localized to the upgrade command and its helper-driven
apply branches.
Source: Coding guidelines
| try: | ||
| pid = spawn_detached( | ||
| home_dir=self.settings.home_dir, | ||
| port=self.settings.port, | ||
| grace=self.settings.upgrade_grace_seconds, | ||
| method=cmd.method, | ||
| parent_pid=os.getpid(), | ||
| ) | ||
| upgrade_state.update(self.settings.home_dir, upgrade_in_progress=True, upgrade_helper_pid=pid) | ||
| logger.info("Auto-upgrade: spawned helper to upgrade to %s", state.latest_version) | ||
| except Exception: | ||
| logger.exception("Auto-upgrade: failed to spawn helper") |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Set upgrade_in_progress=True before spawning to mirror /apply and close the double-spawn window.
/apply (routers/upgrade.py) sets the flag before spawn_detached and rolls it back on failure. Here the flag is only persisted after a successful spawn, so a concurrent /apply (or the captured state being stale relative to disk) can spawn a second helper — two helpers then race to stop/upgrade/restart the same install. There is also no rollback path, but since the flag is set last that is moot until the ordering is fixed.
🔒️ Proposed ordering fix
- try:
- pid = spawn_detached(
- home_dir=self.settings.home_dir,
- port=self.settings.port,
- grace=self.settings.upgrade_grace_seconds,
- method=cmd.method,
- parent_pid=os.getpid(),
- )
- upgrade_state.update(self.settings.home_dir, upgrade_in_progress=True, upgrade_helper_pid=pid)
- logger.info("Auto-upgrade: spawned helper to upgrade to %s", state.latest_version)
- except Exception:
- logger.exception("Auto-upgrade: failed to spawn helper")
+ # Mark in-progress before spawning (mirrors /apply) so a concurrent
+ # /apply or check cannot spawn a second helper; roll back on failure.
+ upgrade_state.update(self.settings.home_dir, upgrade_in_progress=True)
+ try:
+ pid = spawn_detached(
+ home_dir=self.settings.home_dir,
+ port=self.settings.port,
+ grace=self.settings.upgrade_grace_seconds,
+ method=cmd.method,
+ parent_pid=os.getpid(),
+ )
+ upgrade_state.update(self.settings.home_dir, upgrade_helper_pid=pid)
+ logger.info("Auto-upgrade: spawned helper to upgrade to %s", state.latest_version)
+ except Exception:
+ upgrade_state.update(self.settings.home_dir, upgrade_in_progress=False)
+ logger.exception("Auto-upgrade: failed to spawn helper")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| pid = spawn_detached( | |
| home_dir=self.settings.home_dir, | |
| port=self.settings.port, | |
| grace=self.settings.upgrade_grace_seconds, | |
| method=cmd.method, | |
| parent_pid=os.getpid(), | |
| ) | |
| upgrade_state.update(self.settings.home_dir, upgrade_in_progress=True, upgrade_helper_pid=pid) | |
| logger.info("Auto-upgrade: spawned helper to upgrade to %s", state.latest_version) | |
| except Exception: | |
| logger.exception("Auto-upgrade: failed to spawn helper") | |
| # Mark in-progress before spawning (mirrors /apply) so a concurrent | |
| # /apply or check cannot spawn a second helper; roll back on failure. | |
| upgrade_state.update(self.settings.home_dir, upgrade_in_progress=True) | |
| try: | |
| pid = spawn_detached( | |
| home_dir=self.settings.home_dir, | |
| port=self.settings.port, | |
| grace=self.settings.upgrade_grace_seconds, | |
| method=cmd.method, | |
| parent_pid=os.getpid(), | |
| ) | |
| upgrade_state.update(self.settings.home_dir, upgrade_helper_pid=pid) | |
| logger.info("Auto-upgrade: spawned helper to upgrade to %s", state.latest_version) | |
| except Exception: | |
| upgrade_state.update(self.settings.home_dir, upgrade_in_progress=False) | |
| logger.exception("Auto-upgrade: failed to spawn helper") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/scheduler/manager.py` around lines 402 - 413, The auto-upgrade flow
in manager.py updates upgrade_state only after spawn_detached succeeds, which
leaves a window where a concurrent /apply can start another helper; update
upgrade_in_progress=True before calling spawn_detached in the manager’s upgrade
path to match the behavior in routers/upgrade.py, and keep the existing failure
handling so the flag is rolled back if spawning fails.
| async def shutdown_service() -> dict[str, Any]: | ||
| """Stop the OS-managed service without restarting it. | ||
|
|
||
| Used by the upgrade helper to take the daemon down while it replaces the | ||
| package on disk (``/restart`` brings it straight back up). Returns | ||
| immediately; the stop is dispatched ~1s later so the HTTP response flushes | ||
| first. When no installed service is found in any scope, falls back to an | ||
| in-process exit. | ||
| """ |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Update the shutdown route contract to match the implementation.
Lines 281-282 still say the no-service case “falls back to an in-process exit,” but Lines 312-316 intentionally do the opposite and leave the process running. That docstring will mislead callers and future tests. While touching it, please add the required Returns/Raises sections as well.
As per coding guidelines, src/**/*.py: All public APIs in Python MUST have docstrings with Args, Returns, and Raises sections.
Also applies to: 312-316
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/server/routers/admin.py` around lines 275 - 283, The
shutdown_service route docstring is out of sync with the actual no-service
behavior and is missing the required API documentation sections. Update the
shutdown_service docstring to describe the implemented flow accurately,
especially the no-installed-service branch handled later in the function, and
add complete Returns and Raises sections (and Args if applicable) to satisfy the
public Python API docstring guidelines. Keep the wording aligned with the
shutdown_service implementation so tests and callers are not misled.
Source: Coding guidelines
| @router.post("/apply") | ||
| async def apply_upgrade(settings: Settings = Depends(get_settings)) -> dict[str, Any]: | ||
| """Spawn the detached upgrade helper after validating it is safe to run. | ||
|
|
||
| Refuses when: checks are disabled (``mode=off``), no newer version is | ||
| available, an upgrade is already in progress, or the install method is not | ||
| auto-upgradable (editable / system-managed). | ||
| """ |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Document the new upgrade endpoints to the repo standard.
apply_upgrade and dismiss_upgrade are public route handlers under src/, but their new docstrings still omit the required Args, Returns, and Raises sections.
As per coding guidelines, src/**/*.py: All public APIs in Python MUST have docstrings with Args, Returns, and Raises sections.
Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/server/routers/upgrade.py` around lines 73 - 80, The public route
handlers apply_upgrade and dismiss_upgrade need docstrings updated to match the
repo standard. Add the required Args, Returns, and Raises sections to each
function’s docstring, keeping the existing behavior and descriptive summary
intact. Use the function names apply_upgrade and dismiss_upgrade to locate the
two route handlers and ensure both public APIs in this module are documented
consistently.
Source: Coding guidelines
| def run_upgrade( | ||
| *, | ||
| home_dir: Path, | ||
| method: Method, | ||
| parent_pid: int, | ||
| grace: float, | ||
| port: int, | ||
| dry_run: bool = False, | ||
| ) -> int: | ||
| """Execute the full stop → upgrade → restart → verify flow. | ||
|
|
||
| Returns a process exit code (0 on a successful upgrade, 1 otherwise). Every | ||
| outcome — success or failure — is persisted to ``upgrade_state.json`` and | ||
| the service is always brought back up in a ``finally`` so a crash never | ||
| leaves the daemon down. | ||
| """ |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Document the new public helper entry points to the repo standard.
run_upgrade, spawn_detached, and main are public functions under src/, but the new docs are missing the required Args/Returns/Raises sections, and main currently has no docstring at all.
As per coding guidelines, src/**/*.py: All public APIs in Python MUST have docstrings with Args, Returns, and Raises sections.
Also applies to: 308-331, 363-380
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/upgrade/helper.py` around lines 178 - 193, The public helper entry
points are missing repo-standard API documentation: update run_upgrade,
spawn_detached, and main in src/hebb/upgrade/helper.py so each has a docstring
with Args, Returns, and Raises sections, and add a new docstring for main since
it currently has none. Make the docstrings consistent with the existing public
API style by describing the parameters, the integer return value, and any
exceptions these helpers may raise.
Source: Coding guidelines
| system_roots = ( | ||
| "/usr", | ||
| "/library/frameworks", | ||
| "/system", | ||
| "c:/program files", # also matches "c:/program files (x86)" | ||
| "c:/programdata", | ||
| "c:/windows", | ||
| ) | ||
| return any(prefix.startswith(root) for root in system_roots) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Treat Homebrew-style global interpreters as system-managed too.
This list misses common non-venv package-manager prefixes like /opt/homebrew, so detect_method() falls through to "pip" and build_command() will self-upgrade a package-managed interpreter even though this flow is supposed to refuse system-managed installs.
Suggested fix
system_roots = (
"/usr",
+ "/opt/homebrew",
"/library/frameworks",
"/system",
"c:/program files", # also matches "c:/program files (x86)"
"c:/programdata",
"c:/windows",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| system_roots = ( | |
| "/usr", | |
| "/library/frameworks", | |
| "/system", | |
| "c:/program files", # also matches "c:/program files (x86)" | |
| "c:/programdata", | |
| "c:/windows", | |
| ) | |
| return any(prefix.startswith(root) for root in system_roots) | |
| system_roots = ( | |
| "/usr", | |
| "/opt/homebrew", | |
| "/library/frameworks", | |
| "/system", | |
| "c:/program files", # also matches "c:/program files (x86)" | |
| "c:/programdata", | |
| "c:/windows", | |
| ) | |
| return any(prefix.startswith(root) for root in system_roots) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/upgrade/installer.py` around lines 127 - 135, The system-managed
prefix check in the installer currently misses Homebrew-style locations, so
interpreter detection can incorrectly treat package-manager installs as regular
pip environments. Update the prefix list used by the system-managed helper in
the installer logic to include common global package-manager paths such as
Homebrew prefixes, and make sure detect_method() and build_command() both
respect that expanded check so self-upgrade is refused for those interpreters.
| stale = False | ||
| if state.upgrade_helper_pid is not None and not _pid_alive(state.upgrade_helper_pid): | ||
| stale = True | ||
| started = state.last_upgrade.started_at if state.last_upgrade else None | ||
| if not stale and started: | ||
| try: | ||
| age = (datetime.now(timezone.utc) - datetime.fromisoformat(started)).total_seconds() | ||
| stale = age > max_age_seconds | ||
| except (ValueError, TypeError): | ||
| stale = True | ||
| elif not stale and started is None: | ||
| # In progress with no record at all — there is nothing live to wait on. | ||
| stale = True |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Don't clear upgrade_in_progress when the helper PID is still alive.
This started is None fallback fires even after a successful live-PID check. The scheduler path already persists upgrade_in_progress=True, upgrade_helper_pid=pid in src/hebb/scheduler/manager.py:410, so a later reconciliation can drop the lock while the worker is still running.
Suggested fix
- elif not stale and started is None:
+ elif (
+ not stale
+ and started is None
+ and state.upgrade_helper_pid is None
+ ):
# In progress with no record at all — there is nothing live to wait on.
stale = True📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| stale = False | |
| if state.upgrade_helper_pid is not None and not _pid_alive(state.upgrade_helper_pid): | |
| stale = True | |
| started = state.last_upgrade.started_at if state.last_upgrade else None | |
| if not stale and started: | |
| try: | |
| age = (datetime.now(timezone.utc) - datetime.fromisoformat(started)).total_seconds() | |
| stale = age > max_age_seconds | |
| except (ValueError, TypeError): | |
| stale = True | |
| elif not stale and started is None: | |
| # In progress with no record at all — there is nothing live to wait on. | |
| stale = True | |
| stale = False | |
| if state.upgrade_helper_pid is not None and not _pid_alive(state.upgrade_helper_pid): | |
| stale = True | |
| started = state.last_upgrade.started_at if state.last_upgrade else None | |
| if not stale and started: | |
| try: | |
| age = (datetime.now(timezone.utc) - datetime.fromisoformat(started)).total_seconds() | |
| stale = age > max_age_seconds | |
| except (ValueError, TypeError): | |
| stale = True | |
| elif ( | |
| not stale | |
| and started is None | |
| and state.upgrade_helper_pid is None | |
| ): | |
| # In progress with no record at all — there is nothing live to wait on. | |
| stale = True |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hebb/upgrade/state.py` around lines 142 - 154, The stale-state check in
the upgrade reconciliation flow is clearing active upgrades when `started is
None` even after a live helper PID was confirmed. Update the logic in the
stale-detection path in state.py so the `started is None` fallback only marks
stale when there is no live `upgrade_helper_pid`, and preserves
`upgrade_in_progress` while the helper process is still alive. Use the
`state.upgrade_helper_pid`, `_pid_alive`, and `state.last_upgrade.started_at`
checks to keep the scheduler-managed upgrade lock intact.
The positive pipx/uv detection tests assumed those tools were on PATH (true locally, false on CI runners which ship pipx but not uv) — build_command then refused the uv-tool method and the asserts failed. Pin shutil.which so the command is built deterministically regardless of the environment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
os.kill(pid, 0) is the POSIX liveness idiom, but on Windows signal 0 is CTRL_C_EVENT — os.kill(pid, 0) sends a Ctrl+C to the process group instead of probing. reconcile_stale() called _pid_alive(os.getpid()) on the apply path, so on Windows it raised KeyboardInterrupt in the daemon (and crashed the Windows CI test job via the in-progress router test that seeds its own PID). Probe via OpenProcess + GetExitCodeProcess on Windows; keep os.kill(pid, 0) on POSIX. Deduplicate the helper's copy by importing the single implementation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three unit tests asserted POSIX-only facts and failed on the Windows CI matrix
(the production code is correct):
- test_terminate_parent_no_sigkill_on_windows referenced signal.SIGKILL, which
does not exist on Windows — assert sent == [SIGTERM] instead.
- the two _is_system_python positive tests fed POSIX roots ("/usr"), which
Path.resolve() drive-prefixes to "c:/usr" on Windows. Gate them to POSIX and
add a Windows-native variant ("C:\\PROGRAM FILES") for the case-insensitive
check.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
PR-2 of the auto-upgrade design (
reports/design/auto-upgrade.md) — the web console "Upgrade now" banner button now actually upgrades Hebb Mind. PR-1 shipped detection + a disabled-button notification surface; this PR makes the apply flow real, end to end.What's included
upgrade.installer— detect editable / pipx / uv-tool / pip / system installs and refuse editable + system-managed Python (the button stays disabled with the real reason). pipx/uv are detected via tool marker files so a customPIPX_HOME/UV_TOOL_DIRstill classifies correctly.upgrade.helper— a fully detached worker (start_new_session/DETACHED_PROCESS) that stops the OS service, runs the matching upgrade command, restarts, and verifies/health, writing every transition toupgrade_state.json. Windows-safe signal handling; a failed restart is reported as a failed upgrade, never left silently down.POST /api/v1/admin/upgrade/apply+/dismiss, andPOST /api/v1/admin/shutdown(stop without restart). "Skip this version" now persists server-side (dismissed_for_version), not just in the browser.hebb upgradeCLI —--check/--apply/--statusfor headless hosts.auto_upgrade_mode="auto"is now wired to trigger the helper directly (was previously a no-op);upgrade_state.reconcile_staleclears an abandonedupgrade_in_progressflag (keyed on the recorded helper PID) at boot + before each/apply.Quality
signal.SIGKILLcrash, a daemon-left-down path on restart failure, an editable-install data-loss path, and Windows system-Python misdetection.Release
Bumps 0.2.0 → 0.2.1 (patch; the project bumps patch for pre-1.0 features) across
pyproject.toml,src/hebb/__init__.py,.release-please-manifest.json,.claude-plugin/plugin.json+ CHANGELOG.mainauto-publishes 0.2.1 to PyPI (publish.ymlships on thepyproject.tomlversion change). PR-3 (native OS notifications + Settings-page section) is the only remaining design item.🤖 Generated with Claude Code
Summary by CodeRabbit
hebb upgradeCLI command for headless upgrade status/check/apply./shutdownto stop the service without restart.auto_upgrade_mode="auto"to automatically trigger upgrades when eligible.