HYPERFLEET-453, HYPERFLEET-454 - Implement health endpoints and graceful shutdown #34

rafabene · 2026-01-13T19:41:06Z

Summary

Consolidate health endpoints (/healthz, /readyz) on metrics server port 8080
Implement graceful shutdown with configurable SHUTDOWN_TIMEOUT (default 20s)
Use signal.NotifyContext for SIGTERM/SIGINT handling
Add proper shutdown sequence: mark not ready, drain HTTP connections, cleanup resources

Test plan

Verify /healthz returns 200 OK on port 8080
Verify /readyz returns 200 OK when ready, 503 during shutdown
Test graceful shutdown with SIGTERM signal
Verify SHUTDOWN_TIMEOUT environment variable is respected
Run make test and make lint

Summary by CodeRabbit

New Features
- Added standard liveness (/healthz) and readiness (/readyz) probes and a dedicated health server with readiness state management.
Bug Fixes
- Startup/shutdown now signal readiness accurately and mark service as draining during graceful shutdown.
Configuration
- Health endpoints moved to port 8080; metrics moved to port 9090. Flag names and probe timing defaults updated.
Documentation
- Docs and deployment manifests updated to reflect new endpoints, ports, flags, and probe behavior.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-13T19:41:19Z

Walkthrough

This PR introduces a new health server exposing liveness (/healthz) and readiness (/readyz) on port 8080, moves metrics to /metrics on port 9090, and renames related config flags and types (Health instead of HealthCheck). It replaces the previous health-check server implementation with NewHealthServer(), adds a thread-safe readiness state singleton, health handlers that optionally verify DB connectivity, adds ListenNotifier for servers, updates startup/shutdown sequencing to set readiness and perform environment teardown, and updates manifests, charts, docs, tests, and defaults to the new ports and probe paths.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant HS as HealthServer
    participant RS as ReadinessState
    participant H as HealthHandler
    participant DB as Database

    App->>HS: Start()
    HS->>HS: Start HTTP listener (:8080)
    App->>RS: SetReady()
    RS->>RS: ready=true, shuttingDown=false

    Client->>HS: GET /healthz
    HS->>H: LivenessHandler()
    H-->>Client: 200 OK {"status":"ok"}

    Client->>HS: GET /readyz
    HS->>H: ReadinessHandler()
    H->>RS: IsShuttingDown(), IsReady()
    alt Ready and not shutting down
        H->>DB: Ping() (if configured)
        DB-->>H: OK
        H-->>Client: 200 OK {"status":"ok"}
    else Not ready or DB failure or shutting down
        H-->>Client: 503 Service Unavailable {"reason": "..."}
    end

sequenceDiagram
    participant App as Application
    participant RS as ReadinessState
    participant HS as HealthServer
    participant AS as APIServer
    participant MS as MetricsServer
    participant Env as Environment

    App->>RS: SetShuttingDown()
    RS->>RS: shuttingDown=true

    App->>HS: Stop()
    HS->>HS: Graceful shutdown (respect timeout)

    App->>AS: Stop()
    AS->>AS: Graceful shutdown

    App->>MS: Stop()
    MS->>MS: Graceful shutdown

    App->>Env: Teardown()
    Env-->>App: Resources closed

    Note over App: Shutdown complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

lgtm, approved

Suggested reviewers

rh-amarin
aredenba-rh

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title clearly and specifically describes the main objectives: implementing health endpoints (/healthz, /readyz) and graceful shutdown, which are confirmed across the raw_summary file changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧹 Recent nitpick comments

pkg/health/readiness.go (1)
1-59: Consider adding a Reset method for test isolation.

The singleton pattern can make unit tests interdependent if state carries over between tests.
♻️ Optional: Add Reset for testing
+// Reset resets the readiness state (intended for testing only)
+func (r *ReadinessState) Reset() {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.ready = false
+	r.shuttingDown = false
+}
pkg/health/handler.go (1)
30-76: Consider logging database connectivity failures for observability.

The readiness handler correctly prioritizes state checks (shutting down, not ready) before attempting the database ping, which avoids unnecessary DB calls during shutdown. However, database ping failures are silently returned as 503 without logging.

Adding debug-level logging for DB ping failures could help with troubleshooting connectivity issues in production without being too noisy.
💡 Optional: Add debug logging for DB failures
// At the top of the file, add the logger import and context
import (
    "context"
    "encoding/json"
    "net/http"

    "github.com/openshift-hyperfleet/hyperfleet-api/pkg/db"
    "github.com/openshift-hyperfleet/hyperfleet-api/pkg/logger"
)

// In ReadinessHandler, around lines 68-75:
if err := sqlDB.PingContext(r.Context()); err != nil {
    logger.WithError(r.Context(), err).Debug("Database ping failed during readiness check")
    w.WriteHeader(http.StatusServiceUnavailable)
    // ... rest unchanged
}

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a03475 and bff6243.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (22)

.claude/settings.local.json
AGENTS.md
README.md
charts/templates/deployment.yaml
charts/templates/service.yaml
charts/values.yaml
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
cmd/hyperfleet-api/server/healthcheck_server.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/server/server.go
docs/deployment.md
docs/development.md
docs/hyperfleet-api.http
go.mod
pkg/config/config.go
pkg/config/health.go
pkg/config/health_check.go
pkg/config/metrics.go
pkg/health/handler.go
pkg/health/readiness.go
test/helper.go

💤 Files with no reviewable changes (2)

cmd/hyperfleet-api/server/healthcheck_server.go
pkg/config/health_check.go

🚧 Files skipped from review as they are similar to previous changes (8)

pkg/config/metrics.go
pkg/config/health.go
charts/values.yaml
docs/hyperfleet-api.http
cmd/hyperfleet-api/server/health_server.go
README.md
docs/development.md
charts/templates/service.yaml

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handler.go
test/helper.go
pkg/config/config.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/readiness.go

🧠 Learnings (19)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/metrics_server.go
test/helper.go
charts/templates/deployment.yaml
docs/deployment.md
cmd/hyperfleet-api/servecmd/cmd.go
AGENTS.md

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handler.go
test/helper.go
pkg/config/config.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/readiness.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handler.go
docs/deployment.md
AGENTS.md

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handler.go
test/helper.go
charts/templates/deployment.yaml
docs/deployment.md
pkg/config/config.go
cmd/hyperfleet-api/servecmd/cmd.go
AGENTS.md
pkg/health/readiness.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state

Applied to files:

pkg/health/handler.go
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/readiness.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
go.mod

📚 Learning: 2026-01-13T04:19:50.012Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.

Applied to files:

go.mod

📚 Learning: 2026-01-13T10:10:34.485Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/telemetry/otel.go:10-10
Timestamp: 2026-01-13T10:10:34.485Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository (PR `#31`), the team has decided to keep the OpenTelemetry semantic conventions package at v1.37.0 (go.opentelemetry.io/otel/semconv/v1.37.0) while using the SDK at v1.38.0 in pkg/telemetry/otel.go.

Applied to files:

go.mod

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to **/*.go : Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)

Applied to files:

go.mod

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/db/**/*.go : Use PostgreSQL 14.2 with GORM ORM for database operations

Applied to files:

go.mod

🧬 Code graph analysis (1)

pkg/health/handler.go (2)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/health/readiness.go (1)

GetReadinessState (20-28)

🔇 Additional comments (29)

go.mod (1)

128-128: LGTM!

The added indirect dependencies (go.yaml.in/yaml/v2 and sigs.k8s.io/yaml) are standard YAML processing libraries commonly used in the Kubernetes ecosystem. These are transitive dependencies and appropriate for this PR's scope.

Also applies to: 140-140

.claude/settings.local.json (1)

4-8: LGTM!

The expanded Bash permissions are appropriate for development workflows. All added commands (git stash, make lint, ls, git add) are standard development operations that don't pose security risks.

cmd/hyperfleet-api/server/metrics_server.go (1)

20-22: LGTM!

The comment accurately documents the architectural change where health endpoints are now served by a dedicated health server on port 8080, while this metrics server continues to serve /metrics.

AGENTS.md (3)

347-350: LGTM!

Documentation correctly reflects the new CLI flags with updated naming (--health-server-bindaddress) and default ports (health on 8080, metrics on 9090).

689-690: LGTM!

Server port documentation accurately reflects the new architecture: health endpoints (/healthz, /readyz) on port 8080 and metrics on port 9090.

777-777: LGTM!

Metrics port reference correctly updated to 9090.

pkg/health/readiness.go (3)

7-12: LGTM! Well-designed thread-safe state management.

The ReadinessState struct correctly uses sync.RWMutex to protect concurrent access to the ready and shuttingDown flags.

14-28: Singleton initialization is correct.

Using sync.Once ensures thread-safe lazy initialization of the global state, which is the idiomatic Go pattern for singletons.

30-52: State transition logic is sound.

The locking strategy is correct:

Write operations (SetReady, SetShuttingDown) use Lock()

Read operations (IsReady, IsShuttingDown) use RLock()

SetReady() resetting shuttingDown = false is a reasonable design choice that allows recovery from aborted shutdowns.

cmd/hyperfleet-api/servecmd/cmd.go (5)

17-17: LGTM!

Import addition for the health package aligns with the new readiness state management.

83-93: Well-designed readiness signaling pattern.

The type assertion for ListenNotifier properly waits for the health server to be listening before marking the application as ready. This addresses the race condition where readiness could be set before the server was actually accepting connections.

101-107: Proper shutdown sequencing.

The shutdown sequence correctly marks the application as not ready first (causing /readyz to return 503), then stops the health server. This allows Kubernetes to stop routing traffic before the server shuts down.

115-124: LGTM!

The telemetry shutdown now properly uses shutdownCtx with the configured timeout, and Teardown() is called to clean up database connections. This addresses the previous review comment about unbounded OTel shutdown.

105-113: API and metrics server shutdown calls lack timeout context, while health server properly enforces one.

The apiServer.Stop() and metricsServer.Stop() calls at lines 108-113 execute without timeout enforcement, passing context.Background() to their underlying http.Server.Shutdown() calls. In contrast, healthServer.Stop() properly creates its own timeout context with s.shutdownTimeout. This inconsistency means the API and metrics servers can hang indefinitely during shutdown if they fail to gracefully close connections.

Align all three server shutdowns: either apply a consistent timeout context to all three (such as via the shutdownTimeout pattern), or document why the API and metrics servers differ from the health server's timeout behavior.

Likely an incorrect or invalid review comment.

cmd/hyperfleet-api/server/server.go (1)

20-24: Clean interface design for readiness signaling.

The ListenNotifier interface follows Go idioms well:

Single-method interface enables flexible composition

Read-only channel (<-chan struct{}) correctly signals one-time events

Optional interface pattern allows backward compatibility with servers that don't implement it

pkg/health/handler.go (2)

1-28: LGTM!

The Handler struct and LivenessHandler are well-designed. The liveness check correctly returns a simple 200 OK without any dependency checks, which is the appropriate behavior for Kubernetes liveness probes.

78-80: LGTM!

The success response correctly returns 200 OK with a consistent JSON structure.

test/helper.go (3)

63-63: LGTM!

Field rename from HealthCheckServer to HealthServer aligns with the new naming convention.

184-192: LGTM!

Method correctly renamed and uses the new NewHealthServer() constructor. The log messages referencing "health check server" are acceptable as they describe the server's purpose.

261-263: LGTM!

Method renamed to HealthURL and correctly references Health.BindAddress instead of the old HealthCheck.BindAddress, consistent with the configuration changes.

pkg/config/config.go (4)

16-22: LGTM!

The struct field rename from HealthCheck to Health with corresponding JSON tag change is clean and consistent.

Note: The JSON tag change from "health_check" to "health" is a breaking change for any existing configuration files that use the old key name. Ensure this is documented in migration notes if applicable.

24-32: LGTM!

Constructor properly initializes Health with NewHealthConfig().

39-39: LGTM!

Flag wiring correctly updated for the new Health field.

54-54: LGTM!

ReadFiles wiring correctly updated with the new "Health" display name.

charts/templates/deployment.yaml (2)

56-67: LGTM! Port configuration aligns with the agreed architecture standards.

The three-port layout is correctly implemented:

Port 8000: API server

Port 8080: Health server (/healthz, /readyz)

Port 9090: Metrics server (/metrics)

This matches the team's architecture standards discussion and provides clear separation of concerns for network policies (Prometheus → 9090, kubelet → 8080).

83-98: Probe configuration looks good with one minor observation.

The liveness and readiness probes are correctly configured with the new /healthz and /readyz endpoints. The differentiated timing is appropriate:

Liveness: initialDelaySeconds: 15, periodSeconds: 20 (less aggressive)

Readiness: initialDelaySeconds: 5, periodSeconds: 5 (more responsive)

One consideration: the liveness probe's initialDelaySeconds: 15 combined with periodSeconds: 20 means a failing pod won't be restarted until ~35s after startup. If the application typically starts faster, you could reduce initialDelaySeconds slightly, though the current values are reasonable defaults.

docs/deployment.md (3)

81-82: LGTM! Environment variable documentation correctly reflects the new port defaults.

275-278: Health checks documentation is accurate and clear.

The documentation correctly describes the three probes with their respective ports and expected behavior. The distinction between liveness (process alive) and readiness (ready for traffic, 503 during startup/shutdown) is clearly articulated.

297-297: LGTM! Metrics URL updated to the correct port.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/helper.go (1)
109-116: Metrics server not stopped during teardown.

startMetricsServer() is called at line 116, but stopMetricsServer is not added to the teardowns slice. This means the metrics server will continue running after test teardown, potentially causing resource leaks or port conflicts in subsequent test runs.
🐛 Proposed fix
 		helper.teardowns = []func() error{
 			helper.CleanDB,
 			jwkMockTeardown,
 			helper.stopAPIServer,
+			helper.stopMetricsServer,
 			helper.teardownEnv,
 		}
 		helper.startAPIServer()
 		helper.startMetricsServer()

🤖 Fix all issues with AI agents

In @cmd/hyperfleet-api/servecmd/cmd.go:
- Around line 116-121: The telemetry shutdown uses context.Background() which
ignores the overall shutdown timeout; change the call to use shutdownCtx (the
cancellation context that enforces the shutdown deadline) so
telemetry.Shutdown(shutdownCtx, tp) is invoked (keep the tp nil check) and
retain the existing logging (logger.WithError(ctx, err).Error(...)) so errors
are reported if the shutdown is cancelled or fails.

🧹 Nitpick comments (8)

pkg/config/shutdown.go (1)
33-41: Consider logging a warning for invalid SHUTDOWN_TIMEOUT values.

When time.ParseDuration fails, the invalid value is silently ignored. This could make debugging configuration issues harder if an operator sets an invalid value like SHUTDOWN_TIMEOUT=20 (missing unit suffix).
♻️ Optional improvement to add a warning log
+import (
+	"log/slog"
+	"os"
+	"time"
+
+	"github.com/spf13/pflag"
+)

 // BindEnv binds environment variables to config values
 // Environment variables take precedence over flags
 func (s *ShutdownConfig) BindEnv() {
 	if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" {
 		if d, err := time.ParseDuration(val); err == nil {
 			s.Timeout = d
+		} else {
+			slog.Warn("invalid SHUTDOWN_TIMEOUT value, using default",
+				"value", val,
+				"default", s.Timeout,
+				"error", err)
 		}
 	}
 }
pkg/config/shutdown_test.go (1)
10-20: Minor redundancy in default value assertions.

Lines 13-15 and 17-19 both verify the same default timeout value. The second check (explicit 20*time.Second) is redundant since DefaultShutdownTimeout is defined as 20 * time.Second.
♻️ Optional: Remove redundant assertion
 func TestNewShutdownConfig_Defaults(t *testing.T) {
 	cfg := NewShutdownConfig()
 
 	if cfg.Timeout != DefaultShutdownTimeout {
 		t.Errorf("expected default timeout %v, got %v", DefaultShutdownTimeout, cfg.Timeout)
 	}
-
-	if cfg.Timeout != 20*time.Second {
-		t.Errorf("expected timeout 20s, got %v", cfg.Timeout)
-	}
 }
pkg/health/readiness_test.go (1)
93-100: Consider adding test isolation for the singleton test.

This test accesses the global singleton but doesn't reset its state. While it only performs read operations, if the singleton is modified by previous tests (e.g., in parallel execution), it could still pass but leave the global state in an unexpected condition. Consider calling Reset() at the end for consistency with other tests.
♻️ Suggested improvement
 func TestGetReadinessState_ReturnsSingleton(t *testing.T) {
 	state1 := GetReadinessState()
 	state2 := GetReadinessState()
 
 	if state1 != state2 {
 		t.Error("expected GetReadinessState() to return the same instance")
 	}
+
+	// Reset state for other tests
+	state1.Reset()
 }
pkg/health/handlers_test.go (1)

134-168: Consider adding a test for database connection failure.

The TestReadinessHandler_IncludesChecks test uses a nil sessionFactory, which causes the handler to report database: "ok" via the fallback path. Consider adding a test with a mock SessionFactory that returns an error from CheckConnection() to verify the database: "error" path and ensure 503 is returned when the database is unreachable.
pkg/health/handlers.go (2)
10-11: Unused type definition.

CheckFunc is defined but not used anywhere in this file or the tests. If this is intended for future extensibility (e.g., registering custom health checks), consider adding a TODO comment or removing it until needed.

67-77: Consider reporting "skipped" instead of "ok" when sessionFactory is nil.

When sessionFactory is nil, the handler reports database: "ok", which could mask a misconfiguration where the database dependency wasn't properly injected. Reporting "skipped" or "not_configured" would provide clearer diagnostics.
♻️ Suggested improvement
 	// Check database connection
 	if h.sessionFactory != nil {
 		if err := h.sessionFactory.CheckConnection(); err != nil {
 			checks["database"] = "error"
 			allOK = false
 		} else {
 			checks["database"] = "ok"
 		}
 	} else {
-		checks["database"] = "ok"
+		checks["database"] = "skipped"
 	}
pkg/health/readiness.go (2)
8-12: Consider renaming shutting to shuttingDown for clarity.

The field name shutting is less descriptive than the method IsShuttingDown() and SetShuttingDown(). Aligning the field name would improve code readability.
Suggested improvement
 type ReadinessState struct {
 	mu       sync.RWMutex
 	ready    bool
-	shutting bool
+	shuttingDown bool
 }
Then update references in methods accordingly.
62-68: Consider adding documentation about production usage.

The Reset() method is correctly implemented for testing purposes. Consider strengthening the documentation to warn against calling this in production code, as resetting the shutdown state mid-flight could cause unexpected behavior.
Suggested documentation improvement
-// Reset resets the readiness state (useful for testing)
+// Reset resets the readiness state to initial values.
+// WARNING: This method is intended for testing only. Do not call in production
+// as it may cause inconsistent state during graceful shutdown.
 func (r *ReadinessState) Reset() {

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0120ac6 and 5acbbe9.

📒 Files selected for processing (21)

AGENTS.md
README.md
charts/templates/deployment.yaml
charts/templates/service.yaml
charts/values.yaml
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/healthcheck_server.go
cmd/hyperfleet-api/server/logging/request_logging_middleware.go
cmd/hyperfleet-api/server/metrics_server.go
docs/deployment.md
docs/development.md
docs/hyperfleet-api.http
pkg/config/config.go
pkg/config/health_check.go
pkg/config/shutdown.go
pkg/config/shutdown_test.go
pkg/health/handlers.go
pkg/health/handlers_test.go
pkg/health/readiness.go
pkg/health/readiness_test.go
test/helper.go

💤 Files with no reviewable changes (5)

charts/values.yaml
pkg/config/health_check.go
cmd/hyperfleet-api/server/logging/request_logging_middleware.go
charts/templates/service.yaml
cmd/hyperfleet-api/server/healthcheck_server.go

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

pkg/health/handlers_test.go
pkg/config/shutdown.go
pkg/health/readiness.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/readiness_test.go
test/helper.go
pkg/health/handlers.go
pkg/config/config.go
pkg/config/shutdown_test.go

🧠 Learnings (17)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

docs/development.md
docs/deployment.md
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go
charts/templates/deployment.yaml
README.md
test/helper.go
AGENTS.md
docs/hyperfleet-api.http

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

docs/development.md
pkg/health/handlers_test.go
docs/deployment.md
pkg/health/readiness.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go
charts/templates/deployment.yaml
pkg/health/readiness_test.go
README.md
test/helper.go
pkg/health/handlers.go
AGENTS.md
docs/hyperfleet-api.http

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

pkg/health/handlers_test.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

pkg/health/handlers_test.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL

Applied to files:

pkg/health/handlers_test.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handlers.go

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

pkg/health/handlers_test.go
pkg/config/shutdown.go
pkg/health/readiness.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/readiness_test.go
test/helper.go
pkg/health/handlers.go
pkg/config/config.go
pkg/config/shutdown_test.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise

Applied to files:

pkg/health/readiness.go
pkg/health/readiness_test.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go
pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/environments/**/*.go : Environment-based configuration: use OCM_ENV variable to select development, unit_testing, integration_testing, or production configuration

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state

Applied to files:

pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error

Applied to files:

pkg/health/handlers.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table

Applied to files:

pkg/health/handlers.go

🧬 Code graph analysis (6)

cmd/hyperfleet-api/server/metrics_server.go (3)

pkg/health/handlers.go (1)

NewHandler (19-23)

cmd/hyperfleet-api/environments/types.go (1)

Database (36-38)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/health/readiness_test.go (1)

pkg/health/readiness.go (2)

ReadinessState (8-12)

GetReadinessState (21-23)

test/helper.go (5)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/config/config.go (1)

ApplicationConfig (15-22)

cmd/hyperfleet-api/environments/types.go (1)

ApplicationConfig (32-34)

cmd/hyperfleet-api/server/server.go (1)

Server (13-18)

test/factories/factory.go (1)

Factories (9-10)

pkg/health/handlers.go (2)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/health/readiness.go (1)

GetReadinessState (21-23)

pkg/config/config.go (1)

pkg/config/shutdown.go (2)

ShutdownConfig (15-17)

NewShutdownConfig (19-23)

pkg/config/shutdown_test.go (1)

pkg/config/shutdown.go (2)

NewShutdownConfig (19-23)

DefaultShutdownTimeout (12-12)

🔇 Additional comments (37)

test/helper.go (2)

56-68: LGTM!

The Helper struct field formatting is clean. The removal of HealthCheckServer and related fields aligns with the PR objective to consolidate health endpoints onto the metrics server.

164-179: LGTM!

The startMetricsServer and stopMetricsServer methods follow the same pattern as the API server methods, which is consistent and maintainable.

AGENTS.md (3)

347-352: LGTM!

Documentation accurately reflects the consolidation of health endpoints onto the metrics server and the new graceful shutdown configuration. The shutdown-timeout flag with environment variable binding (SHUTDOWN_TIMEOUT) is properly documented.

690-690: LGTM!

Port documentation correctly updated to reflect the consolidated health and metrics endpoints on port 8080. Based on learnings, this updates the previous port 8083 health check configuration.

775-778: LGTM!

Health endpoints documentation correctly describes /healthz for liveness and /readyz for readiness probes. This aligns with Kubernetes conventions and the PR objectives.

pkg/config/shutdown.go (2)

10-23: LGTM!

The default shutdown timeout of 20 seconds is a reasonable value that allows for connection draining while not delaying shutdown too long. The struct and constructor follow the established patterns in the codebase.

25-27: LGTM!

Flag registration follows the same pattern as other config types in the codebase.

pkg/config/config.go (4)

15-22: LGTM!

The ApplicationConfig struct correctly replaces HealthCheck with Shutdown, aligning with the PR's consolidation of health endpoints into the metrics server and addition of graceful shutdown configuration.

24-33: LGTM!

Constructor properly initializes all config components including the new ShutdownConfig.

35-43: LGTM!

AddFlags correctly includes Shutdown.AddFlags, ensuring the --shutdown-timeout flag is registered.

45-55: Intentional omission: Shutdown.ReadFiles not called.

Note that Shutdown.ReadFiles() is not included in the readFiles slice. This is acceptable since ShutdownConfig.ReadFiles() currently returns nil (no file-based configuration). If file-based shutdown configuration is added later, remember to wire it here.

pkg/config/shutdown_test.go (3)

22-52: LGTM!

Flag registration and parsing tests are thorough, verifying both the default value and custom value parsing via command-line arguments.

54-99: LGTM!

Excellent table-driven test coverage for BindEnv. The test cases cover valid durations (seconds and minutes), invalid duration strings, and empty values. Using t.Setenv ensures proper test isolation.

101-108: LGTM!

Simple but necessary test ensuring ReadFiles doesn't return an error, which is important for interface compliance even though it's currently a no-op.

docs/development.md (1)

98-100: Documentation correctly updated to reflect consolidated health endpoints.

The URLs now accurately point to the new /healthz and /readyz endpoints on port 8080, consistent with the code changes that consolidate health checks onto the metrics server.

README.md (1)

83-85: README correctly documents the new health probe endpoints.

The liveness and readiness probe URLs are accurate and consistent with the implementation in metrics_server.go and other documentation files.

charts/templates/deployment.yaml (1)

79-94: Kubernetes probes correctly configured for consolidated health endpoints.

The liveness (/healthz) and readiness (/readyz) probes are properly configured to use the metrics port. The timing parameters are reasonable:

Liveness: initialDelaySeconds=15, periodSeconds=20 — avoids premature pod restarts during startup

Readiness: initialDelaySeconds=5, periodSeconds=10 — enables quick traffic routing once ready

This configuration aligns with the PR's goal of consolidating health and metrics endpoints on port 8080.

docs/deployment.md (2)

81-84: Environment variable documentation accurately reflects new configuration.

The METRICS_PORT description correctly notes it now serves both metrics and health endpoints. The new SHUTDOWN_TIMEOUT variable is properly documented with its default value of 20s.

277-279: Health check documentation clearly describes probe behavior.

The descriptions accurately differentiate liveness (process alive check) from readiness (traffic acceptance check with 503 during startup/shutdown), consistent with the implementation.

cmd/hyperfleet-api/server/metrics_server.go (1)

25-28: Health endpoints correctly integrated into metrics server.

Both LivenessHandler and ReadinessHandler methods exist on the Handler type with proper signatures for use as HTTP handlers. The handler is properly instantiated with the SessionFactory dependency, and both /healthz (liveness) and /readyz (readiness) endpoints are correctly registered with GET method restrictions. The ReadinessHandler includes database connectivity checks via sessionFactory.CheckConnection(), ensuring the endpoint accurately reflects system readiness status.

docs/hyperfleet-api.http (1)

195-200: LGTM!

The health endpoint documentation correctly reflects the consolidation to the metrics port (8080) with standard Kubernetes probe naming conventions (/healthz for liveness, /readyz for readiness). This aligns with the PR objective and replaces the previous /healthcheck endpoint. Based on learnings, the health check was previously on port 8083 and is now properly consolidated.

pkg/health/readiness_test.go (3)

7-17: LGTM!

Initial state test correctly validates the zero-value behavior of ReadinessState.

19-57: LGTM!

State transition tests (SetReady, SetNotReady, SetShuttingDown) comprehensively verify the state machine behavior, including the important invariant that IsReady() returns false when shutting down.

59-73: LGTM!

Reset() test properly verifies both flags are cleared.

pkg/health/handlers_test.go (4)

10-33: LGTM!

Liveness handler test correctly validates HTTP 200 response with JSON body containing status: "ok".

35-66: LGTM!

Readiness handler test properly validates the happy path with state reset and cleanup.

68-132: LGTM!

Error path tests comprehensively cover both "not ready" and "shutting down" scenarios, including the shutdown message verification.

170-209: LGTM!

Content-Type header tests for both handlers ensure proper JSON response formatting.

cmd/hyperfleet-api/servecmd/cmd.go (3)

47-54: LGTM!

Shutdown configuration binding and logging are properly implemented, providing visibility into the configured timeout.

83-90: LGTM!

The readiness signaling and signal handling are well-implemented. Using signal.NotifyContext is the idiomatic Go approach for graceful shutdown, and marking the application ready after servers start ensures probes reflect actual availability.

128-135: LGTM!

The shutdown completion/timeout handling with select is correct. The forced exit on timeout ensures the process doesn't hang indefinitely.

pkg/health/handlers.go (2)

32-43: LGTM!

Liveness handler correctly implements a minimal process-alive check that always returns 200 OK. This is appropriate for Kubernetes liveness probes, which should only fail if the process is unhealthy and needs to be restarted.

79-103: LGTM!

The readiness check logic correctly aggregates multiple health checks and returns appropriate status codes. The check ordering (shutdown → database → ready flag) ensures early exit for shutdown scenarios.

pkg/health/readiness.go (4)

14-23: LGTM!

The singleton pattern is appropriate for application-wide readiness state. Package-level initialization is thread-safe in Go, and the Reset() method addresses testability concerns.

25-37: LGTM!

Thread-safe setters with proper mutex locking. The implementation is correct and straightforward.

39-46: LGTM!

Atomically setting both shutting and ready under a single lock ensures consistent state during shutdown transitions. Well documented.

48-60: LGTM!

Correct use of RLock for concurrent read access. The !r.shutting check in IsReady() provides defensive protection against incorrect state transitions, ensuring readiness is never reported during shutdown even if SetReady() were mistakenly called after SetShuttingDown().

cmd/hyperfleet-api/servecmd/cmd.go

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @cmd/hyperfleet-api/servecmd/cmd.go:
- Around line 101-126: The Stop calls are using unbounded contexts — update the
shutdown flow so apiServer.Stop and metricsServer.Stop accept a context (or at
least use the existing shutdownCtx) and use it when calling
http.Server.Shutdown; specifically change the Stop signatures to Stop(ctx
context.Context) (or add an overload) and invoke apiServer.Stop(shutdownCtx) and
metricsServer.Stop(shutdownCtx) from this goroutine (or, if you cannot change
signatures, have each Stop create a derived context with the same timeout as
shutdownCtx and use that for http.Server.Shutdown) so that the per-server
shutdown honors the timeout the caller created (shutdownCtx) just like
telemetry.Shutdown(shutdownCtx, tp).

🧹 Nitpick comments (1)

pkg/config/shutdown.go (1)
33-41: Silent error on invalid SHUTDOWN_TIMEOUT may confuse operators.

When time.ParseDuration fails, the error is silently ignored, and the timeout falls back to the default. This could mask misconfiguration (e.g., SHUTDOWN_TIMEOUT=20 without a unit).

Consider logging a warning when parsing fails so operators are aware their configuration was not applied.
🔧 Suggested improvement
 func (s *ShutdownConfig) BindEnv() {
 	if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" {
 		if d, err := time.ParseDuration(val); err == nil {
 			s.Timeout = d
+		} else {
+			// Log warning - requires importing logger or returning error
+			// Consider: log.Printf("warning: invalid SHUTDOWN_TIMEOUT %q, using default: %v", val, err)
 		}
 	}
 }
Alternatively, change the signature to BindEnv() error to propagate the error to the caller, consistent with how other config validation might be handled.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5acbbe9 and f7a96ea.

📒 Files selected for processing (6)

AGENTS.md
cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md
pkg/config/config.go
pkg/config/shutdown.go
pkg/config/shutdown_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

pkg/config/shutdown_test.go
AGENTS.md

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/shutdown.go
pkg/config/config.go

🧠 Learnings (8)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/environments/**/*.go : Environment-based configuration: use OCM_ENV variable to select development, unit_testing, integration_testing, or production configuration

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/shutdown.go
pkg/config/config.go

🧬 Code graph analysis (2)

cmd/hyperfleet-api/servecmd/cmd.go (4)

cmd/hyperfleet-api/environments/framework.go (1)

Environment (55-57)

pkg/telemetry/otel.go (1)

Shutdown (67-72)

pkg/logger/logger.go (4)

With (276-281)

Info (253-255)

WithError (284-289)

Error (265-267)

pkg/health/readiness.go (1)

GetReadinessState (21-23)

pkg/config/config.go (1)

pkg/config/shutdown.go (2)

ShutdownConfig (15-17)

NewShutdownConfig (19-23)

🔇 Additional comments (11)

pkg/config/shutdown.go (2)

1-13: LGTM!

Package setup and constant definition are clean and well-documented. The 20-second default is a reasonable timeout for graceful shutdown.

15-27: LGTM!

The struct definition, constructor, and flag wiring follow the established patterns used by other config types in this package (e.g., ServerConfig, MetricsConfig).

pkg/config/config.go (3)

15-22: LGTM!

The Shutdown field is properly added to ApplicationConfig, replacing the removed HealthCheck field. The JSON tag follows the snake_case convention used by other fields.

24-43: LGTM!

Constructor and flag wiring are correctly updated to include ShutdownConfig initialization and flag binding, consistent with other config subsystems.

45-64: Shutdown.ReadFiles intentionally omitted from ReadFiles.

ShutdownConfig.ReadFiles() currently returns nil with no file-based configuration. Omitting it from this list is reasonable, but if file-based shutdown config is added later, remember to include it here.

cmd/hyperfleet-api/servecmd/cmd.go (4)

16-16: LGTM!

Health package import added for readiness state management.

47-54: LGTM!

Environment binding and structured logging of the shutdown timeout follow the codebase conventions. Based on learnings, this correctly uses logger.With(ctx, ...) for structured logging.

83-90: LGTM!

The readiness state is correctly set after servers start, and signal.NotifyContext is the idiomatic Go pattern for signal handling with context cancellation.

128-135: LGTM!

The select pattern correctly handles both graceful completion and timeout scenarios. Using os.Exit(1) on timeout ensures the process terminates even if shutdown hangs.

docs/deployment.md (2)

81-84: LGTM!

Documentation correctly reflects the updated port configuration and new SHUTDOWN_TIMEOUT environment variable with its default value.

277-279: LGTM!

Health check documentation accurately describes the consolidated endpoints on port 8080, matching the implementation. The distinction between liveness (/healthz) and readiness (/readyz) behavior is clearly explained. Based on learnings, this updates the previous documentation that referenced /healthcheck on a separate port.

cmd/hyperfleet-api/servecmd/cmd.go

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

test/helper.go (1)
109-116: Metrics server not included in teardowns.

The metrics server is started at line 116 but stopMetricsServer is not added to the teardowns slice. This leaves the metrics server goroutine running and port 8080 bound after tests complete.
🔧 Suggested fix
 	helper.teardowns = []func() error{
 		helper.CleanDB,
 		jwkMockTeardown,
 		helper.stopAPIServer,
+		helper.stopMetricsServer,
 		helper.teardownEnv,
 	}
pkg/config/config.go (1)

45-64: Add Shutdown.ReadFiles() to the ApplicationConfig.ReadFiles() method for consistency.

Shutdown.ReadFiles() is missing from the configuration loading in ReadFiles(), even though all other config types (Server, Database, OCM, Metrics, Logging) are included. For consistency, it should be added to the list, similar to Server, Metrics, and Logging which also return nil.

cmd/hyperfleet-api/server/api_server.go (1)

140-155: Remove database close from Start() method; it conflicts with graceful shutdown orchestration.

The Start() method closes the database connection after Serve() returns (lines 152-154). During graceful shutdown, apiServer.Stop(ctx) is called from cmd.go:109, which triggers HTTP server shutdown and causes Serve() to return. This immediately closes the database. Subsequently, cmd.go:124 calls environments.Environment().Teardown(), which closes the database again. This results in a double-close of the SessionFactory during every graceful shutdown, violating the code's own comment that this must be called "exactly once during an app's lifetime."

Move database cleanup responsibility entirely to the shutdown orchestrator in cmd.go (already calling Teardown() at line 124) and remove the redundant close from the Start() method.

🧹 Nitpick comments (1)

pkg/config/shutdown.go (1)
33-41: Silent failure on invalid SHUTDOWN_TIMEOUT value.

If the environment variable contains an invalid duration string, the parse error is silently ignored and the default/flag value is retained. This could lead to unexpected behavior if an operator misconfigures the timeout (e.g., SHUTDOWN_TIMEOUT=20 without a unit).

Consider logging a warning when parsing fails so operators are aware of the misconfiguration.
♻️ Suggested improvement
+import (
+	"context"
+	"os"
+	"time"
+
+	"github.com/openshift-hyperfleet/hyperfleet-api/pkg/logger"
+	"github.com/spf13/pflag"
+)

 func (s *ShutdownConfig) BindEnv() {
 	if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" {
 		if d, err := time.ParseDuration(val); err == nil {
 			s.Timeout = d
+		} else {
+			logger.Warn(context.Background(), "Invalid SHUTDOWN_TIMEOUT value, using default", "value", val, "error", err)
 		}
 	}
 }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7a96ea and 28df995.

📒 Files selected for processing (10)

AGENTS.md
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/api_server.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/server/server.go
docs/deployment.md
pkg/config/config.go
pkg/config/shutdown.go
pkg/config/shutdown_test.go
test/helper.go

🚧 Files skipped from review as they are similar to previous changes (3)

pkg/config/shutdown_test.go
cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/api_server.go
test/helper.go
pkg/config/shutdown.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/config/config.go

🧠 Learnings (12)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/api_server.go
test/helper.go
cmd/hyperfleet-api/server/metrics_server.go
AGENTS.md

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/server/api_server.go
test/helper.go
pkg/config/shutdown.go
cmd/hyperfleet-api/server/metrics_server.go
pkg/config/config.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

test/helper.go
cmd/hyperfleet-api/server/metrics_server.go
AGENTS.md

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support

Applied to files:

cmd/hyperfleet-api/server/metrics_server.go

🧬 Code graph analysis (4)

cmd/hyperfleet-api/server/api_server.go (1)

pkg/telemetry/otel.go (1)

Shutdown (67-72)

test/helper.go (5)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/config/config.go (1)

ApplicationConfig (15-22)

cmd/hyperfleet-api/environments/types.go (1)

ApplicationConfig (32-34)

cmd/hyperfleet-api/server/server.go (1)

Server (13-18)

test/factories/factory.go (1)

Factories (9-10)

cmd/hyperfleet-api/server/metrics_server.go (3)

pkg/health/handlers.go (1)

NewHandler (19-23)

cmd/hyperfleet-api/environments/types.go (1)

Database (36-38)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/config/config.go (1)

pkg/config/shutdown.go (2)

ShutdownConfig (15-17)

NewShutdownConfig (19-23)

🔇 Additional comments (15)

test/helper.go (2)

56-68: LGTM!

The Helper struct cleanly removes the HealthCheckServer field, aligning with the PR's consolidation of health endpoints onto the metrics server.

157-179: LGTM!

The Stop calls now correctly pass context.Background() to match the updated Server interface signature. Using context.Background() is appropriate for test teardown where there's no parent context or timeout requirement.

AGENTS.md (3)

347-352: LGTM!

Documentation accurately reflects the consolidation of health endpoints onto the metrics server and the new graceful shutdown configuration. The --shutdown-timeout flag with SHUTDOWN_TIMEOUT environment variable is clearly documented.

690-691: LGTM!

Server configuration section correctly updated to show all three endpoints (/metrics, /healthz, /readyz) served on port 8080.

775-778: LGTM!

Health endpoints documentation is clear and accurate:

/healthz for liveness (process alive check)

/readyz for readiness (database connection check)

This aligns with Kubernetes probe conventions and the implementation in pkg/health/handlers.go.

pkg/config/shutdown.go (2)

10-13: LGTM!

The 20-second default shutdown timeout is a reasonable choice that provides adequate time for connection draining while not extending shutdown unnecessarily.

15-23: LGTM!

Clean struct definition and constructor following the established config pattern in the codebase.

pkg/config/config.go (3)

15-22: LGTM!

The Shutdown field is correctly added to ApplicationConfig, replacing the removed HealthCheck configuration. The struct maintains consistency with other config fields.

24-33: LGTM!

Constructor properly initializes Shutdown via NewShutdownConfig(), ensuring the default timeout is set.

35-43: LGTM!

The --shutdown-timeout flag is correctly wired through c.Shutdown.AddFlags(flagset).

cmd/hyperfleet-api/server/metrics_server.go (2)

25-29: LGTM!

Health endpoints are correctly integrated:

/healthz for liveness probe (process alive)

/readyz for readiness probe (database connectivity)

Handler properly initialized with SessionFactory for database health checks

HTTP methods correctly restricted to GET

This consolidates health checks onto the metrics server as intended. Based on learnings, this replaces the previous /healthcheck endpoint on port 8083.

77-79: LGTM!

The Stop method correctly accepts a context and passes it to http.Server.Shutdown(ctx), enabling graceful shutdown with timeout control from the caller.

cmd/hyperfleet-api/server/server.go (2)

13-18: LGTM! Context-aware Stop signature enables proper graceful shutdown.

Adding context.Context to the Stop method allows callers to control shutdown timeouts and cancellation, which is the correct pattern for graceful shutdown in Go.

27-34: Correct handling of server closed error.

The check function appropriately filters out http.ErrServerClosed, which is expected during graceful shutdown and should not cause an exit.

cmd/hyperfleet-api/server/api_server.go (1)

157-158: LGTM! Context is correctly propagated to HTTP server shutdown.

The implementation properly passes the context to httpServer.Shutdown(ctx), allowing the graceful shutdown timeout configured via SHUTDOWN_TIMEOUT to be respected. This is consistent with similar patterns in pkg/telemetry/otel.go:Shutdown.

charts/templates/deployment.yaml

pkg/health/handlers.go

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cmd/hyperfleet-api/servecmd/cmd.go (1)
110-114: Use a timeout context for OTel shutdown instead of context.Background().

The current implementation at line 111 passes an unbounded context to telemetry.Shutdown(), which could hang indefinitely if OTel exporters are slow or unresponsive. Create a shutdown context with a reasonable timeout (e.g., 5–30 seconds) to prevent blocking graceful shutdown.
if tp != nil {
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := telemetry.Shutdown(shutdownCtx, tp); err != nil {
		logger.WithError(ctx, err).Error("Failed to shutdown OpenTelemetry")
	}
}

🤖 Fix all issues with AI agents

In `@cmd/hyperfleet-api/server/health_server.go`:
- Around line 72-74: Stop() currently uses context.Background() and thus has no
deadline; change it to use a context with a timeout (configurable via
SHUTDOWN_TIMEOUT or a healthServer field) so Shutdown cannot hang indefinitely.
Specifically, add a shutdown timeout value (e.g., parse SHUTDOWN_TIMEOUT into a
time.Duration at server construction or add a shutdownTimeout time.Duration
field on healthServer), then implement Stop() to create a context with
context.WithTimeout(context.Background(), s.shutdownTimeout), defer cancel(),
and pass that context to s.httpServer.Shutdown; ensure the parsed timeout has a
sensible default (20s) and propagate/return the Shutdown error as before.

🧹 Nitpick comments (4)

pkg/health/handler.go (2)
68-75: Use PingContext with request context to prevent hanging readiness checks.

sqlDB.Ping() has no timeout and could hang indefinitely if the database is unresponsive, causing Kubernetes readiness probes to timeout unpredictably. Use PingContext with the request's context to respect probe timeouts.
♻️ Suggested fix
-		if err := sqlDB.Ping(); err != nil {
+		if err := sqlDB.PingContext(r.Context()); err != nil {
30-79: Consider logging readiness failures for operational visibility.

Per coding guidelines, structured logging via logger.With(ctx, ...) should be used. Logging readiness failures (shutdown, not ready, DB unavailable) would help with debugging during deployment issues.
pkg/config/metrics.go (1)
24-24: Minor: Typo in flag description.

"adddress" has an extra 'd'. This is pre-existing but worth fixing while you're in this file.
✏️ Typo fix
-	fs.StringVar(&s.BindAddress, "metrics-server-bindaddress", s.BindAddress, "Metrics server bind adddress")
+	fs.StringVar(&s.BindAddress, "metrics-server-bindaddress", s.BindAddress, "Metrics server bind address")
cmd/hyperfleet-api/servecmd/cmd.go (1)
83-88: Potential race condition: SetReady() called before health server is listening.

The healthServer.Start() is called in a goroutine, and SetReady() is invoked immediately after. There's a window where the readiness state is set to ready before the health server's HTTP listener is fully bound and accepting connections. This could cause the readiness probe to fail initially if Kubernetes checks /readyz before the server is actually listening.

Consider having the health server signal when it's ready to accept connections, or add a brief synchronization mechanism.
♻️ One approach: Use a channel to signal server readiness
// In health_server.go, add a Ready channel
type healthServer struct {
    httpServer *http.Server
    Ready      chan struct{}
}

func (s *healthServer) Start() {
    close(s.Ready) // Signal ready after listener is bound
    // ... existing start logic
}
 healthServer := server.NewHealthServer()
 go healthServer.Start()
+
+// Wait for health server to be ready (with timeout)
+select {
+case <-healthServer.Ready:
+case <-time.After(5 * time.Second):
+    logger.Error(ctx, "Health server failed to start in time")
+}

 // Mark application as ready to receive traffic
 health.GetReadinessState().SetReady()

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28df995 and b421a83.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (21)

.claude/settings.local.json
AGENTS.md
README.md
charts/templates/deployment.yaml
charts/templates/service.yaml
charts/values.yaml
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
cmd/hyperfleet-api/server/healthcheck_server.go
cmd/hyperfleet-api/server/metrics_server.go
docs/deployment.md
docs/development.md
docs/hyperfleet-api.http
go.mod
pkg/config/config.go
pkg/config/health.go
pkg/config/health_check.go
pkg/config/metrics.go
pkg/health/handler.go
pkg/health/readiness.go
test/helper.go

💤 Files with no reviewable changes (2)

pkg/config/health_check.go
cmd/hyperfleet-api/server/healthcheck_server.go

✅ Files skipped from review due to trivial changes (1)

.claude/settings.local.json

🚧 Files skipped from review as they are similar to previous changes (7)

cmd/hyperfleet-api/server/metrics_server.go
test/helper.go
pkg/health/readiness.go
docs/development.md
docs/deployment.md
charts/values.yaml
charts/templates/service.yaml

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

pkg/config/health.go
pkg/health/handler.go
pkg/config/config.go
pkg/config/metrics.go
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go

🧠 Learnings (20)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

README.md
pkg/config/health.go
pkg/config/config.go
pkg/config/metrics.go
AGENTS.md
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
docs/hyperfleet-api.http
charts/templates/deployment.yaml

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

README.md
pkg/config/health.go
pkg/health/handler.go
pkg/config/config.go
AGENTS.md
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
docs/hyperfleet-api.http
charts/templates/deployment.yaml

📚 Learning: 2026-01-13T04:19:50.012Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.

Applied to files:

go.mod
docs/hyperfleet-api.http

📚 Learning: 2026-01-13T10:10:34.485Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/telemetry/otel.go:10-10
Timestamp: 2026-01-13T10:10:34.485Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository (PR `#31`), the team has decided to keep the OpenTelemetry semantic conventions package at v1.37.0 (go.opentelemetry.io/otel/semconv/v1.37.0) while using the SDK at v1.38.0 in pkg/telemetry/otel.go.

Applied to files:

go.mod
docs/hyperfleet-api.http

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

go.mod
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to **/*.go : Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)

Applied to files:

go.mod

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/db/**/*.go : Use PostgreSQL 14.2 with GORM ORM for database operations

Applied to files:

go.mod

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

pkg/config/health.go
pkg/health/handler.go
pkg/config/config.go
pkg/config/metrics.go
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state

Applied to files:

pkg/health/handler.go
cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

pkg/health/handler.go
pkg/config/metrics.go
AGENTS.md

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

🧬 Code graph analysis (3)

pkg/health/handler.go (2)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/health/readiness.go (1)

GetReadinessState (20-28)

pkg/config/config.go (1)

pkg/config/health.go (2)

HealthConfig (7-10)

NewHealthConfig (12-17)

cmd/hyperfleet-api/servecmd/cmd.go (4)

cmd/hyperfleet-api/server/health_server.go (1)

NewHealthServer (16-33)

pkg/health/readiness.go (1)

GetReadinessState (20-28)

pkg/logger/logger.go (3)

Info (253-255)

WithError (284-289)

Error (265-267)

cmd/hyperfleet-api/environments/framework.go (1)

Environment (55-57)

🔇 Additional comments (20)

go.mod (1)

128-128: LGTM - Indirect dependencies added.

The new indirect dependencies (go.yaml.in/yaml/v2 and sigs.k8s.io/yaml) are likely pulled in transitively by other packages in the dependency tree. No direct action required.

Also applies to: 140-140

README.md (1)

83-85: Documentation updated to reflect new endpoint architecture.

The endpoint changes align with the PR objectives:

Health endpoints (/healthz, /readyz) consolidated on port 8080

Metrics moved to port 9090

This is a breaking change from the previous architecture (health on 8083, metrics on 8080). Based on learnings, ensure Kubernetes deployment manifests and any external monitoring configurations are updated accordingly.

pkg/health/handler.go (2)

1-20: LGTM - Handler structure with dependency injection.

Good use of dependency injection for the sessionFactory. Making it optional (nil-safe in ReadinessHandler) allows flexibility for different deployment scenarios.

22-28: LGTM - Simple liveness check.

Liveness probe correctly returns 200 OK unconditionally. This is appropriate since liveness should only indicate the process is running, not dependent on external resources.

cmd/hyperfleet-api/server/health_server.go (2)

16-33: LGTM - Health server setup with proper endpoint registration.

Good setup with:

NotFoundHandler for unmatched routes

Health handler integration with session factory for DB checks

Proper configuration binding via env().Config.Health.BindAddress

48-70: LGTM - Start method with TLS support and proper error handling.

Good implementation:

TLS configuration check before starting HTTPS

Proper handling of http.ErrServerClosed vs other errors

Structured logging with logger.With() per coding guidelines

pkg/config/metrics.go (1)

15-21: LGTM - Metrics port updated to 9090.

The default port change from 8080 to 9090 correctly aligns with the PR's architectural change: health endpoints now occupy port 8080, while metrics move to 9090.

docs/hyperfleet-api.http (1)

7-8: LGTM! Documentation correctly reflects the new port architecture.

The port assignments (health on 8080, metrics on 9090) and endpoint paths (/healthz, /readyz) align with the team's agreed standards and Kubernetes conventions.

Also applies to: 196-201, 205-205

charts/templates/deployment.yaml (2)

56-57: LGTM! Port and flag configuration correctly updated.

The health server bind address and metrics server bind address flags are properly configured with the new defaults (:8080 and :9090 respectively), consistent with the HealthConfig and MetricsConfig defaults.

Also applies to: 62-66

83-98: LGTM! Probe configuration follows Kubernetes best practices.

The probe timing configuration is well-suited for graceful shutdown:

Liveness probe (/healthz) has longer intervals (15s initial, 20s period) to avoid unnecessary restarts during normal operation.

Readiness probe (/readyz) has shorter intervals (5s initial, 5s period) for faster traffic draining during shutdown.

The readiness failureThreshold: 3 with periodSeconds: 5 means traffic will be drained within ~15 seconds of marking the service as shutting down, which aligns well with the default SHUTDOWN_TIMEOUT of 20s.

AGENTS.md (2)

347-350: LGTM! CLI flag documentation updated correctly.

The flag names and defaults match the implementation in pkg/config/health.go and pkg/config/metrics.go.

689-690: LGTM! Server configuration documentation reflects the new architecture.

The port assignments and endpoint paths are correctly documented, aligning with the deployment templates and configuration defaults.

Also applies to: 777-777

pkg/config/health.go (1)

7-26: LGTM! Clean configuration struct following established patterns.

The HealthConfig struct follows the same conventions as other configuration types in the package, with proper JSON tags, a constructor with sensible defaults, flag binding via AddFlags, and a ReadFiles stub for consistency with the configuration interface.

cmd/hyperfleet-api/servecmd/cmd.go (3)

17-17: LGTM! Health package import added for readiness state management.

96-102: LGTM! Shutdown sequence correctly marks service as not ready before stopping servers.

The graceful shutdown flow properly:

Marks the service as shutting down (returns 503 on /readyz)

Stops the health server first to stop accepting new health checks

Then stops API and metrics servers

This allows Kubernetes to drain traffic before the application fully shuts down.

116-117: LGTM! Environment teardown ensures proper resource cleanup.

Calling environments.Environment().Teardown() at the end of shutdown ensures database connections and other resources are properly released.

pkg/config/config.go (4)

15-22: LGTM!

The struct field rename from HealthCheck to Health is clean, and the JSON tag "health" follows proper conventions.

24-33: LGTM!

Constructor correctly initializes Health with NewHealthConfig(). Based on learnings, this changes the health endpoint from port 8083 to 8080, which aligns with the PR objective to consolidate health endpoints on the metrics server port.

35-43: LGTM!

Flag registration updated consistently to use c.Health.AddFlags(flagset).

45-65: LGTM!

ReadFiles wiring updated consistently with the display name "Health" for error reporting.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

cmd/hyperfleet-api/server/health_server.go

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

charts/templates/service.yaml (1)
14-25: Duplicate port name "health" will cause Kubernetes validation errors.

The Service defines two ports both named "health" (lines 17 and 21). Kubernetes requires unique port names within a Service. The old port 8083 entry (lines 14-17) should be removed since health endpoints have moved to port 8080.
🐛 Proposed fix: Remove the deprecated 8083 port entry
   - port: {{ .Values.service.port }}
     targetPort: http
     protocol: TCP
     name: http
-  - port: 8083
-    targetPort: health
-    protocol: TCP
-    name: health
   - port: 8080
     targetPort: health
     protocol: TCP
     name: health
   - port: 9090
     targetPort: metrics
     protocol: TCP
     name: metrics

🤖 Fix all issues with AI agents

In `@cmd/hyperfleet-api/servecmd/cmd.go`:
- Around line 83-88: Readiness is being set immediately after spawning the
health server goroutine, creating a race where probes may hit before
ListenAndServe binds; modify server.NewHealthServer()/healthServer.Start() to
provide a readiness signal (e.g., Start returns a channel, error or exposes a
method like WaitUntilListening/NotifyListening) and only call
health.GetReadinessState().SetReady() after that signal confirms the server is
actually listening (reference healthServer.Start(), server.NewHealthServer(),
and ListenAndServe).

♻️ Duplicate comments (2)

cmd/hyperfleet-api/server/health_server.go (1)

45-50: Empty Listen() and Serve() methods remain no-ops.

These methods exist to satisfy the Server interface but do nothing, while Start() combines both responsibilities. This was flagged in a previous review as inconsistent with other server implementations like apiServer. If the interface contract expects callers to use Listen() → Serve() separately, this could cause subtle bugs. Consider documenting this design choice or creating a simpler interface for servers that only use Start()/Stop().
cmd/hyperfleet-api/servecmd/cmd.go (1)
110-114: OTel shutdown still uses context.Background() instead of a bounded context.

The past review flagged this issue and it was marked as addressed in commit f7a96ea, but the code still shows context.Background(). If the OTel exporters are slow or unresponsive, this could cause the shutdown to hang indefinitely. Consider using a context with timeout.
#!/bin/bash
# Verify the current state of telemetry.Shutdown usage
rg -n "telemetry.Shutdown" --type go -C 3

🧹 Nitpick comments (2)

cmd/hyperfleet-api/server/health_server.go (1)
69-73: Inconsistent logging pattern at line 72.

Lines 63 and 66 use logger.With(ctx, ...).Info(...) for structured logging with context fields, but line 72 uses logger.Info(ctx, ...). For consistency and to follow the structured logging pattern, consider aligning this call.
♻️ Suggested fix
 	if err != nil && err != http.ErrServerClosed {
 		check(err, "Health server terminated with errors")
 	} else {
-		logger.Info(ctx, "Health server terminated")
+		logger.With(ctx).Info("Health server terminated")
 	}
test/helper.go (1)
109-119: Health server not added to teardown list.

The health server is started but stopHealthServer is not added to the teardown list, which means it won't be stopped during test cleanup. The metrics server has the same pattern (started but not in teardowns), but for completeness, consider adding both to ensure proper cleanup.
Suggested fix

First, add a stopHealthServer method:
func (helper *Helper) stopHealthServer() error {
	if err := helper.HealthServer.Stop(); err != nil {
		return fmt.Errorf("unable to stop health server: %s", err.Error())
	}
	return nil
}
Then update the teardowns list:
 helper.teardowns = []func() error{
 	helper.CleanDB,
 	jwkMockTeardown,
 	helper.stopAPIServer,
+	helper.stopMetricsServer,
+	helper.stopHealthServer,
 	helper.teardownEnv,
 }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b421a83 and ea69bfb.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (21)

.claude/settings.local.json
AGENTS.md
README.md
charts/templates/deployment.yaml
charts/templates/service.yaml
charts/values.yaml
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
cmd/hyperfleet-api/server/healthcheck_server.go
cmd/hyperfleet-api/server/metrics_server.go
docs/deployment.md
docs/development.md
docs/hyperfleet-api.http
go.mod
pkg/config/config.go
pkg/config/health.go
pkg/config/health_check.go
pkg/config/metrics.go
pkg/health/handler.go
pkg/health/readiness.go
test/helper.go

💤 Files with no reviewable changes (2)

cmd/hyperfleet-api/server/healthcheck_server.go
pkg/config/health_check.go

🚧 Files skipped from review as they are similar to previous changes (5)

charts/values.yaml
go.mod
docs/development.md
cmd/hyperfleet-api/server/metrics_server.go
README.md

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/health.go
pkg/health/readiness.go
cmd/hyperfleet-api/server/health_server.go
test/helper.go
pkg/config/config.go
pkg/config/metrics.go
pkg/health/handler.go

🧠 Learnings (19)

📓 Common learnings

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md
pkg/config/health.go
AGENTS.md
cmd/hyperfleet-api/server/health_server.go
test/helper.go
pkg/config/config.go
charts/templates/deployment.yaml
pkg/config/metrics.go
charts/templates/service.yaml
docs/hyperfleet-api.http

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
docs/deployment.md
pkg/config/health.go
AGENTS.md
pkg/health/readiness.go
cmd/hyperfleet-api/server/health_server.go
test/helper.go
pkg/config/config.go
charts/templates/deployment.yaml
pkg/health/handler.go
docs/hyperfleet-api.http

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/health/handler.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/health.go
pkg/health/readiness.go
cmd/hyperfleet-api/server/health_server.go
test/helper.go
pkg/config/config.go
pkg/config/metrics.go
pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

docs/deployment.md
AGENTS.md
pkg/config/metrics.go
pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise

Applied to files:

pkg/health/readiness.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

cmd/hyperfleet-api/server/health_server.go
pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

cmd/hyperfleet-api/server/health_server.go
pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support

Applied to files:

pkg/health/handler.go

📚 Learning: 2026-01-13T10:10:34.485Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/telemetry/otel.go:10-10
Timestamp: 2026-01-13T10:10:34.485Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository (PR `#31`), the team has decided to keep the OpenTelemetry semantic conventions package at v1.37.0 (go.opentelemetry.io/otel/semconv/v1.37.0) while using the SDK at v1.38.0 in pkg/telemetry/otel.go.

Applied to files:

docs/hyperfleet-api.http

📚 Learning: 2026-01-13T04:19:50.012Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.

Applied to files:

docs/hyperfleet-api.http

🧬 Code graph analysis (4)

cmd/hyperfleet-api/servecmd/cmd.go (4)

cmd/hyperfleet-api/server/health_server.go (1)

NewHealthServer (17-36)

pkg/health/readiness.go (1)

GetReadinessState (20-28)

pkg/logger/logger.go (3)

Info (253-255)

WithError (284-289)

Error (265-267)

cmd/hyperfleet-api/environments/framework.go (1)

Environment (55-57)

test/helper.go (2)

cmd/hyperfleet-api/server/server.go (1)

Server (13-18)

cmd/hyperfleet-api/server/health_server.go (1)

NewHealthServer (17-36)

pkg/config/config.go (1)

pkg/config/health.go (2)

HealthConfig (9-13)

NewHealthConfig (15-21)

pkg/health/handler.go (2)

pkg/db/session.go (1)

SessionFactory (12-20)

pkg/health/readiness.go (1)

GetReadinessState (20-28)

🔇 Additional comments (25)

.claude/settings.local.json (1)

4-8: LGTM!

The added Bash permissions are reasonable for a local development workflow. They allow Claude to assist with common operations like linting (make lint), file listing (ls), and git staging/stashing—all consistent with the PR's development and testing requirements.

pkg/config/metrics.go (1)

15-21: Port consolidation correctly implemented and documented.

The metrics bind address change to port 9090 is properly implemented and consistently reflected across Helm charts, configuration files, and documentation. Health endpoints are correctly consolidated on port 8080. This is a breaking change for existing deployments, but the necessary updates to Kubernetes manifests, charts (via metricsBindAddress and healthBindAddress values), and documentation are already in place.

pkg/health/readiness.go (1)

1-59: LGTM! Well-implemented thread-safe readiness state.

The singleton pattern using sync.Once is idiomatic, and the RWMutex correctly guards concurrent access. The semantic choice to clear shuttingDown in SetReady() allows for recovery scenarios if needed.

pkg/config/health.go (2)

15-21: Verify that localhost binding is intentional for containerized deployments.

The default localhost:8080 restricts connections to the local interface only. While Kubernetes probes run within the same pod and can reach localhost, this may cause issues if external health checks or service mesh sidecars need access. Consider whether 0.0.0.0:8080 or :8080 is more appropriate for your deployment scenario.

23-27: LGTM! Flags are well-defined.

The flag names are clear and consistent with the health server configuration pattern.

pkg/health/handler.go (2)

22-28: LGTM! Simple and correct liveness implementation.

The liveness check correctly returns 200 OK unconditionally, which is appropriate—liveness should only fail if the process is deadlocked or crashed.

33-79: LGTM! Readiness check ordering is correct.

The check sequence (shutdown state → readiness flag → database connectivity) is well-designed. Checking shutdown state first ensures Kubernetes stops routing traffic immediately when shutdown begins, before waiting for DB checks.

The database ping on each readiness probe is appropriate for Kubernetes probe intervals (typically 10s+), ensuring the service only receives traffic when it can actually serve requests. Based on learnings, this aligns with the requirement that health checks verify database accessibility.

cmd/hyperfleet-api/server/health_server.go (1)

76-80: Good fix: Stop() now respects the shutdown timeout.

This properly addresses graceful shutdown by using the configurable shutdownTimeout, preventing indefinite hangs during connection draining.

pkg/config/config.go (2)

15-22: LGTM! Clean integration of HealthConfig.

The replacement of HealthCheck with Health is consistently applied across the struct definition and all wiring points.

24-32: LGTM!

Constructor properly initializes the new Health field.

docs/hyperfleet-api.http (2)

7-8: LGTM!

Port variables updated correctly to reflect the new architecture: health on 8080, metrics on 9090.

196-205: LGTM!

Health probe examples correctly updated to use /healthz and /readyz endpoints on the health port, with metrics on the dedicated metrics port.

docs/deployment.md (3)

81-82: LGTM!

Environment variable documentation correctly reflects the new port defaults.

275-278: LGTM!

Health checks documentation clearly describes the liveness, readiness, and metrics endpoints with their respective ports and behaviors.

297-297: LGTM!

Prometheus metrics URL correctly updated to port 9090.

charts/templates/deployment.yaml (2)

56-67: LGTM!

Server bind address flags and container ports correctly implement the three-port separation: API on 8000, health on 8080, metrics on 9090.

83-98: LGTM!

Probe configuration looks appropriate:

Liveness probe (/healthz): 15s initial delay with 20s period provides reasonable failure detection without excessive load.

Readiness probe (/readyz): 5s period enables quick traffic routing adjustments during startup and shutdown.

cmd/hyperfleet-api/servecmd/cmd.go (2)

96-102: LGTM!

Proper graceful shutdown sequence: marking the application as not ready before stopping servers ensures load balancers stop routing new traffic during the drain period.

116-117: LGTM!

Good addition of Teardown() to properly close database connections and other resources during graceful shutdown.
AGENTS.md (3)
347-350: LGTM! Flag documentation updated correctly.

The CLI flag documentation is updated consistently with the new naming convention (--health-server-bindaddress, --enable-health-https) and correctly reflects the new port mappings (health on 8080, metrics on 9090).

689-690: LGTM! Port documentation is accurate.

The port documentation correctly reflects the new architecture: health endpoints (/healthz, /readyz) on port 8080 and metrics (/metrics) on port 9090.

775-777: Inconsistent health endpoint documentation.

Line 775 still references GET /healthcheck which appears to be outdated given the new endpoints are /healthz (liveness) and /readyz (readiness) on port 8080. This should be updated for consistency with the rest of the documentation.
Suggested fix
-**Health Check**: `GET /healthcheck` returns 200 OK when database is accessible
+**Health Check**: Liveness at `GET /healthz` (port 8080) returns 200 OK; Readiness at `GET /readyz` returns 200 OK when database is accessible (503 during shutdown)
⛔ Skipped due to learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
test/helper.go (3)

63-63: LGTM! Field renamed consistently.

The field rename from HealthCheckServer to HealthServer aligns with the new API naming convention.

184-192: LGTM! Health server startup implementation is correct.

The method correctly uses server.NewHealthServer() matching the new API. The implementation follows the same pattern as startMetricsServer().

Minor note: The log messages say "health check server" (lines 188, 190) while the method/field names use "health server". This is a cosmetic inconsistency that could be cleaned up.

261-263: LGTM! URL helper updated correctly.

The HealthURL method correctly uses helper.AppConfig.Health.BindAddress to construct the URL, consistent with the new configuration structure.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

cmd/hyperfleet-api/servecmd/cmd.go

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

charts/templates/service.yaml (1)
14-25: Duplicate port name and stale port definition will cause Kubernetes Service validation failure.

There are two issues here:

Duplicate port names: Both ports 8083 (lines 14-17) and 8080 (lines 18-21) have name: health. Kubernetes requires unique port names within a Service.

Stale port: Port 8083 appears to be the legacy health check port that should have been removed. Based on the PR objectives, health endpoints are now on port 8080.

This will fail with a Kubernetes validation error similar to: spec.ports[2].name: Duplicate value: "health".
🐛 Proposed fix: Remove legacy port 8083
   - port: {{ .Values.service.port }}
     targetPort: http
     protocol: TCP
     name: http
-  - port: 8083
-    targetPort: health
-    protocol: TCP
-    name: health
   - port: 8080
     targetPort: health
     protocol: TCP
     name: health
   - port: 9090
     targetPort: metrics
     protocol: TCP
     name: metrics

♻️ Duplicate comments (1)

cmd/hyperfleet-api/server/health_server.go (1)

47-52: Empty Listen() and Serve() methods are inconsistent with interface contract.

These methods return no-op values while apiServer actually implements the Listen() → Serve() pattern. This is a Liskov Substitution Principle concern since healthServer cannot be used interchangeably with other Server implementations that expect these methods to work. Consider either implementing properly or defining a simpler interface for servers that only use Start()/Stop().

🧹 Nitpick comments (1)

cmd/hyperfleet-api/servecmd/cmd.go (1)
115-119: OTel shutdown still uses unbounded context.

Despite the past review indicating this was addressed, telemetry shutdown still uses context.Background() instead of a bounded context. If OTel exporters are unresponsive, this could hang indefinitely. Consider using a context with timeout for consistency with other shutdown operations.
♻️ Suggested fix
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), env().Config.Health.ShutdownTimeout)
+	defer cancel()
+
 	if tp != nil {
-		if err := telemetry.Shutdown(context.Background(), tp); err != nil {
+		if err := telemetry.Shutdown(shutdownCtx, tp); err != nil {
 			logger.WithError(ctx, err).Error("Failed to shutdown OpenTelemetry")
 		}
 	}

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea69bfb and 1a03475.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (22)

.claude/settings.local.json
AGENTS.md
README.md
charts/templates/deployment.yaml
charts/templates/service.yaml
charts/values.yaml
cmd/hyperfleet-api/servecmd/cmd.go
cmd/hyperfleet-api/server/health_server.go
cmd/hyperfleet-api/server/healthcheck_server.go
cmd/hyperfleet-api/server/metrics_server.go
cmd/hyperfleet-api/server/server.go
docs/deployment.md
docs/development.md
docs/hyperfleet-api.http
go.mod
pkg/config/config.go
pkg/config/health.go
pkg/config/health_check.go
pkg/config/metrics.go
pkg/health/handler.go
pkg/health/readiness.go
test/helper.go

💤 Files with no reviewable changes (2)

cmd/hyperfleet-api/server/healthcheck_server.go
pkg/config/health_check.go

🚧 Files skipped from review as they are similar to previous changes (10)

docs/hyperfleet-api.http
cmd/hyperfleet-api/server/metrics_server.go
pkg/config/config.go
go.mod
pkg/health/readiness.go
test/helper.go
pkg/health/handler.go
pkg/config/metrics.go
.claude/settings.local.json
README.md

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID

Files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/health.go
cmd/hyperfleet-api/server/health_server.go

🧠 Learnings (13)

📓 Common learnings

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/servecmd/cmd.go
charts/values.yaml
pkg/config/health.go
cmd/hyperfleet-api/server/health_server.go
docs/development.md
docs/deployment.md
charts/templates/deployment.yaml
AGENTS.md
charts/templates/service.yaml

📚 Learning: 2026-01-13T04:16:08.743Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.

Applied to files:

cmd/hyperfleet-api/server/server.go
cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/health.go
cmd/hyperfleet-api/server/health_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go
pkg/config/health.go
cmd/hyperfleet-api/server/health_server.go
docs/development.md
docs/deployment.md
charts/templates/deployment.yaml
AGENTS.md

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T09:40:10.138Z

Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment

Applied to files:

cmd/hyperfleet-api/servecmd/cmd.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes

Applied to files:

cmd/hyperfleet-api/server/health_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)

Applied to files:

cmd/hyperfleet-api/server/health_server.go

📚 Learning: 2026-01-08T01:23:20.647Z

Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration

Applied to files:

docs/deployment.md
AGENTS.md

🧬 Code graph analysis (1)

cmd/hyperfleet-api/server/health_server.go (4)

cmd/hyperfleet-api/server/server.go (1)

Server (13-18)

pkg/health/handler.go (2)

NewHandler (16-20)

Handler (11-13)

cmd/hyperfleet-api/environments/types.go (1)

Database (36-38)

pkg/logger/fields.go (1)

FieldBindAddress (13-13)

🔇 Additional comments (17)

docs/development.md (1)

98-100: LGTM!

The documentation correctly reflects the new health and metrics endpoint configuration, which aligns with the Kubernetes probe standards (/healthz for liveness, /readyz for readiness) and the updated port mappings in the Helm values.

cmd/hyperfleet-api/server/server.go (1)

20-24: LGTM!

Clean interface design following Go's small-interface idiom. The optional ListenNotifier pattern allows servers to signal readiness without forcing all Server implementations to implement it, supporting the graceful startup/shutdown mechanism.

charts/values.yaml (1)

20-21: LGTM!

The port remapping correctly consolidates health endpoints on port 8080 and moves metrics to port 9090, aligning with the PR objectives and documentation updates.

docs/deployment.md (3)

81-82: LGTM!

Environment variable documentation correctly reflects the updated port defaults.

275-278: LGTM!

Clear documentation of the health check behavior, properly distinguishing between liveness (/healthz - process alive) and readiness (/readyz - ready for traffic, 503 during startup/shutdown) probes.

297-297: LGTM!

Prometheus metrics URL correctly updated to reflect the new port 9090.

charts/templates/deployment.yaml (2)

56-66: Port configuration aligns with architecture standards.

The three-port layout (API 8000, health 8080, metrics 9090) is correctly implemented. The container ports and bind addresses are consistent.

83-98: Probe configuration looks appropriate.

The liveness probe (/healthz) and readiness probe (/readyz) paths are correctly updated. The timing changes (liveness: 15s initial/20s period, readiness: 5s initial/5s period with 3s timeout) provide reasonable startup tolerance while enabling quick failure detection.

cmd/hyperfleet-api/servecmd/cmd.go (3)

83-93: Readiness race condition properly addressed.

The implementation correctly waits for NotifyListening() before calling SetReady(), ensuring the health server is actually accepting connections before marking the application as ready. This addresses the previous race condition concern.

101-107: Shutdown sequence follows proper order.

The shutdown flow correctly marks the service as not ready first (returning 503 on /readyz), allowing load balancers to drain traffic before stopping servers.

121-122: Good addition of environment teardown.

Adding Environment().Teardown() ensures database connections and other resources are properly closed during shutdown.

pkg/config/health.go (1)

9-31: Health configuration looks good.

The struct, defaults, and flag bindings are well-structured. The default bind address (:8080) and shutdown timeout (20s) align with the PR objectives and architecture standards.

AGENTS.md (2)

347-350: Documentation correctly updated for new health/metrics configuration.

Flag names and default ports are accurately documented to reflect the implementation changes.

689-690: Port documentation is accurate.

Health endpoints on 8080 (/healthz, /readyz) and metrics on 9090 (/metrics) correctly documented.

cmd/hyperfleet-api/server/health_server.go (3)

17-37: Health server construction is well-structured.

The server correctly wires health endpoints (/healthz, /readyz), configures shutdown timeout from config, and initializes the listening channel for readiness signaling.

54-86: Start() implementation with listener notification is well-designed.

The pattern of creating the listener first, signaling via channel close, then serving ensures reliable readiness detection. Error handling for both TLS and non-TLS modes is appropriate.

93-97: Stop() correctly implements bounded shutdown.

Using context.WithTimeout with the configured shutdown timeout ensures graceful shutdown won't hang indefinitely.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

…utdown - Separate health endpoints (8080) from metrics (9090) - Implement /healthz (liveness) and /readyz (readiness) endpoints - Add graceful shutdown with SIGTERM/SIGINT signal handling - Add ReadinessState for coordinated shutdown across components - Update Helm charts with new port configuration - Align with HyperFleet architecture standards

rh-amarin · 2026-01-14T18:22:15Z

/lgtm

openshift-ci · 2026-01-14T18:22:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rh-amarin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [rh-amarin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot requested review from 86254860 and ciaranRoche January 13, 2026 19:41

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

cmd/hyperfleet-api/servecmd/cmd.go Outdated Show resolved Hide resolved

rafabene force-pushed the HYPERFLEET-453-454 branch from 5acbbe9 to f7a96ea Compare January 13, 2026 19:48

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

cmd/hyperfleet-api/servecmd/cmd.go Outdated Show resolved Hide resolved

rafabene force-pushed the HYPERFLEET-453-454 branch from f7a96ea to 28df995 Compare January 13, 2026 19:55

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

yasun1 reviewed Jan 14, 2026

View reviewed changes

charts/templates/deployment.yaml Outdated Show resolved Hide resolved

yasun1 reviewed Jan 14, 2026

View reviewed changes

pkg/health/handlers.go Outdated Show resolved Hide resolved

rafabene force-pushed the HYPERFLEET-453-454 branch 2 times, most recently from b421a83 to dd171ef Compare January 14, 2026 17:16

coderabbitai bot reviewed Jan 14, 2026

View reviewed changes

cmd/hyperfleet-api/server/health_server.go Outdated Show resolved Hide resolved

cmd/hyperfleet-api/server/health_server.go Show resolved Hide resolved

rafabene force-pushed the HYPERFLEET-453-454 branch from dd171ef to ea69bfb Compare January 14, 2026 17:19

coderabbitai bot reviewed Jan 14, 2026

View reviewed changes

cmd/hyperfleet-api/servecmd/cmd.go Show resolved Hide resolved

rafabene force-pushed the HYPERFLEET-453-454 branch from ea69bfb to 1a03475 Compare January 14, 2026 17:33

coderabbitai bot reviewed Jan 14, 2026

View reviewed changes

rafabene force-pushed the HYPERFLEET-453-454 branch from 1a03475 to bff6243 Compare January 14, 2026 17:50

openshift-ci bot assigned rh-amarin Jan 14, 2026

openshift-ci bot added the lgtm label Jan 14, 2026

openshift-ci bot added the approved label Jan 14, 2026

openshift-merge-bot bot merged commit 4f4c655 into openshift-hyperfleet:main Jan 14, 2026
8 checks passed

HYPERFLEET-453, HYPERFLEET-454 - Implement health endpoints and graceful shutdown #34

HYPERFLEET-453, HYPERFLEET-454 - Implement health endpoints and graceful shutdown #34

Uh oh!

Conversation

rafabene commented Jan 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

rh-amarin commented Jan 14, 2026

Uh oh!

openshift-ci bot commented Jan 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rafabene commented Jan 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 13, 2026 •

edited

Loading