-
Notifications
You must be signed in to change notification settings - Fork 8
HYPERFLEET-453, HYPERFLEET-454 - Implement health endpoints and graceful shutdown #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HYPERFLEET-453, HYPERFLEET-454 - Implement health endpoints and graceful shutdown #34
Conversation
WalkthroughThis PR introduces a new health server exposing liveness ( Sequence Diagram(s)sequenceDiagram
participant App as Application
participant HS as HealthServer
participant RS as ReadinessState
participant H as HealthHandler
participant DB as Database
App->>HS: Start()
HS->>HS: Start HTTP listener (:8080)
App->>RS: SetReady()
RS->>RS: ready=true, shuttingDown=false
Client->>HS: GET /healthz
HS->>H: LivenessHandler()
H-->>Client: 200 OK {"status":"ok"}
Client->>HS: GET /readyz
HS->>H: ReadinessHandler()
H->>RS: IsShuttingDown(), IsReady()
alt Ready and not shutting down
H->>DB: Ping() (if configured)
DB-->>H: OK
H-->>Client: 200 OK {"status":"ok"}
else Not ready or DB failure or shutting down
H-->>Client: 503 Service Unavailable {"reason": "..."}
end
sequenceDiagram
participant App as Application
participant RS as ReadinessState
participant HS as HealthServer
participant AS as APIServer
participant MS as MetricsServer
participant Env as Environment
App->>RS: SetShuttingDown()
RS->>RS: shuttingDown=true
App->>HS: Stop()
HS->>HS: Graceful shutdown (respect timeout)
App->>AS: Stop()
AS->>AS: Graceful shutdown
App->>MS: Stop()
MS->>MS: Graceful shutdown
App->>Env: Teardown()
Env-->>App: Resources closed
Note over App: Shutdown complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (22)
💤 Files with no reviewable changes (2)
🚧 Files skipped from review as they are similar to previous changes (8)
🧰 Additional context used📓 Path-based instructions (1)**/*.go📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (19)📓 Common learnings📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-13T04:16:08.743ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T09:40:10.138ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-13T04:19:50.012ZApplied to files:
📚 Learning: 2026-01-13T10:10:34.485ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
📚 Learning: 2026-01-08T01:23:20.647ZApplied to files:
🧬 Code graph analysis (1)pkg/health/handler.go (2)
🔇 Additional comments (29)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/helper.go (1)
109-116: Metrics server not stopped during teardown.
startMetricsServer()is called at line 116, butstopMetricsServeris not added to theteardownsslice. This means the metrics server will continue running after test teardown, potentially causing resource leaks or port conflicts in subsequent test runs.🐛 Proposed fix
helper.teardowns = []func() error{ helper.CleanDB, jwkMockTeardown, helper.stopAPIServer, + helper.stopMetricsServer, helper.teardownEnv, } helper.startAPIServer() helper.startMetricsServer()
🤖 Fix all issues with AI agents
In @cmd/hyperfleet-api/servecmd/cmd.go:
- Around line 116-121: The telemetry shutdown uses context.Background() which
ignores the overall shutdown timeout; change the call to use shutdownCtx (the
cancellation context that enforces the shutdown deadline) so
telemetry.Shutdown(shutdownCtx, tp) is invoked (keep the tp nil check) and
retain the existing logging (logger.WithError(ctx, err).Error(...)) so errors
are reported if the shutdown is cancelled or fails.
🧹 Nitpick comments (8)
pkg/config/shutdown.go (1)
33-41: Consider logging a warning for invalidSHUTDOWN_TIMEOUTvalues.When
time.ParseDurationfails, the invalid value is silently ignored. This could make debugging configuration issues harder if an operator sets an invalid value likeSHUTDOWN_TIMEOUT=20(missing unit suffix).♻️ Optional improvement to add a warning log
+import ( + "log/slog" + "os" + "time" + + "github.com/spf13/pflag" +) // BindEnv binds environment variables to config values // Environment variables take precedence over flags func (s *ShutdownConfig) BindEnv() { if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" { if d, err := time.ParseDuration(val); err == nil { s.Timeout = d + } else { + slog.Warn("invalid SHUTDOWN_TIMEOUT value, using default", + "value", val, + "default", s.Timeout, + "error", err) } } }pkg/config/shutdown_test.go (1)
10-20: Minor redundancy in default value assertions.Lines 13-15 and 17-19 both verify the same default timeout value. The second check (explicit
20*time.Second) is redundant sinceDefaultShutdownTimeoutis defined as20 * time.Second.♻️ Optional: Remove redundant assertion
func TestNewShutdownConfig_Defaults(t *testing.T) { cfg := NewShutdownConfig() if cfg.Timeout != DefaultShutdownTimeout { t.Errorf("expected default timeout %v, got %v", DefaultShutdownTimeout, cfg.Timeout) } - - if cfg.Timeout != 20*time.Second { - t.Errorf("expected timeout 20s, got %v", cfg.Timeout) - } }pkg/health/readiness_test.go (1)
93-100: Consider adding test isolation for the singleton test.This test accesses the global singleton but doesn't reset its state. While it only performs read operations, if the singleton is modified by previous tests (e.g., in parallel execution), it could still pass but leave the global state in an unexpected condition. Consider calling
Reset()at the end for consistency with other tests.♻️ Suggested improvement
func TestGetReadinessState_ReturnsSingleton(t *testing.T) { state1 := GetReadinessState() state2 := GetReadinessState() if state1 != state2 { t.Error("expected GetReadinessState() to return the same instance") } + + // Reset state for other tests + state1.Reset() }pkg/health/handlers_test.go (1)
134-168: Consider adding a test for database connection failure.The
TestReadinessHandler_IncludesCheckstest uses anilsessionFactory, which causes the handler to reportdatabase: "ok"via the fallback path. Consider adding a test with a mockSessionFactorythat returns an error fromCheckConnection()to verify thedatabase: "error"path and ensure 503 is returned when the database is unreachable.pkg/health/handlers.go (2)
10-11: Unused type definition.
CheckFuncis defined but not used anywhere in this file or the tests. If this is intended for future extensibility (e.g., registering custom health checks), consider adding a TODO comment or removing it until needed.
67-77: Consider reporting "skipped" instead of "ok" when sessionFactory is nil.When
sessionFactoryis nil, the handler reportsdatabase: "ok", which could mask a misconfiguration where the database dependency wasn't properly injected. Reporting"skipped"or"not_configured"would provide clearer diagnostics.♻️ Suggested improvement
// Check database connection if h.sessionFactory != nil { if err := h.sessionFactory.CheckConnection(); err != nil { checks["database"] = "error" allOK = false } else { checks["database"] = "ok" } } else { - checks["database"] = "ok" + checks["database"] = "skipped" }pkg/health/readiness.go (2)
8-12: Consider renamingshuttingtoshuttingDownfor clarity.The field name
shuttingis less descriptive than the methodIsShuttingDown()andSetShuttingDown(). Aligning the field name would improve code readability.Suggested improvement
type ReadinessState struct { mu sync.RWMutex ready bool - shutting bool + shuttingDown bool }Then update references in methods accordingly.
62-68: Consider adding documentation about production usage.The
Reset()method is correctly implemented for testing purposes. Consider strengthening the documentation to warn against calling this in production code, as resetting the shutdown state mid-flight could cause unexpected behavior.Suggested documentation improvement
-// Reset resets the readiness state (useful for testing) +// Reset resets the readiness state to initial values. +// WARNING: This method is intended for testing only. Do not call in production +// as it may cause inconsistent state during graceful shutdown. func (r *ReadinessState) Reset() {
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
AGENTS.mdREADME.mdcharts/templates/deployment.yamlcharts/templates/service.yamlcharts/values.yamlcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/healthcheck_server.gocmd/hyperfleet-api/server/logging/request_logging_middleware.gocmd/hyperfleet-api/server/metrics_server.godocs/deployment.mddocs/development.mddocs/hyperfleet-api.httppkg/config/config.gopkg/config/health_check.gopkg/config/shutdown.gopkg/config/shutdown_test.gopkg/health/handlers.gopkg/health/handlers_test.gopkg/health/readiness.gopkg/health/readiness_test.gotest/helper.go
💤 Files with no reviewable changes (5)
- charts/values.yaml
- pkg/config/health_check.go
- cmd/hyperfleet-api/server/logging/request_logging_middleware.go
- charts/templates/service.yaml
- cmd/hyperfleet-api/server/healthcheck_server.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
pkg/health/handlers_test.gopkg/config/shutdown.gopkg/health/readiness.gocmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.gopkg/health/readiness_test.gotest/helper.gopkg/health/handlers.gopkg/config/config.gopkg/config/shutdown_test.go
🧠 Learnings (17)
📓 Common learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
docs/development.mddocs/deployment.mdcmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.gocharts/templates/deployment.yamlREADME.mdtest/helper.goAGENTS.mddocs/hyperfleet-api.http
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
docs/development.mdpkg/health/handlers_test.godocs/deployment.mdpkg/health/readiness.gocmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.gocharts/templates/deployment.yamlpkg/health/readiness_test.goREADME.mdtest/helper.gopkg/health/handlers.goAGENTS.mddocs/hyperfleet-api.http
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes
Applied to files:
pkg/health/handlers_test.gocmd/hyperfleet-api/server/metrics_server.gopkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
Applied to files:
pkg/health/handlers_test.gocmd/hyperfleet-api/server/metrics_server.gopkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL
Applied to files:
pkg/health/handlers_test.gocmd/hyperfleet-api/server/metrics_server.gopkg/health/handlers.go
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
pkg/health/handlers_test.gopkg/config/shutdown.gopkg/health/readiness.gocmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.gopkg/health/readiness_test.gotest/helper.gopkg/health/handlers.gopkg/config/config.gopkg/config/shutdown_test.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise
Applied to files:
pkg/health/readiness.gopkg/health/readiness_test.go
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
cmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)
Applied to files:
cmd/hyperfleet-api/server/metrics_server.gopkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support
Applied to files:
cmd/hyperfleet-api/server/metrics_server.gopkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/environments/**/*.go : Environment-based configuration: use OCM_ENV variable to select development, unit_testing, integration_testing, or production configuration
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state
Applied to files:
pkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error
Applied to files:
pkg/health/handlers.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table
Applied to files:
pkg/health/handlers.go
🧬 Code graph analysis (6)
cmd/hyperfleet-api/server/metrics_server.go (3)
pkg/health/handlers.go (1)
NewHandler(19-23)cmd/hyperfleet-api/environments/types.go (1)
Database(36-38)pkg/db/session.go (1)
SessionFactory(12-20)
pkg/health/readiness_test.go (1)
pkg/health/readiness.go (2)
ReadinessState(8-12)GetReadinessState(21-23)
test/helper.go (5)
pkg/db/session.go (1)
SessionFactory(12-20)pkg/config/config.go (1)
ApplicationConfig(15-22)cmd/hyperfleet-api/environments/types.go (1)
ApplicationConfig(32-34)cmd/hyperfleet-api/server/server.go (1)
Server(13-18)test/factories/factory.go (1)
Factories(9-10)
pkg/health/handlers.go (2)
pkg/db/session.go (1)
SessionFactory(12-20)pkg/health/readiness.go (1)
GetReadinessState(21-23)
pkg/config/config.go (1)
pkg/config/shutdown.go (2)
ShutdownConfig(15-17)NewShutdownConfig(19-23)
pkg/config/shutdown_test.go (1)
pkg/config/shutdown.go (2)
NewShutdownConfig(19-23)DefaultShutdownTimeout(12-12)
🔇 Additional comments (37)
test/helper.go (2)
56-68: LGTM!The
Helperstruct field formatting is clean. The removal ofHealthCheckServerand related fields aligns with the PR objective to consolidate health endpoints onto the metrics server.
164-179: LGTM!The
startMetricsServerandstopMetricsServermethods follow the same pattern as the API server methods, which is consistent and maintainable.AGENTS.md (3)
347-352: LGTM!Documentation accurately reflects the consolidation of health endpoints onto the metrics server and the new graceful shutdown configuration. The
shutdown-timeoutflag with environment variable binding (SHUTDOWN_TIMEOUT) is properly documented.
690-690: LGTM!Port documentation correctly updated to reflect the consolidated health and metrics endpoints on port 8080. Based on learnings, this updates the previous port 8083 health check configuration.
775-778: LGTM!Health endpoints documentation correctly describes
/healthzfor liveness and/readyzfor readiness probes. This aligns with Kubernetes conventions and the PR objectives.pkg/config/shutdown.go (2)
10-23: LGTM!The default shutdown timeout of 20 seconds is a reasonable value that allows for connection draining while not delaying shutdown too long. The struct and constructor follow the established patterns in the codebase.
25-27: LGTM!Flag registration follows the same pattern as other config types in the codebase.
pkg/config/config.go (4)
15-22: LGTM!The
ApplicationConfigstruct correctly replacesHealthCheckwithShutdown, aligning with the PR's consolidation of health endpoints into the metrics server and addition of graceful shutdown configuration.
24-33: LGTM!Constructor properly initializes all config components including the new
ShutdownConfig.
35-43: LGTM!
AddFlagscorrectly includesShutdown.AddFlags, ensuring the--shutdown-timeoutflag is registered.
45-55: Intentional omission:Shutdown.ReadFilesnot called.Note that
Shutdown.ReadFiles()is not included in thereadFilesslice. This is acceptable sinceShutdownConfig.ReadFiles()currently returnsnil(no file-based configuration). If file-based shutdown configuration is added later, remember to wire it here.pkg/config/shutdown_test.go (3)
22-52: LGTM!Flag registration and parsing tests are thorough, verifying both the default value and custom value parsing via command-line arguments.
54-99: LGTM!Excellent table-driven test coverage for
BindEnv. The test cases cover valid durations (seconds and minutes), invalid duration strings, and empty values. Usingt.Setenvensures proper test isolation.
101-108: LGTM!Simple but necessary test ensuring
ReadFilesdoesn't return an error, which is important for interface compliance even though it's currently a no-op.docs/development.md (1)
98-100: Documentation correctly updated to reflect consolidated health endpoints.The URLs now accurately point to the new
/healthzand/readyzendpoints on port 8080, consistent with the code changes that consolidate health checks onto the metrics server.README.md (1)
83-85: README correctly documents the new health probe endpoints.The liveness and readiness probe URLs are accurate and consistent with the implementation in
metrics_server.goand other documentation files.charts/templates/deployment.yaml (1)
79-94: Kubernetes probes correctly configured for consolidated health endpoints.The liveness (
/healthz) and readiness (/readyz) probes are properly configured to use themetricsport. The timing parameters are reasonable:
- Liveness:
initialDelaySeconds=15,periodSeconds=20— avoids premature pod restarts during startup- Readiness:
initialDelaySeconds=5,periodSeconds=10— enables quick traffic routing once readyThis configuration aligns with the PR's goal of consolidating health and metrics endpoints on port 8080.
docs/deployment.md (2)
81-84: Environment variable documentation accurately reflects new configuration.The
METRICS_PORTdescription correctly notes it now serves both metrics and health endpoints. The newSHUTDOWN_TIMEOUTvariable is properly documented with its default value of20s.
277-279: Health check documentation clearly describes probe behavior.The descriptions accurately differentiate liveness (process alive check) from readiness (traffic acceptance check with 503 during startup/shutdown), consistent with the implementation.
cmd/hyperfleet-api/server/metrics_server.go (1)
25-28: Health endpoints correctly integrated into metrics server.Both
LivenessHandlerandReadinessHandlermethods exist on theHandlertype with proper signatures for use as HTTP handlers. The handler is properly instantiated with theSessionFactorydependency, and both/healthz(liveness) and/readyz(readiness) endpoints are correctly registered with GET method restrictions. TheReadinessHandlerincludes database connectivity checks viasessionFactory.CheckConnection(), ensuring the endpoint accurately reflects system readiness status.docs/hyperfleet-api.http (1)
195-200: LGTM!The health endpoint documentation correctly reflects the consolidation to the metrics port (8080) with standard Kubernetes probe naming conventions (
/healthzfor liveness,/readyzfor readiness). This aligns with the PR objective and replaces the previous/healthcheckendpoint. Based on learnings, the health check was previously on port 8083 and is now properly consolidated.pkg/health/readiness_test.go (3)
7-17: LGTM!Initial state test correctly validates the zero-value behavior of
ReadinessState.
19-57: LGTM!State transition tests (
SetReady,SetNotReady,SetShuttingDown) comprehensively verify the state machine behavior, including the important invariant thatIsReady()returns false when shutting down.
59-73: LGTM!
Reset()test properly verifies both flags are cleared.pkg/health/handlers_test.go (4)
10-33: LGTM!Liveness handler test correctly validates HTTP 200 response with JSON body containing
status: "ok".
35-66: LGTM!Readiness handler test properly validates the happy path with state reset and cleanup.
68-132: LGTM!Error path tests comprehensively cover both "not ready" and "shutting down" scenarios, including the shutdown message verification.
170-209: LGTM!Content-Type header tests for both handlers ensure proper JSON response formatting.
cmd/hyperfleet-api/servecmd/cmd.go (3)
47-54: LGTM!Shutdown configuration binding and logging are properly implemented, providing visibility into the configured timeout.
83-90: LGTM!The readiness signaling and signal handling are well-implemented. Using
signal.NotifyContextis the idiomatic Go approach for graceful shutdown, and marking the application ready after servers start ensures probes reflect actual availability.
128-135: LGTM!The shutdown completion/timeout handling with select is correct. The forced exit on timeout ensures the process doesn't hang indefinitely.
pkg/health/handlers.go (2)
32-43: LGTM!Liveness handler correctly implements a minimal process-alive check that always returns 200 OK. This is appropriate for Kubernetes liveness probes, which should only fail if the process is unhealthy and needs to be restarted.
79-103: LGTM!The readiness check logic correctly aggregates multiple health checks and returns appropriate status codes. The check ordering (shutdown → database → ready flag) ensures early exit for shutdown scenarios.
pkg/health/readiness.go (4)
14-23: LGTM!The singleton pattern is appropriate for application-wide readiness state. Package-level initialization is thread-safe in Go, and the
Reset()method addresses testability concerns.
25-37: LGTM!Thread-safe setters with proper mutex locking. The implementation is correct and straightforward.
39-46: LGTM!Atomically setting both
shuttingandreadyunder a single lock ensures consistent state during shutdown transitions. Well documented.
48-60: LGTM!Correct use of
RLockfor concurrent read access. The!r.shuttingcheck inIsReady()provides defensive protection against incorrect state transitions, ensuring readiness is never reported during shutdown even ifSetReady()were mistakenly called afterSetShuttingDown().
5acbbe9 to
f7a96ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @cmd/hyperfleet-api/servecmd/cmd.go:
- Around line 101-126: The Stop calls are using unbounded contexts — update the
shutdown flow so apiServer.Stop and metricsServer.Stop accept a context (or at
least use the existing shutdownCtx) and use it when calling
http.Server.Shutdown; specifically change the Stop signatures to Stop(ctx
context.Context) (or add an overload) and invoke apiServer.Stop(shutdownCtx) and
metricsServer.Stop(shutdownCtx) from this goroutine (or, if you cannot change
signatures, have each Stop create a derived context with the same timeout as
shutdownCtx and use that for http.Server.Shutdown) so that the per-server
shutdown honors the timeout the caller created (shutdownCtx) just like
telemetry.Shutdown(shutdownCtx, tp).
🧹 Nitpick comments (1)
pkg/config/shutdown.go (1)
33-41: Silent error on invalidSHUTDOWN_TIMEOUTmay confuse operators.When
time.ParseDurationfails, the error is silently ignored, and the timeout falls back to the default. This could mask misconfiguration (e.g.,SHUTDOWN_TIMEOUT=20without a unit).Consider logging a warning when parsing fails so operators are aware their configuration was not applied.
🔧 Suggested improvement
func (s *ShutdownConfig) BindEnv() { if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" { if d, err := time.ParseDuration(val); err == nil { s.Timeout = d + } else { + // Log warning - requires importing logger or returning error + // Consider: log.Printf("warning: invalid SHUTDOWN_TIMEOUT %q, using default: %v", val, err) } } }Alternatively, change the signature to
BindEnv() errorto propagate the error to the caller, consistent with how other config validation might be handled.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
AGENTS.mdcmd/hyperfleet-api/servecmd/cmd.godocs/deployment.mdpkg/config/config.gopkg/config/shutdown.gopkg/config/shutdown_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/config/shutdown_test.go
- AGENTS.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/config/shutdown.gopkg/config/config.go
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.godocs/deployment.md
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.godocs/deployment.md
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/environments/**/*.go : Environment-based configuration: use OCM_ENV variable to select development, unit_testing, integration_testing, or production configuration
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/config/shutdown.gopkg/config/config.go
🧬 Code graph analysis (2)
cmd/hyperfleet-api/servecmd/cmd.go (4)
cmd/hyperfleet-api/environments/framework.go (1)
Environment(55-57)pkg/telemetry/otel.go (1)
Shutdown(67-72)pkg/logger/logger.go (4)
With(276-281)Info(253-255)WithError(284-289)Error(265-267)pkg/health/readiness.go (1)
GetReadinessState(21-23)
pkg/config/config.go (1)
pkg/config/shutdown.go (2)
ShutdownConfig(15-17)NewShutdownConfig(19-23)
🔇 Additional comments (11)
pkg/config/shutdown.go (2)
1-13: LGTM!Package setup and constant definition are clean and well-documented. The 20-second default is a reasonable timeout for graceful shutdown.
15-27: LGTM!The struct definition, constructor, and flag wiring follow the established patterns used by other config types in this package (e.g.,
ServerConfig,MetricsConfig).pkg/config/config.go (3)
15-22: LGTM!The
Shutdownfield is properly added toApplicationConfig, replacing the removedHealthCheckfield. The JSON tag follows the snake_case convention used by other fields.
24-43: LGTM!Constructor and flag wiring are correctly updated to include
ShutdownConfiginitialization and flag binding, consistent with other config subsystems.
45-64:Shutdown.ReadFilesintentionally omitted fromReadFiles.
ShutdownConfig.ReadFiles()currently returnsnilwith no file-based configuration. Omitting it from this list is reasonable, but if file-based shutdown config is added later, remember to include it here.cmd/hyperfleet-api/servecmd/cmd.go (4)
16-16: LGTM!Health package import added for readiness state management.
47-54: LGTM!Environment binding and structured logging of the shutdown timeout follow the codebase conventions. Based on learnings, this correctly uses
logger.With(ctx, ...)for structured logging.
83-90: LGTM!The readiness state is correctly set after servers start, and
signal.NotifyContextis the idiomatic Go pattern for signal handling with context cancellation.
128-135: LGTM!The select pattern correctly handles both graceful completion and timeout scenarios. Using
os.Exit(1)on timeout ensures the process terminates even if shutdown hangs.docs/deployment.md (2)
81-84: LGTM!Documentation correctly reflects the updated port configuration and new
SHUTDOWN_TIMEOUTenvironment variable with its default value.
277-279: LGTM!Health check documentation accurately describes the consolidated endpoints on port 8080, matching the implementation. The distinction between liveness (
/healthz) and readiness (/readyz) behavior is clearly explained. Based on learnings, this updates the previous documentation that referenced/healthcheckon a separate port.
f7a96ea to
28df995
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
test/helper.go (1)
109-116: Metrics server not included in teardowns.The metrics server is started at line 116 but
stopMetricsServeris not added to theteardownsslice. This leaves the metrics server goroutine running and port 8080 bound after tests complete.🔧 Suggested fix
helper.teardowns = []func() error{ helper.CleanDB, jwkMockTeardown, helper.stopAPIServer, + helper.stopMetricsServer, helper.teardownEnv, }pkg/config/config.go (1)
45-64: AddShutdown.ReadFiles()to theApplicationConfig.ReadFiles()method for consistency.
Shutdown.ReadFiles()is missing from the configuration loading inReadFiles(), even though all other config types (Server, Database, OCM, Metrics, Logging) are included. For consistency, it should be added to the list, similar to Server, Metrics, and Logging which also returnnil.cmd/hyperfleet-api/server/api_server.go (1)
140-155: Remove database close from Start() method; it conflicts with graceful shutdown orchestration.The
Start()method closes the database connection afterServe()returns (lines 152-154). During graceful shutdown,apiServer.Stop(ctx)is called fromcmd.go:109, which triggers HTTP server shutdown and causesServe()to return. This immediately closes the database. Subsequently,cmd.go:124callsenvironments.Environment().Teardown(), which closes the database again. This results in a double-close of the SessionFactory during every graceful shutdown, violating the code's own comment that this must be called "exactly once during an app's lifetime."Move database cleanup responsibility entirely to the shutdown orchestrator in
cmd.go(already callingTeardown()at line 124) and remove the redundant close from theStart()method.
🧹 Nitpick comments (1)
pkg/config/shutdown.go (1)
33-41: Silent failure on invalidSHUTDOWN_TIMEOUTvalue.If the environment variable contains an invalid duration string, the parse error is silently ignored and the default/flag value is retained. This could lead to unexpected behavior if an operator misconfigures the timeout (e.g.,
SHUTDOWN_TIMEOUT=20without a unit).Consider logging a warning when parsing fails so operators are aware of the misconfiguration.
♻️ Suggested improvement
+import ( + "context" + "os" + "time" + + "github.com/openshift-hyperfleet/hyperfleet-api/pkg/logger" + "github.com/spf13/pflag" +) func (s *ShutdownConfig) BindEnv() { if val := os.Getenv("SHUTDOWN_TIMEOUT"); val != "" { if d, err := time.ParseDuration(val); err == nil { s.Timeout = d + } else { + logger.Warn(context.Background(), "Invalid SHUTDOWN_TIMEOUT value, using default", "value", val, "error", err) } } }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
AGENTS.mdcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/api_server.gocmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/server/server.godocs/deployment.mdpkg/config/config.gopkg/config/shutdown.gopkg/config/shutdown_test.gotest/helper.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/config/shutdown_test.go
- cmd/hyperfleet-api/servecmd/cmd.go
- docs/deployment.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/server/api_server.gotest/helper.gopkg/config/shutdown.gocmd/hyperfleet-api/server/metrics_server.gopkg/config/config.go
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/server/api_server.gotest/helper.gocmd/hyperfleet-api/server/metrics_server.goAGENTS.md
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/server/api_server.gotest/helper.gopkg/config/shutdown.gocmd/hyperfleet-api/server/metrics_server.gopkg/config/config.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
test/helper.gocmd/hyperfleet-api/server/metrics_server.goAGENTS.md
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support
Applied to files:
cmd/hyperfleet-api/server/metrics_server.go
🧬 Code graph analysis (4)
cmd/hyperfleet-api/server/api_server.go (1)
pkg/telemetry/otel.go (1)
Shutdown(67-72)
test/helper.go (5)
pkg/db/session.go (1)
SessionFactory(12-20)pkg/config/config.go (1)
ApplicationConfig(15-22)cmd/hyperfleet-api/environments/types.go (1)
ApplicationConfig(32-34)cmd/hyperfleet-api/server/server.go (1)
Server(13-18)test/factories/factory.go (1)
Factories(9-10)
cmd/hyperfleet-api/server/metrics_server.go (3)
pkg/health/handlers.go (1)
NewHandler(19-23)cmd/hyperfleet-api/environments/types.go (1)
Database(36-38)pkg/db/session.go (1)
SessionFactory(12-20)
pkg/config/config.go (1)
pkg/config/shutdown.go (2)
ShutdownConfig(15-17)NewShutdownConfig(19-23)
🔇 Additional comments (15)
test/helper.go (2)
56-68: LGTM!The
Helperstruct cleanly removes theHealthCheckServerfield, aligning with the PR's consolidation of health endpoints onto the metrics server.
157-179: LGTM!The
Stopcalls now correctly passcontext.Background()to match the updatedServerinterface signature. Usingcontext.Background()is appropriate for test teardown where there's no parent context or timeout requirement.AGENTS.md (3)
347-352: LGTM!Documentation accurately reflects the consolidation of health endpoints onto the metrics server and the new graceful shutdown configuration. The
--shutdown-timeoutflag withSHUTDOWN_TIMEOUTenvironment variable is clearly documented.
690-691: LGTM!Server configuration section correctly updated to show all three endpoints (
/metrics,/healthz,/readyz) served on port 8080.
775-778: LGTM!Health endpoints documentation is clear and accurate:
/healthzfor liveness (process alive check)/readyzfor readiness (database connection check)This aligns with Kubernetes probe conventions and the implementation in
pkg/health/handlers.go.pkg/config/shutdown.go (2)
10-13: LGTM!The 20-second default shutdown timeout is a reasonable choice that provides adequate time for connection draining while not extending shutdown unnecessarily.
15-23: LGTM!Clean struct definition and constructor following the established config pattern in the codebase.
pkg/config/config.go (3)
15-22: LGTM!The
Shutdownfield is correctly added toApplicationConfig, replacing the removedHealthCheckconfiguration. The struct maintains consistency with other config fields.
24-33: LGTM!Constructor properly initializes
ShutdownviaNewShutdownConfig(), ensuring the default timeout is set.
35-43: LGTM!The
--shutdown-timeoutflag is correctly wired throughc.Shutdown.AddFlags(flagset).cmd/hyperfleet-api/server/metrics_server.go (2)
25-29: LGTM!Health endpoints are correctly integrated:
/healthzfor liveness probe (process alive)/readyzfor readiness probe (database connectivity)- Handler properly initialized with
SessionFactoryfor database health checks- HTTP methods correctly restricted to GET
This consolidates health checks onto the metrics server as intended. Based on learnings, this replaces the previous
/healthcheckendpoint on port 8083.
77-79: LGTM!The
Stopmethod correctly accepts a context and passes it tohttp.Server.Shutdown(ctx), enabling graceful shutdown with timeout control from the caller.cmd/hyperfleet-api/server/server.go (2)
13-18: LGTM! Context-aware Stop signature enables proper graceful shutdown.Adding
context.Contextto theStopmethod allows callers to control shutdown timeouts and cancellation, which is the correct pattern for graceful shutdown in Go.
27-34: Correct handling of server closed error.The
checkfunction appropriately filters outhttp.ErrServerClosed, which is expected during graceful shutdown and should not cause an exit.cmd/hyperfleet-api/server/api_server.go (1)
157-158: LGTM! Context is correctly propagated to HTTP server shutdown.The implementation properly passes the context to
httpServer.Shutdown(ctx), allowing the graceful shutdown timeout configured viaSHUTDOWN_TIMEOUTto be respected. This is consistent with similar patterns inpkg/telemetry/otel.go:Shutdown.
b421a83 to
dd171ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/hyperfleet-api/servecmd/cmd.go (1)
110-114: Use a timeout context for OTel shutdown instead ofcontext.Background().The current implementation at line 111 passes an unbounded context to
telemetry.Shutdown(), which could hang indefinitely if OTel exporters are slow or unresponsive. Create a shutdown context with a reasonable timeout (e.g., 5–30 seconds) to prevent blocking graceful shutdown.if tp != nil { shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) defer cancel() if err := telemetry.Shutdown(shutdownCtx, tp); err != nil { logger.WithError(ctx, err).Error("Failed to shutdown OpenTelemetry") } }
🤖 Fix all issues with AI agents
In `@cmd/hyperfleet-api/server/health_server.go`:
- Around line 72-74: Stop() currently uses context.Background() and thus has no
deadline; change it to use a context with a timeout (configurable via
SHUTDOWN_TIMEOUT or a healthServer field) so Shutdown cannot hang indefinitely.
Specifically, add a shutdown timeout value (e.g., parse SHUTDOWN_TIMEOUT into a
time.Duration at server construction or add a shutdownTimeout time.Duration
field on healthServer), then implement Stop() to create a context with
context.WithTimeout(context.Background(), s.shutdownTimeout), defer cancel(),
and pass that context to s.httpServer.Shutdown; ensure the parsed timeout has a
sensible default (20s) and propagate/return the Shutdown error as before.
🧹 Nitpick comments (4)
pkg/health/handler.go (2)
68-75: UsePingContextwith request context to prevent hanging readiness checks.
sqlDB.Ping()has no timeout and could hang indefinitely if the database is unresponsive, causing Kubernetes readiness probes to timeout unpredictably. UsePingContextwith the request's context to respect probe timeouts.♻️ Suggested fix
- if err := sqlDB.Ping(); err != nil { + if err := sqlDB.PingContext(r.Context()); err != nil {
30-79: Consider logging readiness failures for operational visibility.Per coding guidelines, structured logging via
logger.With(ctx, ...)should be used. Logging readiness failures (shutdown, not ready, DB unavailable) would help with debugging during deployment issues.pkg/config/metrics.go (1)
24-24: Minor: Typo in flag description."adddress" has an extra 'd'. This is pre-existing but worth fixing while you're in this file.
✏️ Typo fix
- fs.StringVar(&s.BindAddress, "metrics-server-bindaddress", s.BindAddress, "Metrics server bind adddress") + fs.StringVar(&s.BindAddress, "metrics-server-bindaddress", s.BindAddress, "Metrics server bind address")cmd/hyperfleet-api/servecmd/cmd.go (1)
83-88: Potential race condition:SetReady()called before health server is listening.The
healthServer.Start()is called in a goroutine, andSetReady()is invoked immediately after. There's a window where the readiness state is set to ready before the health server's HTTP listener is fully bound and accepting connections. This could cause the readiness probe to fail initially if Kubernetes checks/readyzbefore the server is actually listening.Consider having the health server signal when it's ready to accept connections, or add a brief synchronization mechanism.
♻️ One approach: Use a channel to signal server readiness
// In health_server.go, add a Ready channel type healthServer struct { httpServer *http.Server Ready chan struct{} } func (s *healthServer) Start() { close(s.Ready) // Signal ready after listener is bound // ... existing start logic }healthServer := server.NewHealthServer() go healthServer.Start() + +// Wait for health server to be ready (with timeout) +select { +case <-healthServer.Ready: +case <-time.After(5 * time.Second): + logger.Error(ctx, "Health server failed to start in time") +} // Mark application as ready to receive traffic health.GetReadinessState().SetReady()
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (21)
.claude/settings.local.jsonAGENTS.mdREADME.mdcharts/templates/deployment.yamlcharts/templates/service.yamlcharts/values.yamlcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.gocmd/hyperfleet-api/server/healthcheck_server.gocmd/hyperfleet-api/server/metrics_server.godocs/deployment.mddocs/development.mddocs/hyperfleet-api.httpgo.modpkg/config/config.gopkg/config/health.gopkg/config/health_check.gopkg/config/metrics.gopkg/health/handler.gopkg/health/readiness.gotest/helper.go
💤 Files with no reviewable changes (2)
- pkg/config/health_check.go
- cmd/hyperfleet-api/server/healthcheck_server.go
✅ Files skipped from review due to trivial changes (1)
- .claude/settings.local.json
🚧 Files skipped from review as they are similar to previous changes (7)
- cmd/hyperfleet-api/server/metrics_server.go
- test/helper.go
- pkg/health/readiness.go
- docs/development.md
- docs/deployment.md
- charts/values.yaml
- charts/templates/service.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
pkg/config/health.gopkg/health/handler.gopkg/config/config.gopkg/config/metrics.gocmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.go
🧠 Learnings (20)
📓 Common learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
README.mdpkg/config/health.gopkg/config/config.gopkg/config/metrics.goAGENTS.mdcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.godocs/hyperfleet-api.httpcharts/templates/deployment.yaml
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
README.mdpkg/config/health.gopkg/health/handler.gopkg/config/config.goAGENTS.mdcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.godocs/hyperfleet-api.httpcharts/templates/deployment.yaml
📚 Learning: 2026-01-13T04:19:50.012Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.
Applied to files:
go.moddocs/hyperfleet-api.http
📚 Learning: 2026-01-13T10:10:34.485Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/telemetry/otel.go:10-10
Timestamp: 2026-01-13T10:10:34.485Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository (PR `#31`), the team has decided to keep the OpenTelemetry semantic conventions package at v1.37.0 (go.opentelemetry.io/otel/semconv/v1.37.0) while using the SDK at v1.38.0 in pkg/telemetry/otel.go.
Applied to files:
go.moddocs/hyperfleet-api.http
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
go.modcmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to **/*.go : Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Applied to files:
go.mod
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/db/**/*.go : Use PostgreSQL 14.2 with GORM ORM for database operations
Applied to files:
go.mod
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
pkg/config/health.gopkg/health/handler.gopkg/config/config.gopkg/config/metrics.gocmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state
Applied to files:
pkg/health/handler.gocmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
Applied to files:
pkg/health/handler.gopkg/config/metrics.goAGENTS.md
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
🧬 Code graph analysis (3)
pkg/health/handler.go (2)
pkg/db/session.go (1)
SessionFactory(12-20)pkg/health/readiness.go (1)
GetReadinessState(20-28)
pkg/config/config.go (1)
pkg/config/health.go (2)
HealthConfig(7-10)NewHealthConfig(12-17)
cmd/hyperfleet-api/servecmd/cmd.go (4)
cmd/hyperfleet-api/server/health_server.go (1)
NewHealthServer(16-33)pkg/health/readiness.go (1)
GetReadinessState(20-28)pkg/logger/logger.go (3)
Info(253-255)WithError(284-289)Error(265-267)cmd/hyperfleet-api/environments/framework.go (1)
Environment(55-57)
🔇 Additional comments (20)
go.mod (1)
128-128: LGTM - Indirect dependencies added.The new indirect dependencies (
go.yaml.in/yaml/v2andsigs.k8s.io/yaml) are likely pulled in transitively by other packages in the dependency tree. No direct action required.Also applies to: 140-140
README.md (1)
83-85: Documentation updated to reflect new endpoint architecture.The endpoint changes align with the PR objectives:
- Health endpoints (
/healthz,/readyz) consolidated on port 8080- Metrics moved to port 9090
This is a breaking change from the previous architecture (health on 8083, metrics on 8080). Based on learnings, ensure Kubernetes deployment manifests and any external monitoring configurations are updated accordingly.
pkg/health/handler.go (2)
1-20: LGTM - Handler structure with dependency injection.Good use of dependency injection for the
sessionFactory. Making it optional (nil-safe inReadinessHandler) allows flexibility for different deployment scenarios.
22-28: LGTM - Simple liveness check.Liveness probe correctly returns 200 OK unconditionally. This is appropriate since liveness should only indicate the process is running, not dependent on external resources.
cmd/hyperfleet-api/server/health_server.go (2)
16-33: LGTM - Health server setup with proper endpoint registration.Good setup with:
- NotFoundHandler for unmatched routes
- Health handler integration with session factory for DB checks
- Proper configuration binding via
env().Config.Health.BindAddress
48-70: LGTM - Start method with TLS support and proper error handling.Good implementation:
- TLS configuration check before starting HTTPS
- Proper handling of
http.ErrServerClosedvs other errors- Structured logging with
logger.With()per coding guidelinespkg/config/metrics.go (1)
15-21: LGTM - Metrics port updated to 9090.The default port change from 8080 to 9090 correctly aligns with the PR's architectural change: health endpoints now occupy port 8080, while metrics move to 9090.
docs/hyperfleet-api.http (1)
7-8: LGTM! Documentation correctly reflects the new port architecture.The port assignments (health on 8080, metrics on 9090) and endpoint paths (
/healthz,/readyz) align with the team's agreed standards and Kubernetes conventions.Also applies to: 196-201, 205-205
charts/templates/deployment.yaml (2)
56-57: LGTM! Port and flag configuration correctly updated.The health server bind address and metrics server bind address flags are properly configured with the new defaults (
:8080and:9090respectively), consistent with theHealthConfigandMetricsConfigdefaults.Also applies to: 62-66
83-98: LGTM! Probe configuration follows Kubernetes best practices.The probe timing configuration is well-suited for graceful shutdown:
- Liveness probe (
/healthz) has longer intervals (15s initial, 20s period) to avoid unnecessary restarts during normal operation.- Readiness probe (
/readyz) has shorter intervals (5s initial, 5s period) for faster traffic draining during shutdown.The readiness
failureThreshold: 3withperiodSeconds: 5means traffic will be drained within ~15 seconds of marking the service as shutting down, which aligns well with the defaultSHUTDOWN_TIMEOUTof 20s.AGENTS.md (2)
347-350: LGTM! CLI flag documentation updated correctly.The flag names and defaults match the implementation in
pkg/config/health.goandpkg/config/metrics.go.
689-690: LGTM! Server configuration documentation reflects the new architecture.The port assignments and endpoint paths are correctly documented, aligning with the deployment templates and configuration defaults.
Also applies to: 777-777
pkg/config/health.go (1)
7-26: LGTM! Clean configuration struct following established patterns.The
HealthConfigstruct follows the same conventions as other configuration types in the package, with proper JSON tags, a constructor with sensible defaults, flag binding viaAddFlags, and aReadFilesstub for consistency with the configuration interface.cmd/hyperfleet-api/servecmd/cmd.go (3)
17-17: LGTM! Health package import added for readiness state management.
96-102: LGTM! Shutdown sequence correctly marks service as not ready before stopping servers.The graceful shutdown flow properly:
- Marks the service as shutting down (returns 503 on
/readyz)- Stops the health server first to stop accepting new health checks
- Then stops API and metrics servers
This allows Kubernetes to drain traffic before the application fully shuts down.
116-117: LGTM! Environment teardown ensures proper resource cleanup.Calling
environments.Environment().Teardown()at the end of shutdown ensures database connections and other resources are properly released.pkg/config/config.go (4)
15-22: LGTM!The struct field rename from
HealthChecktoHealthis clean, and the JSON tag"health"follows proper conventions.
24-33: LGTM!Constructor correctly initializes
HealthwithNewHealthConfig(). Based on learnings, this changes the health endpoint from port 8083 to 8080, which aligns with the PR objective to consolidate health endpoints on the metrics server port.
35-43: LGTM!Flag registration updated consistently to use
c.Health.AddFlags(flagset).
45-65: LGTM!ReadFiles wiring updated consistently with the display name "Health" for error reporting.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
dd171ef to
ea69bfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
charts/templates/service.yaml (1)
14-25: Duplicate port name "health" will cause Kubernetes validation errors.The Service defines two ports both named "health" (lines 17 and 21). Kubernetes requires unique port names within a Service. The old port 8083 entry (lines 14-17) should be removed since health endpoints have moved to port 8080.
🐛 Proposed fix: Remove the deprecated 8083 port entry
- port: {{ .Values.service.port }} targetPort: http protocol: TCP name: http - - port: 8083 - targetPort: health - protocol: TCP - name: health - port: 8080 targetPort: health protocol: TCP name: health - port: 9090 targetPort: metrics protocol: TCP name: metrics
🤖 Fix all issues with AI agents
In `@cmd/hyperfleet-api/servecmd/cmd.go`:
- Around line 83-88: Readiness is being set immediately after spawning the
health server goroutine, creating a race where probes may hit before
ListenAndServe binds; modify server.NewHealthServer()/healthServer.Start() to
provide a readiness signal (e.g., Start returns a channel, error or exposes a
method like WaitUntilListening/NotifyListening) and only call
health.GetReadinessState().SetReady() after that signal confirms the server is
actually listening (reference healthServer.Start(), server.NewHealthServer(),
and ListenAndServe).
♻️ Duplicate comments (2)
cmd/hyperfleet-api/server/health_server.go (1)
45-50: EmptyListen()andServe()methods remain no-ops.These methods exist to satisfy the
Serverinterface but do nothing, whileStart()combines both responsibilities. This was flagged in a previous review as inconsistent with other server implementations likeapiServer. If the interface contract expects callers to useListen() → Serve()separately, this could cause subtle bugs. Consider documenting this design choice or creating a simpler interface for servers that only useStart()/Stop().cmd/hyperfleet-api/servecmd/cmd.go (1)
110-114: OTel shutdown still usescontext.Background()instead of a bounded context.The past review flagged this issue and it was marked as addressed in commit f7a96ea, but the code still shows
context.Background(). If the OTel exporters are slow or unresponsive, this could cause the shutdown to hang indefinitely. Consider using a context with timeout.#!/bin/bash # Verify the current state of telemetry.Shutdown usage rg -n "telemetry.Shutdown" --type go -C 3
🧹 Nitpick comments (2)
cmd/hyperfleet-api/server/health_server.go (1)
69-73: Inconsistent logging pattern at line 72.Lines 63 and 66 use
logger.With(ctx, ...).Info(...)for structured logging with context fields, but line 72 useslogger.Info(ctx, ...). For consistency and to follow the structured logging pattern, consider aligning this call.♻️ Suggested fix
if err != nil && err != http.ErrServerClosed { check(err, "Health server terminated with errors") } else { - logger.Info(ctx, "Health server terminated") + logger.With(ctx).Info("Health server terminated") }test/helper.go (1)
109-119: Health server not added to teardown list.The health server is started but
stopHealthServeris not added to the teardown list, which means it won't be stopped during test cleanup. The metrics server has the same pattern (started but not in teardowns), but for completeness, consider adding both to ensure proper cleanup.Suggested fix
First, add a
stopHealthServermethod:func (helper *Helper) stopHealthServer() error { if err := helper.HealthServer.Stop(); err != nil { return fmt.Errorf("unable to stop health server: %s", err.Error()) } return nil }Then update the teardowns list:
helper.teardowns = []func() error{ helper.CleanDB, jwkMockTeardown, helper.stopAPIServer, + helper.stopMetricsServer, + helper.stopHealthServer, helper.teardownEnv, }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (21)
.claude/settings.local.jsonAGENTS.mdREADME.mdcharts/templates/deployment.yamlcharts/templates/service.yamlcharts/values.yamlcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.gocmd/hyperfleet-api/server/healthcheck_server.gocmd/hyperfleet-api/server/metrics_server.godocs/deployment.mddocs/development.mddocs/hyperfleet-api.httpgo.modpkg/config/config.gopkg/config/health.gopkg/config/health_check.gopkg/config/metrics.gopkg/health/handler.gopkg/health/readiness.gotest/helper.go
💤 Files with no reviewable changes (2)
- cmd/hyperfleet-api/server/healthcheck_server.go
- pkg/config/health_check.go
🚧 Files skipped from review as they are similar to previous changes (5)
- charts/values.yaml
- go.mod
- docs/development.md
- cmd/hyperfleet-api/server/metrics_server.go
- README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/config/health.gopkg/health/readiness.gocmd/hyperfleet-api/server/health_server.gotest/helper.gopkg/config/config.gopkg/config/metrics.gopkg/health/handler.go
🧠 Learnings (19)
📓 Common learnings
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.godocs/deployment.mdpkg/config/health.goAGENTS.mdcmd/hyperfleet-api/server/health_server.gotest/helper.gopkg/config/config.gocharts/templates/deployment.yamlpkg/config/metrics.gocharts/templates/service.yamldocs/hyperfleet-api.http
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.godocs/deployment.mdpkg/config/health.goAGENTS.mdpkg/health/readiness.gocmd/hyperfleet-api/server/health_server.gotest/helper.gopkg/config/config.gocharts/templates/deployment.yamlpkg/health/handler.godocs/hyperfleet-api.http
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/health/handler.go
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/config/health.gopkg/health/readiness.gocmd/hyperfleet-api/server/health_server.gotest/helper.gopkg/config/config.gopkg/config/metrics.gopkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
Applied to files:
docs/deployment.mdAGENTS.mdpkg/config/metrics.gopkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise
Applied to files:
pkg/health/readiness.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes
Applied to files:
cmd/hyperfleet-api/server/health_server.gopkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)
Applied to files:
cmd/hyperfleet-api/server/health_server.gopkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Database transactions are automatically managed by TransactionMiddleware for all HTTP requests; store session in context and auto-commit on success, rollback on error
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept resource creation/update/delete requests via REST endpoints, validate, and persist to PostgreSQL
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Accept status updates from adapters via POST /{resourceType}/{id}/statuses and store in adapter_statuses table
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Serve resource data to Sentinel via GET /{resourceType} endpoints with pagination and filtering support
Applied to files:
pkg/health/handler.go
📚 Learning: 2026-01-13T10:10:34.485Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/telemetry/otel.go:10-10
Timestamp: 2026-01-13T10:10:34.485Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository (PR `#31`), the team has decided to keep the OpenTelemetry semantic conventions package at v1.37.0 (go.opentelemetry.io/otel/semconv/v1.37.0) while using the SDK at v1.38.0 in pkg/telemetry/otel.go.
Applied to files:
docs/hyperfleet-api.http
📚 Learning: 2026-01-13T04:19:50.012Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: go.mod:32-36
Timestamp: 2026-01-13T04:19:50.012Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, the team has decided to keep OpenTelemetry core packages (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, go.opentelemetry.io/otel/exporters/stdout/stdouttrace) at v1.38.0 rather than upgrading to v1.39.0 in PR `#31`.
Applied to files:
docs/hyperfleet-api.http
🧬 Code graph analysis (4)
cmd/hyperfleet-api/servecmd/cmd.go (4)
cmd/hyperfleet-api/server/health_server.go (1)
NewHealthServer(17-36)pkg/health/readiness.go (1)
GetReadinessState(20-28)pkg/logger/logger.go (3)
Info(253-255)WithError(284-289)Error(265-267)cmd/hyperfleet-api/environments/framework.go (1)
Environment(55-57)
test/helper.go (2)
cmd/hyperfleet-api/server/server.go (1)
Server(13-18)cmd/hyperfleet-api/server/health_server.go (1)
NewHealthServer(17-36)
pkg/config/config.go (1)
pkg/config/health.go (2)
HealthConfig(9-13)NewHealthConfig(15-21)
pkg/health/handler.go (2)
pkg/db/session.go (1)
SessionFactory(12-20)pkg/health/readiness.go (1)
GetReadinessState(20-28)
🔇 Additional comments (25)
.claude/settings.local.json (1)
4-8: LGTM!The added Bash permissions are reasonable for a local development workflow. They allow Claude to assist with common operations like linting (
make lint), file listing (ls), and git staging/stashing—all consistent with the PR's development and testing requirements.pkg/config/metrics.go (1)
15-21: Port consolidation correctly implemented and documented.The metrics bind address change to port 9090 is properly implemented and consistently reflected across Helm charts, configuration files, and documentation. Health endpoints are correctly consolidated on port 8080. This is a breaking change for existing deployments, but the necessary updates to Kubernetes manifests, charts (via
metricsBindAddressandhealthBindAddressvalues), and documentation are already in place.pkg/health/readiness.go (1)
1-59: LGTM! Well-implemented thread-safe readiness state.The singleton pattern using
sync.Onceis idiomatic, and theRWMutexcorrectly guards concurrent access. The semantic choice to clearshuttingDowninSetReady()allows for recovery scenarios if needed.pkg/config/health.go (2)
15-21: Verify thatlocalhostbinding is intentional for containerized deployments.The default
localhost:8080restricts connections to the local interface only. While Kubernetes probes run within the same pod and can reach localhost, this may cause issues if external health checks or service mesh sidecars need access. Consider whether0.0.0.0:8080or:8080is more appropriate for your deployment scenario.
23-27: LGTM! Flags are well-defined.The flag names are clear and consistent with the health server configuration pattern.
pkg/health/handler.go (2)
22-28: LGTM! Simple and correct liveness implementation.The liveness check correctly returns 200 OK unconditionally, which is appropriate—liveness should only fail if the process is deadlocked or crashed.
33-79: LGTM! Readiness check ordering is correct.The check sequence (shutdown state → readiness flag → database connectivity) is well-designed. Checking shutdown state first ensures Kubernetes stops routing traffic immediately when shutdown begins, before waiting for DB checks.
The database ping on each readiness probe is appropriate for Kubernetes probe intervals (typically 10s+), ensuring the service only receives traffic when it can actually serve requests. Based on learnings, this aligns with the requirement that health checks verify database accessibility.
cmd/hyperfleet-api/server/health_server.go (1)
76-80: Good fix:Stop()now respects the shutdown timeout.This properly addresses graceful shutdown by using the configurable
shutdownTimeout, preventing indefinite hangs during connection draining.pkg/config/config.go (2)
15-22: LGTM! Clean integration ofHealthConfig.The replacement of
HealthCheckwithHealthis consistently applied across the struct definition and all wiring points.
24-32: LGTM!Constructor properly initializes the new
Healthfield.docs/hyperfleet-api.http (2)
7-8: LGTM!Port variables updated correctly to reflect the new architecture: health on 8080, metrics on 9090.
196-205: LGTM!Health probe examples correctly updated to use
/healthzand/readyzendpoints on the health port, with metrics on the dedicated metrics port.docs/deployment.md (3)
81-82: LGTM!Environment variable documentation correctly reflects the new port defaults.
275-278: LGTM!Health checks documentation clearly describes the liveness, readiness, and metrics endpoints with their respective ports and behaviors.
297-297: LGTM!Prometheus metrics URL correctly updated to port 9090.
charts/templates/deployment.yaml (2)
56-67: LGTM!Server bind address flags and container ports correctly implement the three-port separation: API on 8000, health on 8080, metrics on 9090.
83-98: LGTM!Probe configuration looks appropriate:
- Liveness probe (
/healthz): 15s initial delay with 20s period provides reasonable failure detection without excessive load.- Readiness probe (
/readyz): 5s period enables quick traffic routing adjustments during startup and shutdown.cmd/hyperfleet-api/servecmd/cmd.go (2)
96-102: LGTM!Proper graceful shutdown sequence: marking the application as not ready before stopping servers ensures load balancers stop routing new traffic during the drain period.
116-117: LGTM!Good addition of
Teardown()to properly close database connections and other resources during graceful shutdown.AGENTS.md (3)
347-350: LGTM! Flag documentation updated correctly.The CLI flag documentation is updated consistently with the new naming convention (
--health-server-bindaddress,--enable-health-https) and correctly reflects the new port mappings (health on 8080, metrics on 9090).
689-690: LGTM! Port documentation is accurate.The port documentation correctly reflects the new architecture: health endpoints (
/healthz,/readyz) on port 8080 and metrics (/metrics) on port 9090.
775-777: Inconsistent health endpoint documentation.Line 775 still references
GET /healthcheckwhich appears to be outdated given the new endpoints are/healthz(liveness) and/readyz(readiness) on port 8080. This should be updated for consistency with the rest of the documentation.Suggested fix
-**Health Check**: `GET /healthcheck` returns 200 OK when database is accessible +**Health Check**: Liveness at `GET /healthz` (port 8080) returns 200 OK; Readiness at `GET /readyz` returns 200 OK when database is accessible (503 during shutdown)⛔ Skipped due to learnings
Learnt from: CR Repo: openshift-hyperfleet/hyperfleet-api PR: 0 File: AGENTS.md:0-0 Timestamp: 2026-01-08T01:23:20.647Z Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probesLearnt from: CR Repo: openshift-hyperfleet/hyperfleet-api PR: 0 File: AGENTS.md:0-0 Timestamp: 2026-01-08T01:23:20.647Z Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flagsLearnt from: CR Repo: openshift-hyperfleet/hyperfleet-api PR: 0 File: AGENTS.md:0-0 Timestamp: 2026-01-08T01:23:20.647Z Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-durationtest/helper.go (3)
63-63: LGTM! Field renamed consistently.The field rename from
HealthCheckServertoHealthServeraligns with the new API naming convention.
184-192: LGTM! Health server startup implementation is correct.The method correctly uses
server.NewHealthServer()matching the new API. The implementation follows the same pattern asstartMetricsServer().Minor note: The log messages say "health check server" (lines 188, 190) while the method/field names use "health server". This is a cosmetic inconsistency that could be cleaned up.
261-263: LGTM! URL helper updated correctly.The
HealthURLmethod correctly useshelper.AppConfig.Health.BindAddressto construct the URL, consistent with the new configuration structure.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
ea69bfb to
1a03475
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
charts/templates/service.yaml (1)
14-25: Duplicate port name and stale port definition will cause Kubernetes Service validation failure.There are two issues here:
Duplicate port names: Both ports 8083 (lines 14-17) and 8080 (lines 18-21) have
name: health. Kubernetes requires unique port names within a Service.Stale port: Port 8083 appears to be the legacy health check port that should have been removed. Based on the PR objectives, health endpoints are now on port 8080.
This will fail with a Kubernetes validation error similar to:
spec.ports[2].name: Duplicate value: "health".🐛 Proposed fix: Remove legacy port 8083
- port: {{ .Values.service.port }} targetPort: http protocol: TCP name: http - - port: 8083 - targetPort: health - protocol: TCP - name: health - port: 8080 targetPort: health protocol: TCP name: health - port: 9090 targetPort: metrics protocol: TCP name: metrics
♻️ Duplicate comments (1)
cmd/hyperfleet-api/server/health_server.go (1)
47-52: EmptyListen()andServe()methods are inconsistent with interface contract.These methods return no-op values while
apiServeractually implements theListen() → Serve()pattern. This is a Liskov Substitution Principle concern sincehealthServercannot be used interchangeably with otherServerimplementations that expect these methods to work. Consider either implementing properly or defining a simpler interface for servers that only useStart()/Stop().
🧹 Nitpick comments (1)
cmd/hyperfleet-api/servecmd/cmd.go (1)
115-119: OTel shutdown still uses unbounded context.Despite the past review indicating this was addressed, telemetry shutdown still uses
context.Background()instead of a bounded context. If OTel exporters are unresponsive, this could hang indefinitely. Consider using a context with timeout for consistency with other shutdown operations.♻️ Suggested fix
+ shutdownCtx, cancel := context.WithTimeout(context.Background(), env().Config.Health.ShutdownTimeout) + defer cancel() + if tp != nil { - if err := telemetry.Shutdown(context.Background(), tp); err != nil { + if err := telemetry.Shutdown(shutdownCtx, tp); err != nil { logger.WithError(ctx, err).Error("Failed to shutdown OpenTelemetry") } }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (22)
.claude/settings.local.jsonAGENTS.mdREADME.mdcharts/templates/deployment.yamlcharts/templates/service.yamlcharts/values.yamlcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/health_server.gocmd/hyperfleet-api/server/healthcheck_server.gocmd/hyperfleet-api/server/metrics_server.gocmd/hyperfleet-api/server/server.godocs/deployment.mddocs/development.mddocs/hyperfleet-api.httpgo.modpkg/config/config.gopkg/config/health.gopkg/config/health_check.gopkg/config/metrics.gopkg/health/handler.gopkg/health/readiness.gotest/helper.go
💤 Files with no reviewable changes (2)
- cmd/hyperfleet-api/server/healthcheck_server.go
- pkg/config/health_check.go
🚧 Files skipped from review as they are similar to previous changes (10)
- docs/hyperfleet-api.http
- cmd/hyperfleet-api/server/metrics_server.go
- pkg/config/config.go
- go.mod
- pkg/health/readiness.go
- test/helper.go
- pkg/health/handler.go
- pkg/config/metrics.go
- .claude/settings.local.json
- README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.24.9 with FIPS-compliant crypto enabled (CGO_ENABLED=1, GOEXPERIMENT=boringcrypto)
Use structured logging via logger.NewOCMLogger(ctx) with context-aware log fields including operation ID, account ID, and transaction ID
Files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/servecmd/cmd.gopkg/config/health.gocmd/hyperfleet-api/server/health_server.go
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/serve/**/*.go : API server runs on port 8000, metrics endpoint on 8080, health check on 8083; these are configurable via CLI flags
Applied to files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/servecmd/cmd.gocharts/values.yamlpkg/config/health.gocmd/hyperfleet-api/server/health_server.godocs/development.mddocs/deployment.mdcharts/templates/deployment.yamlAGENTS.mdcharts/templates/service.yaml
📚 Learning: 2026-01-13T04:16:08.743Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/middleware/masking.go:28-36
Timestamp: 2026-01-13T04:16:08.743Z
Learning: In the openshift-hyperfleet/hyperfleet-api repository, NewOCMLogger(ctx) is deprecated. Switch to the slog-based logger and use logger.With(ctx, ...) to produce structured, context-aware logs. Replace occurrences of NewOCMLogger(ctx) with logger.With(ctx, ...). Update imports to use the new logger and adjust any tests to expect structured logging behavior.
Applied to files:
cmd/hyperfleet-api/server/server.gocmd/hyperfleet-api/servecmd/cmd.gopkg/config/health.gocmd/hyperfleet-api/server/health_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Health check endpoint at GET /healthcheck returns 200 OK when database is accessible; used by Kubernetes liveness/readiness probes
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.gopkg/config/health.gocmd/hyperfleet-api/server/health_server.godocs/development.mddocs/deployment.mdcharts/templates/deployment.yamlAGENTS.md
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to cmd/hyperfleet/migrate/**/*.go : Use GORM AutoMigrate at startup via ./bin/hyperfleet-api migrate command; migrations are idempotent and track applied migrations
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/dao/**/*.go : All DAO methods must accept context.Context as the first parameter for transaction propagation
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,db}/**/*.go : Use context values for passing request-scoped data (session, operation ID, user info) rather than function parameters or global state
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/services/**/*.go : Status aggregation: Phase is 'Ready' if all adapters report Ready=True, 'Failed' if any adapter reports Ready=False, 'NotReady' otherwise
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T09:40:10.138Z
Learnt from: yasun1
Repo: openshift-hyperfleet/hyperfleet-api PR: 31
File: pkg/errors/errors.go:9-9
Timestamp: 2026-01-08T09:40:10.138Z
Learning: In openshift-hyperfleet/hyperfleet-api, pkg/api/openapi is a generated directory (excluded in .gitignore) created by `make generate` using openapi-generator-cli via Docker; linters that run before code generation will report false-positive import errors for this package.
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/api/**/*.go : Embed OpenAPI specification at compile time using Go 1.16+ //go:embed directive for self-contained binary deployment
Applied to files:
cmd/hyperfleet-api/servecmd/cmd.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : HTTP handlers must follow the pattern: parse request → call service/DAO → handle errors → send response with appropriate HTTP status codes
Applied to files:
cmd/hyperfleet-api/server/health_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/{handlers,services,dao}/**/*.go : Errors must use structured ServiceError type from pkg/errors/ with HttpCode, Code, and Reason fields; send via errors.SendError(w, r, err)
Applied to files:
cmd/hyperfleet-api/server/health_server.go
📚 Learning: 2026-01-08T01:23:20.647Z
Learnt from: CR
Repo: openshift-hyperfleet/hyperfleet-api PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-08T01:23:20.647Z
Learning: Applies to pkg/handlers/**/*.go : Prometheus metrics are exposed at GET /metrics endpoint for monitoring; configure metric collection via --label-metrics-inclusion-duration
Applied to files:
docs/deployment.mdAGENTS.md
🧬 Code graph analysis (1)
cmd/hyperfleet-api/server/health_server.go (4)
cmd/hyperfleet-api/server/server.go (1)
Server(13-18)pkg/health/handler.go (2)
NewHandler(16-20)Handler(11-13)cmd/hyperfleet-api/environments/types.go (1)
Database(36-38)pkg/logger/fields.go (1)
FieldBindAddress(13-13)
🔇 Additional comments (17)
docs/development.md (1)
98-100: LGTM!The documentation correctly reflects the new health and metrics endpoint configuration, which aligns with the Kubernetes probe standards (
/healthzfor liveness,/readyzfor readiness) and the updated port mappings in the Helm values.cmd/hyperfleet-api/server/server.go (1)
20-24: LGTM!Clean interface design following Go's small-interface idiom. The optional
ListenNotifierpattern allows servers to signal readiness without forcing allServerimplementations to implement it, supporting the graceful startup/shutdown mechanism.charts/values.yaml (1)
20-21: LGTM!The port remapping correctly consolidates health endpoints on port 8080 and moves metrics to port 9090, aligning with the PR objectives and documentation updates.
docs/deployment.md (3)
81-82: LGTM!Environment variable documentation correctly reflects the updated port defaults.
275-278: LGTM!Clear documentation of the health check behavior, properly distinguishing between liveness (
/healthz- process alive) and readiness (/readyz- ready for traffic, 503 during startup/shutdown) probes.
297-297: LGTM!Prometheus metrics URL correctly updated to reflect the new port 9090.
charts/templates/deployment.yaml (2)
56-66: Port configuration aligns with architecture standards.The three-port layout (API 8000, health 8080, metrics 9090) is correctly implemented. The container ports and bind addresses are consistent.
83-98: Probe configuration looks appropriate.The liveness probe (
/healthz) and readiness probe (/readyz) paths are correctly updated. The timing changes (liveness: 15s initial/20s period, readiness: 5s initial/5s period with 3s timeout) provide reasonable startup tolerance while enabling quick failure detection.cmd/hyperfleet-api/servecmd/cmd.go (3)
83-93: Readiness race condition properly addressed.The implementation correctly waits for
NotifyListening()before callingSetReady(), ensuring the health server is actually accepting connections before marking the application as ready. This addresses the previous race condition concern.
101-107: Shutdown sequence follows proper order.The shutdown flow correctly marks the service as not ready first (returning 503 on
/readyz), allowing load balancers to drain traffic before stopping servers.
121-122: Good addition of environment teardown.Adding
Environment().Teardown()ensures database connections and other resources are properly closed during shutdown.pkg/config/health.go (1)
9-31: Health configuration looks good.The struct, defaults, and flag bindings are well-structured. The default bind address (
:8080) and shutdown timeout (20s) align with the PR objectives and architecture standards.AGENTS.md (2)
347-350: Documentation correctly updated for new health/metrics configuration.Flag names and default ports are accurately documented to reflect the implementation changes.
689-690: Port documentation is accurate.Health endpoints on 8080 (
/healthz,/readyz) and metrics on 9090 (/metrics) correctly documented.cmd/hyperfleet-api/server/health_server.go (3)
17-37: Health server construction is well-structured.The server correctly wires health endpoints (
/healthz,/readyz), configures shutdown timeout from config, and initializes the listening channel for readiness signaling.
54-86:Start()implementation with listener notification is well-designed.The pattern of creating the listener first, signaling via channel close, then serving ensures reliable readiness detection. Error handling for both TLS and non-TLS modes is appropriate.
93-97:Stop()correctly implements bounded shutdown.Using
context.WithTimeoutwith the configured shutdown timeout ensures graceful shutdown won't hang indefinitely.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
…utdown - Separate health endpoints (8080) from metrics (9090) - Implement /healthz (liveness) and /readyz (readiness) endpoints - Add graceful shutdown with SIGTERM/SIGINT signal handling - Add ReadinessState for coordinated shutdown across components - Update Helm charts with new port configuration - Align with HyperFleet architecture standards
1a03475 to
bff6243
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rh-amarin The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4f4c655
into
openshift-hyperfleet:main
Summary
/healthz,/readyz) on metrics server port 8080SHUTDOWN_TIMEOUT(default 20s)signal.NotifyContextfor SIGTERM/SIGINT handlingTest plan
/healthzreturns 200 OK on port 8080/readyzreturns 200 OK when ready, 503 during shutdownSIGTERMsignalSHUTDOWN_TIMEOUTenvironment variable is respectedmake testandmake lintSummary by CodeRabbit
New Features
Bug Fixes
Configuration
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.