Skip to content

Introduce deadline for init containers#4960

Open
nikola-jokic wants to merge 2 commits into
masterfrom
nikola-jokic/deadline-for-init-containers
Open

Introduce deadline for init containers#4960
nikola-jokic wants to merge 2 commits into
masterfrom
nikola-jokic/deadline-for-init-containers

Conversation

@nikola-jokic

Copy link
Copy Markdown
Contributor

What type of PR is this?

Enhancement

What this PR does / why we need it

Init containers could take time that is counted in the startup time of main containers. This change includes init containers so that checks related to the main container are based on the time where the last transition time occurred, including the init container state change.

The deadline doesn't include side-car containers (init containers with restart policy always).

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a deadlineForInitContainers check to the pod startup flow so that stalled init containers can be detected and failed, independent of the main container startup deadline. The GetAction interface is refactored to internalize the LastStatusChange call (previously done by the caller), and LastStatusChange itself is extended to also scan InitContainerStatuses.

  • getInitContainerState walks Spec.InitContainers (skipping sidecar containers with RestartPolicy=Always) and tracks the earliest start time, currently-running container, and whether all non-sidecar init containers have successfully terminated.
  • LastStatusChange in pod_util.go now folds InitContainerStatuses into its scan, so the "time in state" calculation reflects the most recent init container transition alongside regular container transitions.

Confidence Score: 4/5

The runtime logic is sound, but two newly-added tests will fail as written.

The production path for init-container deadline enforcement is correct. However, two tests — Test_GetAction_InitContainerDeadlineUsesFirstInitStart and Test_GetAction_InitContainerDeadlineRequiresAllRegularInitContainersToComplete — will fail at assertion time: the expected message uses lowercase 'init' while the code emits uppercase 'Init', and both expected strings omit the running-container suffix that is appended whenever state.current is set. Both pods in those tests have a running init container, so the suffix is always appended. CI will catch this before the code ships, but the tests need to be updated before the PR can merge.

internal/executor/podchecks/pod_checks_test.go — lines 119 and 160 contain assertions that don't match the message the production code produces.

Important Files Changed

Filename Overview
internal/executor/podchecks/pod_checks.go Core logic for init container deadline enforcement; introduces getInitContainerState, earliestTime, latestTime helpers and restructures GetAction to own the LastStatusChange call.
internal/executor/podchecks/pod_checks_test.go Two new test assertions will fail: expected message is 'init containers…' (lowercase) but code produces 'Init containers…', and tests don't account for the appended running-container line when state.current != nil.
internal/executor/util/pod_util.go LastStatusChange now includes InitContainerStatuses in its scan; straightforward addition with no logic change.
internal/executor/service/pod_issue_handler.go LastStatusChange call removed from the handler; GetAction now called without the pre-computed time-in-state; cosmetic trailing-comma fix included.
internal/executor/configuration/podchecks/types.go Adds DeadlineForInitContainers field to the Checks config struct.
config/executor/config.yaml Adds deadlineForInitContainers: 5m to the default executor config.
internal/executor/application.go Passes clock.RealClock{} to the updated NewPodChecks constructor signature.
internal/executor/service/pod_issue_handler_test.go Updates makePendingPodChecker to pass realclock.RealClock{} to the updated NewPodChecks signature.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GetAction called] --> B[LastStatusChange]
    B -->|error| C[return ActionWait]
    B -->|ok| D{deadlineForInitContainers > 0?}
    D -->|no| H[compute timeInState]
    D -->|yes| E[getInitContainerState]
    E --> F{first != nil AND last == nil AND elapsed > deadline?}
    F -->|no| H
    F -->|yes| G[return ActionFail / PodStartupIssue]
    H --> I{timeInState > deadlineForNodeAssignment AND not scheduled?}
    I -->|yes| J[return ActionRetry / NoNodeAssigned]
    I -->|no| K{isBadNode?}
    K -->|yes + over deadline| L[return ActionRetry / NoStatusUpdates]
    K -->|yes + under deadline| M[return ActionWait / NoStatusUpdates]
    K -->|no| N[eventChecks + containerStateChecks]
    N --> O[return maxAction result]
Loading

Reviews (2): Last reviewed commit: "pod checks extended message" | Re-trigger Greptile

Comment thread internal/executor/podchecks/pod_checks.go Outdated
Comment on lines 91 to 95

func Test_GetAction_BadNodeButUnderTimeLimit(t *testing.T) {
podChecks := podChecksWithMocks(ActionWait, ActionWait)
result, cause, message := podChecks.GetAction(createBasicPod(true), []*v1.Event{}, 10*time.Second)
result, cause, message := podChecks.GetAction(createBasicPodInStateFor(true, 10*time.Second), []*v1.Event{})
assert.Equal(t, result, ActionWait)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing test for init container in Waiting/CrashLoopBackoff state

There is test coverage for Waiting (before first start), Running, and Terminated (exit 0 and multi-container partial success), but no test for the state where an init container previously ran with a non-zero exit code and is now in Waiting state (CrashLoopBackoff). That scenario exercises the LastState gap described in the sibling comment on pod_checks.go, and a test would confirm whether the deadline is expected to fire or not during the backoff window.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>

assert.Equal(t, ActionFail, result)
assert.Equal(t, PodStartupIssue, cause)
assert.Equal(t, "init containers did not complete within deadline of 1m0s", message)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test assertions will fail — message case mismatch and missing running-container suffix

Two new tests assert on the exact message string but it won't match what the code produces:

  1. Case mismatch: pod_checks.go line 67 writes "Init containers did not complete…" (capital I), while both assertions here (line 119) and at line 160 expect "init containers did not complete…" (lowercase i).

  2. Truncated expected string: When state.current != nil (true for both test scenarios — the pods have a running or running-as-second init container), the code appends "\nInit container <name> has been running for <duration>". The expected strings don't include this suffix, so assert.Equal will fail even after fixing the case.

Both Test_GetAction_InitContainerDeadlineUsesFirstInitStart and Test_GetAction_InitContainerDeadlineRequiresAllRegularInitContainersToComplete are affected. The simplest fix is to update the expected strings (or use assert.Contains for the prefix, and check the suffix separately), or update the production message to lowercase and include the running-container details in the expected strings.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant