Propagate proxy env to engine PUSH jobs (pippin builder) #392
Merged
omriyonatani-tl
approved these changes
Jun 25, 2026
The pippin image-dependencies-builder init container runs its own dind daemon, which does not inherit the k3d node's containerd proxy/mirror config. Behind a corporate proxy this makes it fail to pull the engine-generic base image from public.ecr.aws (DNS + egress).
Capture HTTP(S)_PROXY/NO_PROXY from the installer's own environment and inject them into the engine job template's init container, augmenting NO_PROXY with in-cluster targets (Zot registry, MinIO, cluster CIDRs, .svc/.cluster.local) so local traffic bypasses the proxy. No-op when no proxy is configured.
node-server's setEnvParams merges by name and never rebuilds the env list, so these chart-declared vars are preserved.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
f9fa5d3 to 85d57fc
Problem
On a corporate-proxy / k3d install, PUSH jobs fail in the image-dependencies-builder (pippin) init container.
pippin runs its own docker-in-docker daemon to build the per-project engine-generic image. That daemon does not inherit the k3d node's containerd registry-mirror or proxy config (it is a separate daemon inside the pod), and it resolves names via cluster DNS, so behind a proxy it can neither resolve nor reach public.ecr.aws.

Fix
Capture HTTP_PROXY/HTTPS_PROXY/NO_PROXY from the installer's own environment and inject them into the engine job template's init container, so pippin's dind egresses through the corporate proxy (which also resolves the external name, fixing the DNS error too).
- engine-job-template-cm.yaml: render HTTP(S)_PROXY/NO_PROXY (upper + lower case) on the init container, only when set.
- engine/values.yaml: new http_proxy/https_proxy/no_proxy (default "" → no-op).
- pkg/helm/utils.go: ProxyEnv on ServerHelmValuesParams, wired into the tensorleap-engine values.
- pkg/server/installation_params.go: GetEngineProxyEnv() captures proxy from env (uppercase-first, lowercase fallback; nil when unset) and augments NO_PROXY with in-cluster targets (tensorleap-registry, tensorleap-minio, localhost, .svc, .cluster.local, 10.42/10.43 CIDRs) so the local Zot push, MinIO and DNS bypass the proxy.

Why it's safe with node-server
node-server loads this template and mutates the init-container env via setEnvParams, which merges by name (update-or-append) and never rebuilds/filters the array; it only sets IMAGE_TAG/DEPENDENCY_URL/BASE_IMAGE and appends PIP_INDEX_URL/PIP_EXTRA_INDEX_URL. None collide with the proxy keys, so the chart-declared proxy vars are preserved.

Validation
- go build ./..., go vet, gofmt, go test ./pkg/server ./pkg/helm: pass (golden updated; new TestGetEngineProxyEnv).
- make build-helm; helm template: default render has 0 proxy env lines; with proxy values set, the env block renders on the init container.
- make validate-images: 20 images valid.
- Version bumps (tensorleap 1.6.28 → 1.6.29, tensorleap-engine 1.0.610 → 1.0.611): no cluster reinstall, applies on upgrade.

Rollout
Operator upgrades with the container-reachable proxy exported (e.g. cosmos-vip…:3128, not localhost:911); new PUSH jobs then pull engine-generic through the proxy.

🤖 Generated with Claude Code
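
For readers who want to see the capture logic described under Fix in concrete form, here is a minimal sketch of the uppercase-first, lowercase-fallback lookup and the NO_PROXY augmentation. It is not the code from pkg/server/installation_params.go. The names engineProxyEnv, firstEnv, mergeNoProxy and the ProxyEnv field names are hypothetical, the /16 CIDR widths are an assumption (the description only says "10.42/10.43 CIDRs"), and so are the de-duplication of entries and the choice to return nil when neither HTTP nor HTTPS proxy is set.

```go
package main

import (
	"fmt"
	"strings"
)

// ProxyEnv holds the three values passed on to the engine chart.
// Field names are assumptions made for this sketch.
type ProxyEnv struct {
	HTTPProxy  string
	HTTPSProxy string
	NoProxy    string
}

// inClusterNoProxy lists the in-cluster targets named in the PR description.
// The /16 widths are an assumption; the PR only mentions 10.42/10.43 CIDRs.
var inClusterNoProxy = []string{
	"tensorleap-registry", "tensorleap-minio", "localhost",
	".svc", ".cluster.local", "10.42.0.0/16", "10.43.0.0/16",
}

// firstEnv returns the first non-empty value among the given variable names.
func firstEnv(getenv func(string) string, names ...string) string {
	for _, n := range names {
		if v := strings.TrimSpace(getenv(n)); v != "" {
			return v
		}
	}
	return ""
}

// mergeNoProxy appends extra entries to a comma-separated list,
// skipping empty items and duplicates while keeping the original order.
func mergeNoProxy(existing string, extra []string) string {
	seen := map[string]bool{}
	var out []string
	add := func(e string) {
		e = strings.TrimSpace(e)
		if e != "" && !seen[e] {
			seen[e] = true
			out = append(out, e)
		}
	}
	for _, e := range strings.Split(existing, ",") {
		add(e)
	}
	for _, e := range extra {
		add(e)
	}
	return strings.Join(out, ",")
}

// engineProxyEnv reads proxy settings through getenv (os.Getenv in a real
// installer). It returns nil when no HTTP(S) proxy is configured, which is
// this sketch's reading of "nil when unset".
func engineProxyEnv(getenv func(string) string) *ProxyEnv {
	httpProxy := firstEnv(getenv, "HTTP_PROXY", "http_proxy")
	httpsProxy := firstEnv(getenv, "HTTPS_PROXY", "https_proxy")
	if httpProxy == "" && httpsProxy == "" {
		return nil
	}
	noProxy := firstEnv(getenv, "NO_PROXY", "no_proxy")
	return &ProxyEnv{
		HTTPProxy:  httpProxy,
		HTTPSProxy: httpsProxy,
		NoProxy:    mergeNoProxy(noProxy, inClusterNoProxy),
	}
}

func main() {
	// Simulated installer environment with a made-up proxy address.
	env := map[string]string{
		"https_proxy": "http://proxy.example.internal:3128",
		"NO_PROXY":    "localhost,.corp.example",
	}
	p := engineProxyEnv(func(k string) string { return env[k] })
	fmt.Printf("%+v\n", *p)

	// With nothing set the result is nil, so the chart values stay empty.
	fmt.Println(engineProxyEnv(func(string) string { return "" }) == nil)
}
```

Passing the lookup function in as a parameter keeps the sketch testable without mutating the process environment; whether the real GetEngineProxyEnv() does the same is not stated in the PR.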
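
The safety argument in "Why it's safe with node-server" rests on update-or-append semantics. The sketch below illustrates that behavior in Go purely for consistency with the rest of this PR. It is not node-server's setEnvParams, whose source is not shown here; mergeEnvByName, EnvVar and the example values are hypothetical.

```go
package main

import "fmt"

// EnvVar is a name/value pair as found in a container's env list.
type EnvVar struct {
	Name  string
	Value string
}

// mergeEnvByName applies params to env with update-or-append semantics:
// an existing entry with the same name is updated in place, a new name is
// appended, and entries not mentioned in params are left untouched.
func mergeEnvByName(env []EnvVar, params []EnvVar) []EnvVar {
	for _, p := range params {
		updated := false
		for i := range env {
			if env[i].Name == p.Name {
				env[i].Value = p.Value
				updated = true
				break
			}
		}
		if !updated {
			env = append(env, p)
		}
	}
	return env
}

func main() {
	// Env list as the chart might render it, including proxy variables.
	env := []EnvVar{
		{"IMAGE_TAG", ""},
		{"HTTPS_PROXY", "http://proxy.example.internal:3128"},
		{"NO_PROXY", "tensorleap-registry,.svc"},
	}
	// Values set per job; none of the names collide with the proxy keys.
	env = mergeEnvByName(env, []EnvVar{
		{"IMAGE_TAG", "abc123"},
		{"PIP_INDEX_URL", "https://pypi.example.internal/simple"},
	})
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Because the list is never rebuilt from the params alone, the HTTPS_PROXY and NO_PROXY entries survive the merge, which is the property the PR relies on.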