
Propagate proxy env to engine PUSH jobs (pippin builder)#392

Merged: asafyehezkel merged 1 commit into master from engine-job-proxy-env on Jun 28, 2026
@asafyehezkel

Contributor

Problem

On a corporate-proxy / k3d install, PUSH jobs fail in the image-dependencies-builder (pippin) init container:

Get "https://public.ecr.aws/v2/": dial tcp: lookup public.ecr.aws on 10.43.0.10:53: no such host

pippin runs its own docker-in-docker daemon to build the per-project engine-generic image. That daemon does not inherit the k3d node's containerd registry-mirror or proxy config (it's a separate daemon inside the pod), and it resolves via cluster DNS — so behind a proxy it can neither resolve nor reach public.ecr.aws.

Fix

Capture HTTP_PROXY / HTTPS_PROXY / NO_PROXY from the installer's own environment and inject them into the engine job template's init container, so pippin's dind egresses through the corporate proxy (which also resolves the external name, fixing the DNS error too).

  • engine-job-template-cm.yaml: render HTTP(S)_PROXY / NO_PROXY (upper + lower case) on the init container, only when set.
  • engine/values.yaml: new values http_proxy / https_proxy / no_proxy (default "" → no-op).
  • pkg/helm/utils.go: ProxyEnv on ServerHelmValuesParams, wired into the tensorleap-engine values.
  • pkg/server/installation_params.go: GetEngineProxyEnv() captures proxy from env (uppercase-first, lowercase fallback; nil when unset) and augments NO_PROXY with in-cluster targets (tensorleap-registry, tensorleap-minio, localhost, .svc, .cluster.local, 10.42/10.43 CIDRs) so the local Zot push, MinIO and DNS bypass the proxy.
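
The capture logic described in the last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the name engineProxyEnv, the injected getenv parameter (used here so the logic is testable), the map return type, and the /16 masks (taken from the k3s default pod and service CIDRs) are all assumptions. The sketch also assumes "unset" means that neither an HTTP nor an HTTPS proxy is present.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inClusterNoProxy lists targets that should bypass the proxy. The names come
// from the PR description; the /16 masks are an assumption based on the k3s
// default pod and service CIDRs.
var inClusterNoProxy = []string{
	"tensorleap-registry", "tensorleap-minio", "localhost",
	".svc", ".cluster.local", "10.42.0.0/16", "10.43.0.0/16",
}

// firstSet returns the trimmed value of the first non-empty variable in names.
func firstSet(getenv func(string) string, names ...string) string {
	for _, n := range names {
		if v := strings.TrimSpace(getenv(n)); v != "" {
			return v
		}
	}
	return ""
}

// engineProxyEnv (hypothetical name) reads proxy settings uppercase-first with
// a lowercase fallback and returns nil when no proxy is configured.
func engineProxyEnv(getenv func(string) string) map[string]string {
	httpProxy := firstSet(getenv, "HTTP_PROXY", "http_proxy")
	httpsProxy := firstSet(getenv, "HTTPS_PROXY", "https_proxy")
	if httpProxy == "" && httpsProxy == "" {
		return nil // assumption: no proxy configured means no-op
	}

	// Keep the operator's NO_PROXY entries first, then append any in-cluster
	// targets that are not already present.
	var entries []string
	seen := map[string]bool{}
	add := func(e string) {
		e = strings.TrimSpace(e)
		if e != "" && !seen[e] {
			seen[e] = true
			entries = append(entries, e)
		}
	}
	for _, e := range strings.Split(firstSet(getenv, "NO_PROXY", "no_proxy"), ",") {
		add(e)
	}
	for _, e := range inClusterNoProxy {
		add(e)
	}

	return map[string]string{
		"http_proxy":  httpProxy,
		"https_proxy": httpsProxy,
		"no_proxy":    strings.Join(entries, ","),
	}
}

func main() {
	// Prints the values that would be passed to the chart for this shell.
	fmt.Println(engineProxyEnv(os.Getenv))
}
```

Deduplicating the NO_PROXY entries keeps the rendered value stable when the operator's environment already lists some of the in-cluster targets.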

Why it's safe with node-server

node-server loads this template and mutates the init-container env via setEnvParams, which merges by name (update-or-append) and never rebuilds/filters the array; it only sets IMAGE_TAG/DEPENDENCY_URL/BASE_IMAGE and appends PIP_INDEX_URL/PIP_EXTRA_INDEX_URL. None collide with the proxy keys, so the chart-declared proxy vars are preserved.
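
The update-or-append behaviour this argument relies on can be illustrated with a small sketch. node-server's real setEnvParams is not shown in this PR, so the Go function below (mergeEnvByName, a hypothetical name, with a hypothetical EnvVar type and placeholder values) only models the semantics described above: entries are matched by name, existing ones are updated, new ones are appended, and nothing else is dropped.

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in for a container env entry.
type EnvVar struct {
	Name  string
	Value string
}

// mergeEnvByName updates entries whose name already exists and appends the
// rest. It never removes or reorders entries that are not being set.
func mergeEnvByName(env []EnvVar, params ...EnvVar) []EnvVar {
	for _, p := range params {
		updated := false
		for i := range env {
			if env[i].Name == p.Name {
				env[i].Value = p.Value
				updated = true
				break
			}
		}
		if !updated {
			env = append(env, p)
		}
	}
	return env
}

func main() {
	// Env as rendered by the chart: a proxy var plus a placeholder IMAGE_TAG.
	env := []EnvVar{
		{Name: "HTTPS_PROXY", Value: "http://proxy.example:3128"},
		{Name: "IMAGE_TAG", Value: ""},
	}
	env = mergeEnvByName(env,
		EnvVar{Name: "IMAGE_TAG", Value: "abc123"},
		EnvVar{Name: "PIP_INDEX_URL", Value: "https://pypi.example/simple"},
	)
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Because the merge only touches the names it is given, the proxy entry survives unchanged as long as none of those names is a proxy key.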

Validation

  • go build ./..., go vet, gofmt, go test ./pkg/server ./pkg/helm — pass (golden updated; new TestGetEngineProxyEnv).
  • make build-helm; helm template — default render has 0 proxy env lines; with proxy values set the env block renders on the init container. make validate-images — 20 images valid.
  • Patch version bumps (tensorleap 1.6.28→1.6.29, tensorleap-engine 1.0.610→1.0.611) — no cluster reinstall, applies on upgrade.

Rollout

The operator runs the upgrade with a container-reachable proxy address exported (e.g. cosmos-vip…:3128, not localhost:911, since localhost inside the pod refers to the pod itself). New PUSH jobs then pull engine-generic through the proxy.

🤖 Generated with Claude Code

@asafyehezkel asafyehezkel enabled auto-merge (squash) June 25, 2026 16:39
The pippin image-dependencies-builder init container runs its own dind
daemon, which does not inherit the k3d node's containerd proxy/mirror
config. Behind a corporate proxy this makes it fail to pull the
engine-generic base image from public.ecr.aws (DNS + egress).

Capture HTTP(S)_PROXY/NO_PROXY from the installer's own environment and
inject them into the engine job template's init container, augmenting
NO_PROXY with in-cluster targets (Zot registry, MinIO, cluster CIDRs,
.svc/.cluster.local) so local traffic bypasses the proxy. No-op when no
proxy is configured. node-server's setEnvParams merges by name and never
rebuilds the env list, so these chart-declared vars are preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@asafyehezkel asafyehezkel force-pushed the engine-job-proxy-env branch from f9fa5d3 to 85d57fc Compare June 28, 2026 08:07
@asafyehezkel asafyehezkel merged commit 99e94bd into master Jun 28, 2026
1 check passed
@asafyehezkel asafyehezkel deleted the engine-job-proxy-env branch June 28, 2026 08:07