ENH: Mirror dockcross images to GHCR for reliable Python wheel builds #124

Merged
hjmjohnson merged 3 commits into main from ghcr-mirror-dockcross-images
Apr 2, 2026

@hjmjohnson
Member

@hjmjohnson hjmjohnson commented Apr 2, 2026

Summary

  • Add a scheduled workflow (mirror-dockcross-images.yml) that mirrors all pinned dockcross/manylinux container images to ghcr.io/insightsoftwareconsortium/
  • Add pre-pull steps with retry logic to the Linux x64 and ARM Python wheel build jobs that dynamically resolve the correct image tag and try the GHCR mirror first
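
The mirroring half of this change comes down to a pull, tag, push loop. The following is a minimal sketch of that loop and not the contents of mirror-dockcross-images.yml; the function names `mirror_image` and `mirror_all`, the `DOCKER` override variable, and the "source destination" input format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: names and input format are hypothetical, not taken from the PR.
# DOCKER can be overridden (e.g. with a stub) for dry runs and testing.
DOCKER="${DOCKER:-docker}"

mirror_image() {
  # $1: source image reference, $2: destination reference in the mirror
  local src="$1" dest="$2"
  "$DOCKER" pull "$src" || return 1
  "$DOCKER" tag "$src" "$dest" || return 1
  "$DOCKER" push "$dest"
}

mirror_all() {
  # Reads "source destination" pairs from stdin, one pair per line.
  # Continues after a failure so one bad image does not block the others,
  # and returns non-zero if any image failed.
  local failed=0 src dest
  while read -r src dest; do
    [ -n "$src" ] || continue
    # </dev/null keeps the command from consuming the loop's input.
    mirror_image "$src" "$dest" </dev/null || {
      echo "failed to mirror: $src" >&2
      failed=1
    }
  done
  return "$failed"
}
```

Pushing to GHCR would additionally need an authenticated session (for example via `docker login ghcr.io`) with permission to write packages, which this sketch leaves out.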

Problem

Python wheel builds across ~55 remote modules intermittently fail due to transient Docker image pull errors:

  • Error: writing blob: storing blob to file "...": happened during read: unexpected EOF
  • Error fetching blob: invalid status code from registry 504 (Gateway Timeout)
  • Docker Hub rate limiting (401 Unauthorized / 429 Too Many Requests)

These failures stem from pulling ~1-2 GB dockcross images from Docker Hub or Quay.io on every CI run on GitHub Actions shared runners.

How it works

Dynamic image tag resolution

The pre-pull steps fetch dockcross-manylinux-set-vars.sh from the same ITKPythonPackage version the build will use (via the itk-python-package-tag input). This ensures the pre-pull caches exactly the image the build script needs, regardless of which ITKPythonPackage version the module targets.
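
As a rough illustration of this resolution step, the sketch below downloads the set-vars script for a given ref and sources it in a subshell to read the two variables. The raw URL layout, the `scripts/` path, and the helper names `fetch_set_vars` and `resolve_image_vars` are assumptions, not details confirmed by this PR.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: the raw.githubusercontent.com URL layout, the
# scripts/ directory, and both helper names are hypothetical.

fetch_set_vars() {
  # $1: ITKPythonPackage tag or branch, $2: destination file
  local ref="$1" dest="$2"
  local url="https://raw.githubusercontent.com/InsightSoftwareConsortium/ITKPythonPackage/${ref}/scripts/dockcross-manylinux-set-vars.sh"
  # -f makes curl return non-zero on HTTP errors, e.g. a 404 when the
  # ref does not contain the script.
  curl -fsSL --retry 3 -o "$dest" "$url"
}

resolve_image_vars() {
  # $1: path to a local copy of the set-vars script.
  # Source it in a subshell so its variables do not leak into the caller,
  # then print the two values the pre-pull step needs.
  local script="$1"
  [ -f "$script" ] || return 1
  (
    . "$script" >/dev/null 2>&1
    printf 'IMAGE_TAG=%s\nCONTAINER_SOURCE=%s\n' \
      "${IMAGE_TAG:-}" "${CONTAINER_SOURCE:-}"
  )
}
```

A caller could run `fetch_set_vars "$ref" /tmp/set-vars.sh && resolve_image_vars /tmp/set-vars.sh` and treat a failed fetch as the signal to skip the pre-pull.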

Pre-pull with GHCR mirror + fallback

  1. Resolve IMAGE_TAG and CONTAINER_SOURCE from the ITKPythonPackage build scripts
  2. Try pulling from the GHCR mirror (ghcr.io/insightsoftwareconsortium/dockcross-*) — same network as GitHub Actions, fast + reliable
  3. Fall back to the original Docker Hub / Quay.io source with retry
  4. Tag the pulled image with the original name so downstream scripts find it cached
  5. Gracefully warn if all attempts fail (build script will retry on its own)
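
The mirror-first flow in steps 2 to 5 can be sketched as a single function. This is an illustrative outline under stated assumptions, not the step added by this PR: `prepull_image` and the `DOCKER` override variable are hypothetical names, and the retry on the fallback pull is left out to keep the control flow visible.

```shell
#!/usr/bin/env bash
# Sketch only: prepull_image and DOCKER are hypothetical names.
DOCKER="${DOCKER:-docker}"

prepull_image() {
  # $1: original image reference that the build script expects
  # $2: the same image under its mirror name
  local original="$1" mirror="$2"

  if "$DOCKER" pull "$mirror"; then
    # Re-tag so the build script finds the image cached under the
    # original name.
    "$DOCKER" tag "$mirror" "$original" && return 0
  fi

  echo "mirror miss for $mirror; falling back to $original" >&2
  if "$DOCKER" pull "$original"; then
    return 0
  fi

  # Never fail the job here: the build script pulls the image itself.
  echo "warning: could not pre-pull $original" >&2
  return 0
}
```

Returning 0 on every path is the design choice that keeps the pre-pull advisory: a failed pre-pull leaves the build no worse off than before.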

Edge cases

  • Custom itk-python-package-tag refs (e.g., release-5.4): works — set-vars.sh exists on that branch with the correct tags
  • Experimental branches (e.g., python_based_build_scripts): graceful skip — no set-vars.sh on that branch, pre-pull exits cleanly
  • CXX-only modules: not affected (no Python build jobs)

Images mirrored (6 total, covering v5.4.x, v6.0b01, v6.0b02/main)

| Source | GHCR Mirror | Used by |
| --- | --- | --- |
| docker.io/dockcross/manylinux_2_28-x64:20240304-9e57d2b | ghcr.io/.../dockcross-manylinux_2_28-x64:20240304-9e57d2b | v5.4.0–v5.4.5 |
| docker.io/dockcross/manylinux_2_28-x64:20250913-6ea98ba | ghcr.io/.../dockcross-manylinux_2_28-x64:20250913-6ea98ba | v6.0b01 |
| docker.io/dockcross/manylinux_2_28-x64:20260203-3dfb3ff | ghcr.io/.../dockcross-manylinux_2_28-x64:20260203-3dfb3ff | v6.0b02, main |
| docker.io/dockcross/manylinux2014-x64:20240304-9e57d2b | ghcr.io/.../dockcross-manylinux2014-x64:20240304-9e57d2b | all versions |
| quay.io/pypa/manylinux_2_28_aarch64:2024-03-25-9206bd9 | ghcr.io/.../dockcross-manylinux_2_28-aarch64:2024-03-25-9206bd9 | v5.4.0–v5.4.5 |
| quay.io/pypa/manylinux_2_28_aarch64:2025.08.12-1 | ghcr.io/.../dockcross-manylinux_2_28-aarch64:2025.08.12-1 | v6.0b01, v6.0b02, main |
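
The naming pattern in the table (drop the registry and namespace, prefix `dockcross-`, and turn `_aarch64` into `-aarch64`) can be expressed as a small helper. The rule here is inferred from the rows above and may not match how the workflow actually derives names; `mirror_ref` and `MIRROR_PREFIX` are hypothetical, and the prefix value is the one quoted in the description.

```shell
#!/usr/bin/env bash
# Sketch only: the mapping rule is inferred from the table, and
# mirror_ref / MIRROR_PREFIX are hypothetical names.
MIRROR_PREFIX="${MIRROR_PREFIX:-ghcr.io/insightsoftwareconsortium}"

mirror_ref() {
  # $1: source reference with an explicit tag,
  #     e.g. docker.io/dockcross/manylinux_2_28-x64:TAG
  local src="$1" name_tag name tag
  name_tag="${src##*/}"   # strip registry and namespace, leaving name:tag
  name="${name_tag%%:*}"
  tag="${name_tag#*:}"
  case "$name" in
    # The pypa aarch64 image uses an underscore before the architecture;
    # the mirror names in the table use a hyphen.
    *_aarch64) name="${name%_aarch64}-aarch64" ;;
  esac
  printf '%s/dockcross-%s:%s\n' "$MIRROR_PREFIX" "$name" "$tag"
}
```

For example, `mirror_ref quay.io/pypa/manylinux_2_28_aarch64:2025.08.12-1` prints `ghcr.io/insightsoftwareconsortium/dockcross-manylinux_2_28-aarch64:2025.08.12-1`.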

Testing

Tested across 9 remote modules by pointing their python-build-workflow at @ghcr-mirror-dockcross-images. See test results comment.

Pre-pull fallback path verified (GHCR miss → Docker Hub/Quay.io):

  • SkullStrip, SimpleITKFilters, TubeTK, IsotropicWavelets: all Python builds pass
  • PrincipalComponentsAnalysis: 1 ARM transient EOF on the un-cached image (exactly the failure this PR prevents once GHCR is populated)

GHCR pull path verified (5 of 6 images already populated):

  • 20240304-9e57d2b (x64), 20260203-3dfb3ff (x64), 20240304-9e57d2b (2014), 2024-03-25-9206bd9 (aarch64), 2025.08.12-1 (aarch64) — all in GHCR
  • 20250913-6ea98ba (x64, v6.0b01) — pending push

Post-merge steps

  • Trigger the Mirror dockcross images to GHCR workflow to populate any missing images
  • Clean up 9 WIP test PRs across remote modules
  • Tag a new release if needed for modules pinned to version tags

🤖 Generated with Claude Code

Python wheel CI builds pull large dockcross/manylinux container images
(~1-2 GB) from Docker Hub and Quay.io on every run. These pulls
intermittently fail with "unexpected EOF" or Docker Hub rate limiting
(401/429), causing wheel builds to fail across all remote modules.

Add a scheduled workflow that mirrors the pinned dockcross images to
GHCR (ghcr.io/insightsoftwareconsortium/), which has much better
connectivity from GitHub Actions runners (same network).

Add pre-pull steps to the Linux x64 and ARM build jobs that:
1. Try pulling from the GHCR mirror first
2. Fall back to the original Docker Hub / Quay.io source
3. Retry up to 3 times with backoff
4. Tag the GHCR image with the original name so the downstream
   ITKPythonPackage build scripts find it already cached

This eliminates transient Docker image pull failures that have been
causing sporadic Python wheel build failures across ~55 remote modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hjmjohnson hjmjohnson requested a review from thewtex April 2, 2026 02:00
hjmjohnson added commits that referenced this pull request on Apr 2, 2026, to the following repositories:

  • InsightSoftwareConsortium/ITKSkullStrip
  • InsightSoftwareConsortium/ITKPrincipalComponentsAnalysis
  • InsightSoftwareConsortium/ITKTubeTK
  • InsightSoftwareConsortium/ITKIsotropicWavelets
  • InsightSoftwareConsortium/ITKTotalVariation
  • InsightSoftwareConsortium/ITKFixedPointInverseDisplacementField
  • InsightSoftwareConsortium/LesionSizingToolkit
  • InsightSoftwareConsortium/ITKCuberille
  • InsightSoftwareConsortium/ITKSimpleITKFilters

Each commit carries the same message:

Temporarily point python-build-workflow at the
ghcr-mirror-dockcross-images branch of
ITKRemoteModuleBuildTestPackageAction to test the
pre-pull step with GHCR fallback logic.

See: InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction#124
@hjmjohnson
Member Author

Testing status across 9 remote modules

Nine remote module PRs have been pointed at @ghcr-mirror-dockcross-images to test the pre-pull + fallback logic. The GHCR images are not yet populated (requires write:packages scope or merging this PR first and running the mirror workflow), so all runs are exercising the fallback path.

Pre-pull fallback results (GHCR miss → Docker Hub/Quay.io)

From SkullStrip run 23880135873 (first test) and run 23880137293:

| Job | Pre-pull | Pull time | Build |
| --- | --- | --- | --- |
| build-linux-py (10, _2_28-x64) | GHCR miss → Docker Hub | ~37s | |
| build-linux-py (11, _2_28-x64) | GHCR miss → Docker Hub | ~37s | |
| build-linux-arm-py (10) | GHCR miss → Quay.io | ~36s | |
| build-linux-arm-py (11) | GHCR miss → Quay.io | ~36s | |

The pre-pull step correctly:

  1. Tries the GHCR mirror first
  2. Falls back to Docker Hub / Quay.io on failure
  3. Tags the image with the original name so downstream scripts find it cached
  4. Does not block the build if all pull attempts fail (graceful warning)

Module test PRs

| Module | PR | Type | Python CI |
| --- | --- | --- | --- |
| SkullStrip | InsightSoftwareConsortium/ITKSkullStrip#29 | Existing + WIP commit | ✅ All pass |
| SimpleITKFilters | InsightSoftwareConsortium/ITKSimpleITKFilters#30 | New WIP PR | ⌛ Running |
| TubeTK | InsightSoftwareConsortium/ITKTubeTK#187 | Existing + WIP commit | ⌛ Running |
| IsotropicWavelets | InsightSoftwareConsortium/ITKIsotropicWavelets#166 | Existing + WIP commit | ⌛ Running |
| PrincipalComponentsAnalysis | InsightSoftwareConsortium/ITKPrincipalComponentsAnalysis#39 | Existing + WIP commit | ❌ 1 ARM fail (transient EOF, the exact issue this PR mitigates) |
| FixedPointInverseDisplacementField | InsightSoftwareConsortium/ITKFixedPointInverseDisplacementField#34 | New WIP PR | ❌ Pre-existing Windows C++ issue |
| LesionSizingToolkit | InsightSoftwareConsortium/LesionSizingToolkit#54 | New WIP PR | ❌ Pre-existing linker errors |
| Cuberille | InsightSoftwareConsortium/ITKCuberille#94 | New WIP PR | ❌ Pre-existing build script path issue |
| TotalVariation | InsightSoftwareConsortium/ITKTotalVariation#56 | Existing + WIP commit | ❌ Pre-existing C++17 issue (v6.0b01 beta) |

Expected benefits once GHCR images are populated

  1. Reliability: GHCR is on the same network as GitHub Actions runners — no cross-network Docker Hub pulls that fail with unexpected EOF or 504 Gateway Timeout
  2. Rate limits: GHCR pulls using GITHUB_TOKEN are not subject to Docker Hub's anonymous rate limits (100 pulls per 6 hours per shared IP)
  3. Speed: Same-network pulls should be significantly faster than Docker Hub (~2-5 s vs ~37 s for a ~1.5 GB image)
  4. Graceful degradation: If GHCR is unavailable, falls back to Docker Hub with retry — no worse than today
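
The "with retry" part of the fallback can be sketched as a generic wrapper that doubles its delay after each failed attempt. This is one possible shape under stated assumptions, not the logic merged in this PR; `retry_with_backoff` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: retry_with_backoff and its argument order are hypothetical.

retry_with_backoff() {
  # $1: maximum number of attempts
  # $2: initial delay in seconds (doubled after each failure)
  # remaining arguments: the command to run
  local max="$1" delay="$2" attempt=1
  shift 2
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

A fallback pull could then be written as `retry_with_backoff 3 5 docker pull "$image"`, which tries at most three times and waits 5 s and then 10 s between attempts.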

Next steps

  • Populate GHCR images (merge this PR → trigger mirror workflow, or manually push with write:packages token)
  • Re-run a test PR to measure GHCR pull speed vs Docker Hub baseline
  • Clean up WIP commits from the 9 test PRs

@hjmjohnson
Member Author

@thewtex I'm wondering if this improvement should be considered for a new v5.4.5 tag?

hjmjohnson and others added 2 commits April 2, 2026 07:42
The pre-pull step was hardcoding v5.4.x image tags, but modules using
v6.0b02 defaults pull different images (20260203-3dfb3ff for x64,
2025.08.12-1 for aarch64). The pre-pull cached the wrong image and
the actual build script still hit Docker Hub without caching.

Fetch dockcross-manylinux-set-vars.sh from the same ITKPythonPackage
tag the build will use, then source it to get the correct IMAGE_TAG
and CONTAINER_SOURCE. This ensures the pre-pull step always caches
exactly the image the build script will need.

Also add v6.0b02 image tags to the GHCR mirror workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add docker.io/dockcross/manylinux_2_28-x64:20250913-6ea98ba used by
ITKPythonPackage v6.0b01 (currently only ITKTotalVariation).

Complete image tag coverage across all ITKPythonPackage versions:
- v5.4.0-v5.4.5: 20240304-9e57d2b (x64), 2024-03-25-9206bd9 (aarch64)
- v6.0b01: 20250913-6ea98ba (x64), 2025.08.12-1 (aarch64)
- v6.0b02/main: 20260203-3dfb3ff (x64), 2025.08.12-1 (aarch64)
- manylinux2014-x64: 20240304-9e57d2b (all versions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hjmjohnson
Member Author

@dzenanz You might be interested in this work. It helps with remote module CI stability. How we deal with tags will be an issue. We might want a new tag, v5.4.5.post01 or similar, to take advantage of this.

@hjmjohnson hjmjohnson requested review from dzenanz April 2, 2026 13:00
@dzenanz
Member

dzenanz commented Apr 2, 2026

Is there any benefit of v5.4.5.post01 over v5.4.6?

Member

@dzenanz dzenanz left a comment

LGTM

@thewtex
Member

thewtex commented Apr 2, 2026

@hjmjohnson awesome!!

> @thewtex I'm wondering if this improvement should be considered for a new v5.4.5 tag?

We could try it for v5.4.6.

The packages I see here are Private?

https://github.com/orgs/InsightSoftwareConsortium/packages

GHCR has a limit of 500 MB for private images. I assume we will hit this fairly quickly. Can they be made public?

We could use this. Also, for awareness, there is a GitHub Action dedicated to help with this that does retry with exponential backoff, used in ITK-Wasm:

https://github.com/InsightSoftwareConsortium/ITK-Wasm/blob/88362cc534b1027305f26876813df0da67ce70b1/.github/workflows/examples.yml#L26-L30

@hjmjohnson
Member Author

@thewtex I don't have permission to make the items public.

Would you make the tag? I don't understand how tagging works across the various ITK-related repos. We don't have a v5.4.6 tag in ITK, but one may be planned.

@hjmjohnson hjmjohnson merged commit 41f55c9 into main Apr 2, 2026
1 check passed
@dzenanz
Member

dzenanz commented Apr 2, 2026

I do see an option to make it public, but it seems to be disabled by some setting somewhere else:
image

@thewtex
Member

thewtex commented Apr 2, 2026

@hjmjohnson I'll make a call for additional patches for v5.4.6 next week, then tag later in the week. The tag MUST be immutable. Do not create a tag.
