Conversation
Reviewer's GuideUpdates the install.sh script to support non-root installations without jq by using POSIX tools and dynamic install directories, enhances macOS support with Homebrew, and adds live end‑to‑end CLI lifecycle tests plus a GitHub Actions integration-test workflow that runs these tests as both root and non-root using a private API key when available. Sequence diagram for updated runpodctl installation flow (root vs non-root, Homebrew and binary)sequenceDiagram
actor User
participant Shell as install_sh
participant OS
participant Homebrew
participant GitHubAPI
participant FS as FileSystem
User->>Shell: Run install.sh
Shell->>OS: uname -s
alt macOS
Shell->>Homebrew: try_brew_install (brew install runpod/runpodctl/runpodctl)
alt Homebrew install succeeds
Homebrew-->>Shell: runpodctl installed
Shell-->>User: Exit 0
else Homebrew install fails
Shell-->>User: Log fallback to binary
Shell->>Shell: detect_install_dir
end
else non-macOS
Shell->>Shell: detect_install_dir
end
note over Shell,FS: detect_install_dir chooses INSTALL_DIR based on root/non-root and PATH
Shell->>Shell: check_root
alt non-root
Shell-->>User: Note: Installing to user-space
else root
Shell-->>User: Proceeding with system-wide install
end
Shell->>Shell: check_system_requirements
Shell->>OS: Check commands wget, tar, grep, sed
alt any command missing
Shell-->>User: Error: Missing required commands
Shell-->>User: Exit 1
else all commands present
Shell-->>User: Requirements satisfied
end
Shell->>GitHubAPI: GET /repos/runpod/runpodctl/releases/latest
GitHubAPI-->>Shell: JSON with tag_name
Shell->>Shell: fetch_latest_version via grep/sed
Shell->>Shell: download_url_constructor
Shell->>Shell: Build DOWNLOAD_URLS based on OS and arch
Shell->>GitHubAPI: wget each URL in DOWNLOAD_URLS
alt any download succeeds
GitHubAPI-->>Shell: runpodctl.tar.gz
Shell->>FS: tar -xzf runpodctl.tar.gz
FS-->>Shell: runpodctl binary
Shell->>FS: rm runpodctl.tar.gz
Shell->>FS: chmod +x runpodctl
Shell->>FS: Check INSTALL_DIR writable
alt INSTALL_DIR writable
Shell->>FS: mv runpodctl to INSTALL_DIR
Shell-->>User: runpodctl installed successfully
else INSTALL_DIR not writable
Shell->>FS: rm runpodctl
Shell-->>User: Error: Install directory not writable
end
else all downloads fail
Shell-->>User: Failed to download runpodctl from any URLs
end
File-Level Changes
|
|
Thanks for the gracious and welcoming response to my unsolicited PR. I probably should've opened an issue straightaway to discuss what you wanted, but I didn't want to burden you with what I wanted.
Whoops! A little accidental cross-polination. I was trying to juggle the pending #235 and also make sure the CI tasks supported both root and non-root workflows and evidently got a little bit of wobble in my wibble. Since you seem amenable to both features, I have opted to combine them into a single PR. It's much easier for me and I hope not much harder for you.
Excellent catch, thanks. I am not terribly experienced at using serverless features and had some trouble balancing the need to spin up a tiny API for testing as fast as possible in root/non-root configurations - ideally in parallel - while also staying under the account cap of five workers. I think the aggressive delete on bail was probably a Hail Mary to prevent CI failures due to being capped on workers despite using a tiny custom image for the tests, limiting each instance to 1 max worker, etc. It may be necessary for a relaxed cap grant for whatever you settle on as the CI account or possibly staggered runs / delays built into the actions, but that kind of tuning depends on preferences that aren't really mine to establish as policy.
Fair request and addressed. Cheers! |
TimPietruskyRunPod
left a comment
There was a problem hiding this comment.
thanks for combining #240 and #241 into a clean PR — the non-root install support and e2e test coverage are both welcome additions. a few things need to be addressed before we can merge.
must fix
runCLInaming conflict: bothcli_test.goandcli_lifecycle_test.godefinerunCLIin packagee2ewith different signatures — runninggo test -tags e2e ./e2e/will fail to compile- dead code:
slsImage/RUNPOD_TEST_SERVERLESS_IMAGEare wired up in CI and the test but never consumed - go version mismatch: workflow uses 1.22, existing ci.yml and release.yml use 1.25.7
should fix
- hardcoded template id
wvrr20un0lis fragile — should be configurable or documented cancel-in-progress: falsewith real-money e2e tests is costly- third-party docker image in CI should be under
runpodorg
minor
- box drawing alignment,
defervst.Cleanup, fragile json parsing
details in inline comments.
e2e/cli_lifecycle_test.go
Outdated
| } | ||
|
|
||
| // HELPER: execute the runpodctl binary | ||
| func runCLI(args ...string) (string, error) { |
There was a problem hiding this comment.
bug (compile conflict): the existing e2e/cli_test.go already defines runCLI in package e2e with signature (string, string, error). this file defines it as (string, error). both files share the e2e build tag.
the CI works only because it targets the specific file (./e2e/cli_lifecycle_test.go), but running go test -tags e2e ./e2e/ will fail to compile. this is fragile and will break as we add more e2e tests.
suggestion: rename this to something like runCLICombined or refactor both files to share a common helper in a helpers_test.go.
| t.Skip("RUNPOD_API_KEY is not set, skipping integration test") | ||
| } | ||
|
|
||
| slsImage := getEnvOrDefault("RUNPOD_TEST_SERVERLESS_IMAGE", defaultServerlessImage) |
There was a problem hiding this comment.
bug (dead code): slsImage is set here from RUNPOD_TEST_SERVERLESS_IMAGE but never actually used — the serverless create on line 315 uses --template-id instead. the RUNPOD_TEST_SERVERLESS_IMAGE env var is also wired up in the workflow (line 58) and the serverless_image workflow input (line 14) but goes nowhere.
either use the image in the test or remove the dead variable, env var, and workflow input.
e2e/cli_lifecycle_test.go
Outdated
| // For Serverless, current CLI requires a template-id. | ||
| // The user mentioned bwf8egptou/wvrr20un0l as their previous templates. | ||
| // We will use wvrr20un0l as a default if none provided. | ||
| slsTemplate := getEnvOrDefault("RUNPOD_TEST_SERVERLESS_TEMPLATE_ID", "wvrr20un0l") |
There was a problem hiding this comment.
concern (hardcoded dependency): the default template id wvrr20un0l is tied to a specific account. if this template is deleted or the CI api key changes to a different account, the test breaks silently.
consider either:
- creating the template as part of the test setup (and cleaning it up)
- making this a required env var with a clear skip message
- documenting which account/template this belongs to
e2e/cli_lifecycle_test.go
Outdated
| t.Logf("Created Pod ID: %s", podID) | ||
|
|
||
| // Defer cleanup to run even if test fails | ||
| defer func() { |
There was a problem hiding this comment.
nit: project convention (CLAUDE.md) says "always clean up resources with t.Cleanup". while defer achieves the same goal here, t.Cleanup is the idiomatic Go testing approach and runs even if the test calls t.FailNow() from a different goroutine. same applies to the serverless cleanup on line 334.
e2e/cli_lifecycle_test.go
Outdated
|
|
||
| // Sanitize the command echo to hide keys in arguments if any | ||
| cmdStr := fmt.Sprintf("%s %s", binaryPath, strings.Join(args, " ")) | ||
| fmt.Printf("DEBUG: Executing: %s\n", redactSensitive(cmdStr)) |
There was a problem hiding this comment.
nit: fmt.Printf("DEBUG: ...") will produce noisy output in CI logs for every CLI invocation. consider using t.Logf instead (requires passing *testing.T to runCLI), or gating this behind a verbose flag.
|
|
||
| concurrency: | ||
| group: integration-tests-${{ github.ref_name }} | ||
| cancel-in-progress: false |
There was a problem hiding this comment.
should fix: cancel-in-progress: false means multiple integration test runs can pile up when PRs are updated quickly. since these tests create real pods/endpoints that cost money, cancel-in-progress: true would be safer — it cancels the previous run when a new commit is pushed.
| serverless_image: | ||
| description: 'Image to test for Serverless lifecycle' | ||
| required: false | ||
| default: 'fngarvin/ci-minimal-serverless@sha256:6a33a9bac95b8bc871725db9092af2922a7f1e3b63175248b2191b38be4e93a0' |
There was a problem hiding this comment.
concern (supply chain): fngarvin/ci-minimal-serverless is hosted on a personal docker hub account. for CI pipelines that run on every PR to main, this creates a third-party dependency. if this image is compromised or removed, CI breaks or worse.
ideally this should be hosted under the runpod docker hub org (or ghcr.io/runpod). the pinned sha256 digest is a good practice though.
| env: | ||
| RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }} | ||
| RUNPOD_TEST_POD_IMAGE: ${{ github.event.inputs.pod_image || 'docker.io/library/alpine' }} | ||
| RUNPOD_TEST_SERVERLESS_IMAGE: ${{ github.event.inputs.serverless_image || 'fngarvin/ci-minimal-serverless@sha256:6a33a9bac95b8bc871725db9092af2922a7f1e3b63175248b2191b38be4e93a0' }} |
There was a problem hiding this comment.
bug (unused): RUNPOD_TEST_SERVERLESS_IMAGE is set here and wired through to the test, but the test never uses it — the serverless create uses --template-id instead. either remove this or update the test to actually use the image.
install.sh
Outdated
| # Using grep/sed instead of jq for zero-dependency parsing | ||
| # - Restrict to the first matching tag_name line | ||
| # - Expect the canonical JSON indentation for the field | ||
| VERSION=$(wget -q -O- "$version_url" | grep -m1 '^ "tag_name":' | sed -E 's/^[^"]*"tag_name": "([^"]+)".*/\1/') |
There was a problem hiding this comment.
concern (fragile parsing): this depends on GitHub's API returning exactly 2-space indented JSON (^ "tag_name":). if GitHub ever changes their JSON formatting (minified, different indentation), this silently produces an empty version.
the fallback validation below (line 116) catches this, so it won't silently install a wrong version — but it would be more robust to use a pattern like:
grep -m1 '"tag_name"' | sed -E 's/.*"tag_name"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/'which doesn't depend on leading whitespace.
install.sh
Outdated
| echo "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓" | ||
| echo "┃ USER-SPACE INSTALLATION DETECTED ┃" | ||
| echo "┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫" | ||
| echo "┃ Target: $INSTALL_DIR ┃" |
There was a problem hiding this comment.
nit (cosmetic): the box border has fixed width but $INSTALL_DIR varies in length, so the right ┃ won't align. for example ~/.local/bin vs /home/longusername/.local/bin would produce different results. consider either padding dynamically or dropping the right border.
1b611c4 to
3d64721
Compare
|
@TimPietruskyRunPod wow, what a code review! Thanks. The continuing use of an existing template was a major gaff. I can see exactly where I lost the plot in making the changes. Same deal with testing the co code as part of the existing package. Not a go coder. Have made a commit attempting to address each point, including making sure the warning box intended for visibility has square edges and the URL format that we made strict upon request after the merge of #235 is now less strict. 🤷♂️ Can't really do anything with the suggestion to push an image to Runpod's Docker, but I did share my script and provide a sha commit to provide a stable test platform. Keep up the great work - I'm a direct beneficiary! C'ya, Maintainer Critique vs. Proposed Action1. Must Fix:
|
TimPietruskyRunPod
left a comment
There was a problem hiding this comment.
thanks for the thorough follow-up — most of our feedback was addressed well. a few things still need attention.
still open from previous review
- go version not updated — still
1.22.x, needs to be1.25.7to matchgo.mod,ci.yml, andrelease.yml. this was listed as "must fix" and the response said it would be changed to1.25.7, but the file still has1.22.x.
new issues
- license conflict — personal
License: MITheaders were added to files, but the project is GPL-3.0. this creates legal ambiguity. - emergency cleanup missing templates — the test now creates templates with
ci-test-tpl-prefix but the emergency cleanup step doesn't sweep them.
resolved (nice work)
runCLI→runE2ECmdwith sharedfindBinaryPath()— clean- dead
slsImageactivated via dynamictemplate createchain — exactly right - hardcoded template id eliminated
cancel-in-progress: true— good- fragile json parsing fixed
- dynamic box drawing — looks good
t.Cleanupfor API resources — correct- debug printf removed
details in inline comments.
| - name: Setup Go | ||
| uses: actions/setup-go@v5 | ||
| with: | ||
| go-version: '1.22.x' |
There was a problem hiding this comment.
must fix (still open): this is still 1.22.x. the project's go.mod specifies go 1.25.7, and both ci.yml and release.yml use go-version: 1.25.7. this needs to match:
go-version: '1.25.7'building and testing against a different go version than what's released can mask compatibility issues.
e2e/cli_lifecycle_test.go
Outdated
|
|
||
| // Author: FNGarvin | ||
| // License: MIT | ||
| // Year: 2026 |
There was a problem hiding this comment.
must fix (license conflict): the project is licensed under GPL-3.0 (see LICENSE file in repo root). adding License: MIT to individual files creates legal ambiguity — it's unclear which license applies to this file.
please either remove the license/author header entirely (no other files in the project have one), or change it to match the project license (GPL-3.0).
| echo "Ensuring safe sweeping of CI resources explicitly prefixed with 'ci-test-'..." | ||
| # Only delete pods named exactly starting with "ci-test-" | ||
| $RP pod list --output json 2>/dev/null | jq -r '.[] | select(.name | startswith("ci-test-")) | .id' | xargs -r -I {} $RP pod delete {} || true | ||
| $RP serverless list --output json 2>/dev/null | jq -r '.[] | select(.name | startswith("ci-test-")) | .id' | xargs -r -I {} $RP serverless delete {} || true |
There was a problem hiding this comment.
should fix: the serverless test now creates templates with ci-test-tpl- prefix, but the emergency cleanup only sweeps pods and serverless endpoints. if the test fails between template creation and endpoint creation (or if t.Cleanup doesn't run), orphaned templates will be left behind.
add a line to clean up templates too:
$RP template list --output json 2>/dev/null | jq -r '.[] | select(.name | startswith("ci-test-")) | .id' | xargs -r -I {} $RP template delete {} || true| RUNPOD_TEST_SERVERLESS_IMAGE: ${{ github.event.inputs.serverless_image || 'fngarvin/ci-minimal-serverless@sha256:6a33a9bac95b8bc871725db9092af2922a7f1e3b63175248b2191b38be4e93a0' }} | ||
| run: | | ||
| if [ "${{ matrix.user_mode }}" == "root" ]; then | ||
| go test -tags e2e -v ./e2e/cli_lifecycle_test.go |
There was a problem hiding this comment.
nit: now that the runCLI naming conflict is resolved, consider changing this to ./e2e/... to verify the full package compiles together. this would catch future naming conflicts early. same applies to line 72.
currently the single-file invocation means the new test is never compiled alongside the existing cli_test.go in CI.
2a39607 to
ec2d06f
Compare
|
Yep, not a Go coder; apologies for the friction there. The boilerplate header was a default environment oversight on my end. I appreciate the thorough review; you were right to be defensive about the template logic. Hygiene should be 100% now, and I hope we're parting friends. Cheers. |
|
@FNGarvin thanks for helping out, we appreciate it very much! |
|
hey — we tested the serverless lifecycle with since the test only exercises CRUD operations and never actually invokes the endpoint (no jobs are sent), the worker image doesn't need a runpod handler — any public image works. would you be okay swapping the default from |
ec2d06f to
66b4194
Compare
|
Hello again,
Brother, I intended my contribution to be a gift and not a point of concern or obligation. It's your prerogative to use whatever image you like.
Looking over my notes, I think my intent was to incorporate this logic into the e2e test. Works beautifully on a warmed up template and the logic translates easily enough to the e2e go test, but amplifies the max worker cap concerns. Even with the teeniest, tiniest image, there's some spin-up delay. If I were Runpod-FNG, I would probably be looking to relax the cap for official CI testing and also possibly ensuring the 200MB image was always cached. Or maybe reverting back to a template dedicated to CI that is always warm. But, like I said when I shared the build for the container I provided, it's outside of my purview to make such changes. So you can play around and see the upsides/downsides, I have now pushed code to run the API tests. It looks like this: And then the second, parallel non-root, attempt looks like this: Same success, but four minutes longer; all the sudden, you're back to potentially running out of workers and having to re-run the CI. Even with complete teardown, provisioning latency and resource contention are potentially an issue. These arethings you can work around, via infra changes or even by throwing a massive timeout at the task, but outside my purview. Hope that helps! Goodbye, |


Fng infra install fix v2
Replaces and combines #240 and #241.
Hot off the presses: freshly rebased for latest status of main as of this writing.
Test plan:
Verify that root and non-root installs still work in a clean Linux environment:
podman run --rm -v $(pwd):/workspace alpine:latest sh -c 'apk add --no-cache bash wget grep sed tar coreutils jq curl && cp /workspace/install.sh /tmp/install.sh && chmod 755 /tmp/install.sh && echo "=== TESTING ROOT INSTALL ===" && cd /tmp && bash /tmp/install.sh && /usr/local/bin/runpodctl version && echo -e "\n=== TESTING NON-ROOT INSTALL ===" && adduser -D tester && su tester -c "cd /home/tester && mkdir -p ~/.local/bin && bash /tmp/install.sh" && su tester -c "~/.local/bin/runpodctl version"''Local test of new e2e test against live Runpod services:
export RUNPOD_API_KEY=XXXYYYZZZ && go build -o runpodctl main.go && RUNPOD_E2E_TEST_CROC=1 go test -tags e2e -v ./e2e/cli_lifecycle_test.gofrom a shell.CI testing:
Demonstrating newly parameterized actions via manual trigger:




What the new actions look like, testing root and non-root workflows in parallel:
Full, go-based e2e test suite against live Runpod services to complement the existing tests:
Addendum:
I didn't get into the details of which container images I'm testing against because it would feel like trying to tell a baker which flour they ought to be using. But in the setup as presented, pod tests are being run against a basic and tiny Alpine image that simply idles in a sleep. The serverless tests are being conducted against an almost-as-simple alpine-python container:
Simple as. I have updated the CI to use a fixed commit hash so you can test it with a bit more confidence if you wish. No big deal either way. I like this option because it is only about 80MB and the smallest Runpod base that includes the API is closer to 3.5GB, AFAICT.
Cheers!