feat: rewrite operator in Rust, add TCP probes, status conditions (v0.7.0) #317
Merged
Replace the Python/Kopf operator with a Rust implementation using kube-rs, following modern patterns from the pgroles operator.

Key changes:
- kube-rs 3.0 Controller watching NetworkAssertions with owned Jobs
- Server-side apply for ConfigMaps and CronJobs; create/delete for Jobs
- Structured JSON logging via tracing, optional OTLP metrics export
- Health endpoints at /livez and /readyz (with /healthz compat)
- Distroless container image (chainguard/static) replacing python:3.12
- Helm chart updated: env-based config, readiness probe, tighter RBAC
- 30 unit tests covering CRD deserialization, rule transforms, result summarization, template overrides, and observability
- Update check_versions.py to read from operator/Cargo.toml
- Replace Poetry setup with pip install pytest in CI k8s job
- Add operator_tests CI job (cargo fmt, clippy, test)
- Bump docker/build-push-action to v4 for operator image
- Align operator version to 0.6.0 to match CLI
- Update AGENTS.md, operator/README.md, and releasing docs to reference Cargo.toml and Rust tooling
- Add patch verb to CronJobs RBAC (required for server-side apply)
- Delete stale CronJob when schedule is removed from a NetworkAssertion
- Preserve ConfigMap/Secret volume source options (items, optional, defaultMode) when constructing context volumes
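The stale-CronJob rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: `CronJobAction` and `plan_cron_job` are hypothetical names, not taken from the operator source, and the real reconciler talks to the Kubernetes API rather than receiving a boolean.

```rust
// Hypothetical description of what the reconciler should do with the CronJob.
#[derive(Debug, PartialEq)]
enum CronJobAction {
    // A schedule is set: apply (create or update) the CronJob.
    Apply { schedule: String },
    // No schedule, but a CronJob from an earlier spec still exists: delete it.
    DeleteStale,
    // No schedule and no CronJob: nothing to do.
    Nothing,
}

// Decide the action from the desired schedule and whether a CronJob exists.
fn plan_cron_job(desired_schedule: Option<&str>, cron_job_exists: bool) -> CronJobAction {
    match (desired_schedule, cron_job_exists) {
        (Some(schedule), _) => CronJobAction::Apply {
            schedule: schedule.to_string(),
        },
        (None, true) => CronJobAction::DeleteStale,
        (None, false) => CronJobAction::Nothing,
    }
}
```

Keeping the decision separate from the API calls makes the "schedule removed" case easy to cover with a unit test.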
- Replace .expect() with proper error returns in build_job/build_cron_job
- Add tests for build_job, build_cron_job, build_job_spec
- Add tests for context volume mounting with optional fields
- Add tests for disable_redaction flag, node selector overrides
- Add tests for OperatorConfig::from_env with overrides and fallbacks
- Add edge case tests: empty/missing assertions, multiple assertions
- Total: 41 unit tests (up from 30)
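For readers unfamiliar with the `.expect()` change, here is a minimal sketch of the pattern, not the operator's actual code. `BuildError`, `job_name_for`, and the `-job` suffix are hypothetical stand-ins for whatever build_job does internally.

```rust
use std::fmt;

// Hypothetical error type; the operator's real error enum is not shown here.
#[derive(Debug, PartialEq)]
enum BuildError {
    MissingName,
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BuildError::MissingName => write!(f, "resource has no metadata.name"),
        }
    }
}

impl std::error::Error for BuildError {}

// Hypothetical stand-in for a builder step: derive a Job name from an optional
// resource name, returning an error instead of panicking when it is absent.
fn job_name_for(assertion_name: Option<&str>) -> Result<String, BuildError> {
    // Before: assertion_name.expect("name must be set") would panic the controller.
    let name = assertion_name.ok_or(BuildError::MissingName)?;
    Ok(format!("{name}-job"))
}
```

Returning `Result` lets the reconcile loop report the failure (for example through a status condition) rather than crashing the process.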
BuildKit cache mounts for cargo registry were shared across amd64 and arm64 builds, causing "File exists (os error 17)" on .cargo-ok marker files. Use TARGETPLATFORM in cache IDs for isolation.
The PolicyReport v1alpha2 CRD schema doesn't include scope.apiGroup, causing server-side apply to fail with 500. Removed the scope object from the report data and extended the fallback to catch both 422 and 500 error codes.
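The extended fallback condition can be pictured as a status-code check. This is a sketch only: `should_fall_back` is a hypothetical helper, and the fallback path it gates is whatever the operator already used for 422 responses.

```rust
// Hypothetical helper: decide whether a failed server-side apply of the
// PolicyReport should take the fallback path, based on the HTTP status code.
fn should_fall_back(status_code: u16) -> bool {
    // 422: the API server rejected the object as unprocessable.
    // 500: observed here when the applied object contained a field
    //      (scope.apiGroup) that the CRD schema does not define.
    matches!(status_code, 422 | 500)
}
```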
Integration tests check 'fail' not in summary dict (key membership), so summary must only include non-zero counts. This matches the behavior of the original Python operator.
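A minimal sketch of the non-zero rule, assuming the summary is built from per-outcome counts. `non_zero_summary` is a hypothetical name and the operator's real summary type may differ.

```rust
use std::collections::BTreeMap;

// Build a summary map that omits outcomes with a zero count, so that a key
// such as "fail" is absent (rather than present with value 0) when nothing failed.
fn non_zero_summary(counts: &[(&'static str, u32)]) -> BTreeMap<&'static str, u32> {
    counts
        .iter()
        .copied()
        .filter(|&(_, count)| count > 0)
        .collect()
}
```

With this shape, a key-membership check such as the integration tests' `'fail' not in summary` holds whenever the fail count is zero.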
The file imported from the deleted Python operator package (netchecks_operator.config). Config is now tested via Rust unit tests in context.rs.
Write reconciliation status back to the NetworkAssertion resource as standard Kubernetes conditions (Reconciled=True/False/Unknown), making errors and probe results visible via kubectl describe/get.
- Add StatusCondition type and expand NetworkAssertionStatus with conditions and summary fields
- Wrap reconcile with status update logic (best-effort on both success and failure paths)
- Add .owns(cronjobs) to controller for event-driven CronJob processing; relax scheduled requeue from 60s to 300s safety net
- Add subresources.status and printer columns to CRD YAML
- Add 8 new unit tests (49 total)
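The condition handling described above might look roughly like the following. This is a sketch under assumptions: the field names follow the usual Kubernetes condition convention (type, status, reason, message), the reason strings are invented for illustration, and the real StatusCondition type in the operator may carry more fields (such as a transition timestamp).

```rust
// Assumed shape of a condition; mirrors the common Kubernetes convention.
#[derive(Debug, Clone, PartialEq)]
struct StatusCondition {
    type_: String,
    status: String, // "True", "False", or "Unknown"
    reason: String,
    message: String,
}

// Map a reconcile outcome to a Reconciled condition.
// None means reconciliation has not completed yet.
fn reconciled_condition(outcome: Option<Result<(), String>>) -> StatusCondition {
    // The reason strings below are hypothetical.
    let (status, reason, message) = match outcome {
        None => ("Unknown", "Pending", String::new()),
        Some(Ok(())) => ("True", "ReconcileSucceeded", String::new()),
        Some(Err(err)) => ("False", "ReconcileFailed", err),
    };
    StatusCondition {
        type_: "Reconciled".to_string(),
        status: status.to_string(),
        reason: reason.to_string(),
        message,
    }
}

// Replace the condition with the same type if present, otherwise append it,
// so the list holds at most one entry per condition type.
fn upsert_condition(conditions: &mut Vec<StatusCondition>, new: StatusCondition) {
    let existing = conditions.iter().position(|c| c.type_ == new.type_);
    match existing {
        Some(index) => conditions[index] = new,
        None => conditions.push(new),
    }
}
```

Upserting by type keeps `kubectl describe` output stable: the Reconciled entry flips between True, False, and Unknown instead of accumulating duplicates.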
Bump version across pyproject.toml, operator/Cargo.toml, and Chart.yaml. Upgrade Python dev dependencies: ruff 0.3.3->0.15.5, pytest-cov 4.1->6.3, coveralls 3.3.1->4.1.0, pydantic 2.12.1->2.12.5 (fixes yanked version). Add CHANGELOG.md covering all changes since 0.6.0.
…runners

Replace QEMU-emulated arm64 builds with native ARM64 runners (ubuntu-24.04-arm, free for public repos). Each platform builds natively in parallel, then manifests are merged.

Add cargo-chef to the operator Dockerfile to cache dependency compilation as a separate Docker layer. Source-only changes skip the ~90s dependency compilation entirely.

Add GHA layer cache (cache-from/cache-to type=gha) to both probe and operator Docker builds for cross-run cache persistence in CI.

Other CI improvements:
- Bump docker actions to latest (setup-qemu v3, setup-buildx v3, build-push-action v6)
- Replace manual actions/cache with Swatinem/rust-cache@v2 for operator unit tests
- Fix integration test job to depend on operator_docker_merge
- Replace black badge with ruff badge
- Mention TCP checks and Rust operator in description
- Update development/releasing section with all version files
- Fix "analyse" -> "analyze" (typos CI check)
Summary
- tcp check for raw socket connectivity testing (netcheck tcp --host <host> --port <port>)
- kubectl get nas (Ready/Reason columns) and kubectl describe
- .owns(cronjobs)

Breaking Changes
- kubectl apply -f crds/networkassertions.yaml) for status subresource

Bug Fixes

Test plan
- cargo test)
- uv run pytest tests)
- cargo fmt --check and cargo clippy clean
- ruff check clean
- check_versions.py)
- kubectl get nas shows Ready/Reason columns
- kubectl describe nas <name> shows conditions
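For context on the tcp check mentioned in the summary, a raw TCP connectivity test boils down to attempting a connection with a timeout. The sketch below uses only the Rust standard library and is not the netcheck implementation; `tcp_reachable` is a hypothetical name.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

// Return true if a TCP connection to host:port succeeds within the timeout
// on at least one of the addresses the host resolves to.
fn tcp_reachable(host: &str, port: u16, timeout: Duration) -> bool {
    // Resolution failure counts as unreachable.
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false,
    };
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            return true;
        }
    }
    false
}
```

A call such as `tcp_reachable("example.com", 443, Duration::from_secs(5))` only tells you that the TCP handshake completed; it says nothing about TLS or the application protocol behind the port.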