feat: rewrite operator in Rust, add TCP probes, status conditions (v0.7.0) #317

Merged
hardbyte merged 12 commits into main from feat/rust-operator
Mar 8, 2026

@hardbyte hardbyte commented Mar 8, 2026

Summary

  • Operator rewritten from Python/Kopf to Rust/kube-rs — ~10x lower memory, faster startup, compile-time type safety, distroless container image
  • TCP probe type — new tcp check for raw socket connectivity testing (netcheck tcp --host <host> --port <port>)
  • Status conditions on NetworkAssertion — reconciliation status visible via kubectl get nas (Ready/Reason columns) and kubectl describe
  • Event-driven CronJob processing — controller watches CronJob changes via .owns(cronjobs)
  • Structured JSON logging with optional OTLP metrics export
  • Python dependency upgrades — fixes yanked pydantic, updates ruff 0.3→0.15, pytest-cov, coveralls
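To illustrate what a raw socket connectivity check like the new tcp probe involves, here is a minimal Python sketch using only the standard library. The function name `tcp_check`, its `timeout` parameter, and the keys of the returned dict are assumptions made for illustration; they are not taken from the netcheck code in this PR.

```python
import socket
import time


def tcp_check(host: str, port: int, timeout: float = 5.0) -> dict:
    """Attempt a TCP connection and report whether it succeeded.

    Hypothetical helper: the name and result keys are illustrative only.
    """
    started = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            connected, error = True, None
    except OSError as exc:
        # OSError covers refused connections, timeouts and DNS failures.
        connected, error = False, str(exc)
    return {
        "host": host,
        "port": port,
        "connected": connected,
        "error": error,
        "elapsed_seconds": time.monotonic() - started,
    }
```

A caller could then assert on `connected` being true or false, depending on whether the connection is expected to be allowed or blocked.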

Breaking Changes

  • CRD must be re-applied on existing clusters (kubectl apply -f crds/networkassertions.yaml) for status subresource
  • Operator image is now Rust-based (image name unchanged)
  • Removed Kopf peering CRDs

Bug Fixes

  • PolicyReport server-side apply compatibility with v1alpha2 schema
  • PolicyReport summary omits zero-valued counts
  • Multi-platform Docker build cache isolation (arm64/amd64)

Test plan

  • 49 Rust operator unit tests pass (cargo test)
  • 51 Python CLI tests pass (uv run pytest tests)
  • cargo fmt --check and cargo clippy clean
  • ruff check clean
  • Version check passes (check_versions.py)
  • CI pipeline (unit tests, Docker builds, K8s integration tests)
  • Manual verification: kubectl get nas shows Ready/Reason columns
  • Manual verification: kubectl describe nas <name> shows conditions

Replace the Python/Kopf operator with a Rust implementation using kube-rs,
following modern patterns from the pgroles operator.

Key changes:
- kube-rs 3.0 Controller watching NetworkAssertions with owned Jobs
- Server-side apply for ConfigMaps and CronJobs; create/delete for Jobs
- Structured JSON logging via tracing, optional OTLP metrics export
- Health endpoints at /livez and /readyz (with /healthz compat)
- Distroless container image (chainguard/static) replacing python:3.12
- Helm chart updated: env-based config, readiness probe, tighter RBAC
- 30 unit tests covering CRD deserialization, rule transforms,
  result summarization, template overrides, and observability

cloudflare-workers-and-pages bot commented Mar 8, 2026

Deploying netchecks-docs with Cloudflare Pages

Latest commit: 772bc15
Status: ✅  Deploy successful!
Preview URL: https://759beffa.netchecks-docs.pages.dev
Branch Preview URL: https://feat-rust-operator.netchecks-docs.pages.dev

coveralls commented Mar 8, 2026

Pull Request Test Coverage Report for Build 22828485828

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 91.26%

Totals
Change from base Build 22812094063: 0.0%
Covered Lines: 355
Relevant Lines: 389


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6b2e4920d

hardbyte added 9 commits March 8, 2026 15:57
- Update check_versions.py to read from operator/Cargo.toml
- Replace Poetry setup with pip install pytest in CI k8s job
- Add operator_tests CI job (cargo fmt, clippy, test)
- Bump docker/build-push-action to v4 for operator image
- Align operator version to 0.6.0 to match CLI
- Update AGENTS.md, operator/README.md, and releasing docs
  to reference Cargo.toml and Rust tooling

- Add patch verb to CronJobs RBAC (required for server-side apply)
- Delete stale CronJob when schedule is removed from a NetworkAssertion
- Preserve ConfigMap/Secret volume source options (items, optional,
  defaultMode) when constructing context volumes
- Replace .expect() with proper error returns in build_job/build_cron_job

- Add tests for build_job, build_cron_job, build_job_spec
- Add tests for context volume mounting with optional fields
- Add tests for disable_redaction flag, node selector overrides
- Add tests for OperatorConfig::from_env with overrides and fallbacks
- Add edge case tests: empty/missing assertions, multiple assertions
- Total: 41 unit tests (up from 30)

BuildKit cache mounts for cargo registry were shared across amd64 and
arm64 builds, causing "File exists (os error 17)" on .cargo-ok marker
files. Use TARGETPLATFORM in cache IDs for isolation.

The PolicyReport v1alpha2 CRD schema doesn't include scope.apiGroup,
causing server-side apply to fail with 500. Removed the scope object
from the report data and extended the fallback to catch both 422 and
500 error codes.
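
The fallback described above can be sketched roughly as follows. This is an illustration in Python rather than the operator's Rust code, and every name in it (`ApiError`, `apply_with_fallback`, the two callables) is hypothetical. The commit message does not say what the fallback does, so this sketch assumes it is some alternative write path such as a plain create or replace.

```python
class ApiError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message)
        self.status = status


# Status codes that trigger the fallback, per the commit message above.
FALLBACK_STATUSES = {422, 500}


def apply_with_fallback(server_side_apply, fallback_write, report: dict):
    """Try server-side apply first; use the fallback only on 422 or 500."""
    try:
        return server_side_apply(report)
    except ApiError as exc:
        if exc.status not in FALLBACK_STATUSES:
            # Anything else (for example 403) is a real error and propagates.
            raise
        return fallback_write(report)
```

Keeping the set of fallback statuses explicit means unrelated failures such as authorization errors still surface instead of being silently retried.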
Integration tests check 'fail' not in summary dict (key membership),
so summary must only include non-zero counts. This matches the
behavior of the original Python operator.
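
A minimal Python sketch of why zero-valued counts have to be omitted: a key-membership check only passes if the key is absent, not merely zero. The function name `summarize` and the input shape (a flat list of outcome strings) are assumptions for illustration; the outcome names follow the PolicyReport summary fields.

```python
OUTCOMES = ("pass", "fail", "warn", "error", "skip")


def summarize(results: list) -> dict:
    """Count results per outcome and drop outcomes that never occurred."""
    counts = {outcome: 0 for outcome in OUTCOMES}
    for result in results:
        counts[result] += 1  # raises KeyError on an unknown outcome
    # Keep only non-zero counts so that `"fail" not in summary` holds
    # whenever nothing failed.
    return {outcome: n for outcome, n in counts.items() if n > 0}
```

With this shape, `summarize(["pass", "pass"])` yields a dict that contains `pass` but has no `fail` key at all, which is what a membership check relies on.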
The file imported from the deleted Python operator package
(netchecks_operator.config). Config is now tested via Rust unit
tests in context.rs.

Write reconciliation status back to the NetworkAssertion resource as
standard Kubernetes conditions (Reconciled=True/False/Unknown), making
errors and probe results visible via kubectl describe/get.

- Add StatusCondition type and expand NetworkAssertionStatus with
  conditions and summary fields
- Wrap reconcile with status update logic (best-effort on both
  success and failure paths)
- Add .owns(cronjobs) to controller for event-driven CronJob
  processing; relax scheduled requeue from 60s to 300s safety net
- Add subresources.status and printer columns to CRD YAML
- Add 8 new unit tests (49 total)
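
For readers unfamiliar with Kubernetes status conditions, the following Python sketch shows the general shape of a condition entry and how an existing entry is usually updated. It is not the operator's Rust StatusCondition type: the helper names and the reason strings used in the usage note are hypothetical, and the condition type `Reconciled` is taken from the commit message above.

```python
from datetime import datetime, timezone
from typing import Optional


def build_condition(ok: Optional[bool], reason: str, message: str = "") -> dict:
    """Build a condition dict; ok=None maps to the "Unknown" status."""
    status = "Unknown" if ok is None else ("True" if ok else "False")
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "Reconciled",
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": now,
    }


def upsert_condition(conditions: list, new: dict) -> list:
    """Replace the condition of the same type, or append it if absent.

    By convention lastTransitionTime only changes when the status changes,
    so it is carried over when the status is the same as before.
    """
    updated, replaced = [], False
    for existing in conditions:
        if existing["type"] == new["type"]:
            merged = dict(new)
            if existing["status"] == new["status"]:
                merged["lastTransitionTime"] = existing["lastTransitionTime"]
            updated.append(merged)
            replaced = True
        else:
            updated.append(existing)
    if not replaced:
        updated.append(dict(new))
    return updated
```

A reconcile wrapper might call `upsert_condition(status_conditions, build_condition(True, "ReconcileSucceeded"))` on success and pass `False` with the error text as the message on failure.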
Bump version across pyproject.toml, operator/Cargo.toml, and Chart.yaml.
Upgrade Python dev dependencies: ruff 0.3.3->0.15.5, pytest-cov 4.1->6.3,
coveralls 3.3.1->4.1.0, pydantic 2.12.1->2.12.5 (fixes yanked version).
Add CHANGELOG.md covering all changes since 0.6.0.
@hardbyte hardbyte changed the title feat: rewrite operator in Rust with kube-rs feat: rewrite operator in Rust, add TCP probes, status conditions (v0.7.0) Mar 8, 2026
hardbyte added 2 commits March 9, 2026 08:44
…runners

Replace QEMU-emulated arm64 builds with native ARM64 runners
(ubuntu-24.04-arm, free for public repos). Each platform builds
natively in parallel, then manifests are merged.

Add cargo-chef to the operator Dockerfile to cache dependency
compilation as a separate Docker layer. Source-only changes skip
the ~90s dependency compilation entirely.

Add GHA layer cache (cache-from/cache-to type=gha) to both probe
and operator Docker builds for cross-run cache persistence in CI.

Other CI improvements:
- Bump docker actions to latest (setup-qemu v3, setup-buildx v3,
  build-push-action v6)
- Replace manual actions/cache with Swatinem/rust-cache@v2 for
  operator unit tests
- Fix integration test job to depend on operator_docker_merge

- Replace black badge with ruff badge
- Mention TCP checks and Rust operator in description
- Update development/releasing section with all version files
- Fix "analyse" -> "analyze" (typos CI check)
@hardbyte hardbyte merged commit 910aec0 into main Mar 8, 2026
22 checks passed
@hardbyte hardbyte deleted the feat/rust-operator branch March 8, 2026 20:08