Skip to content

Merge master into nix-next#1717

Open
github-actions[bot] wants to merge 142 commits into
nix-nextfrom
master
Open

Merge master into nix-next#1717
github-actions[bot] wants to merge 142 commits into
nix-nextfrom
master

Conversation

@github-actions

@github-actions github-actions Bot commented May 6, 2026

Copy link
Copy Markdown

Automated PR to keep nix-next in sync with master.

Ericson2314 and others added 30 commits May 6, 2026 15:48
Update `harmonia`; deduplicate various cargo output hashes
Prepare for reuse by pulling process-management and polling logic out
of `QueueRunnerBuildOne` into standalone modules:

- `ProcessGroup.pm`: reusable labelled process group with `spawn`,
  `pump_logs`, and ordered teardown.
- `wait_for_builds` in `QueueRunnerContext`: polling loop that checks
  the queue runner REST API until all build IDs are inactive.
- `start_queue_runner` and `start_nix_daemon` in `QueueRunnerContext`:
  extracted from `QueueRunnerBuildOne` so future test helpers can start
  queue infrastructure without duplicating the setup.

`QueueRunnerBuildOne` is rewritten to use these shared helpers. The
queue-runner harness is still registered manually because
`start_queue_runner` returns pre-bound sockets and a separate daemon
harness that don't fit `ProcessGroup::spawn`'s simple model.
hydra-tests: Extract `ProcessGroup` and shared test helpers
Apply [semantic line breaks](https://sembr.org/) (one sentence per line)
across all markdown files in the manual and repository. This makes diffs
easier to review since changing one sentence no longer reflows an entire
paragraph.

Also add `CONTRIBUTING.md` documenting the convention, and clean up
various pandoc artifacts along the way: escaped quotes (`\'`), escaped
underscores (`\_`), broken multi-line list items, and non-breaking
spaces.
Reformat markdown to use semantic line breaks
Forgot to do this when bumping harmonia. The commented stuff will return
when we merge `nix-next`.
`buildproducts.path` can contain a sub-path below a store output
(e.g. `/nix/store/…-nix-manual-2.31.4/share/doc/nix/manual`), but
`OwnedBuildProduct::parse_paths` was calling `StoreDir::parse` which
only accepts bare store paths. Any path with a `/` after the store
path name failed with `invalid store path / symbol`, breaking the
cached-build fast path for those builds.

Move `RelativeStorePath` from the queue-runner into a new
`store-path-utils` crate so the DB crate can use it directly. Change
`OwnedBuildProduct`'s default type parameter from `StorePath` to
`RelativeStorePath` and have `parse_path()` call
`RelativeStorePath::from_path`, which correctly splits the full path
into a `StorePath` base and a relative suffix.
When `create_step` encounters an existing step (`is_new=false`), it now
re-checks whether the step's outputs have appeared in the local nix
store since the step was first created. Without this check, builds
whose outputs became available between poll cycles (e.g. via
concurrent builds, substitution, or external upload) would get stuck
in an infinite re-load loop: the DB says `finished=0`, the step already
exists in memory so `create_build` inserts the build but never routes
it to `handle_cached_build`.

The fix adds two checks before returning `Valid(step)`:
1. If the step is already marked finished, return `None` immediately.
2. Query missing outputs from the local store; if all are present,
   mark the step finished, insert into `finished_drvs`, and return
   `None` so `create_build` routes to `handle_cached_build`.
Nix writes `Compression: bzip2` in narinfo files (see
`src/libstore/nar-info.cc`), but hydra's parser only accepted `bz2`
(the file extension, not the compression name). This caused errors
like `invalid value for Compression: bzip2` when reading old narinfo
entries from `cache.nixos.org`, making the queue runner treat those
store paths as missing and rebuild them.

Fix `FromStr` to accept `bzip2`, and fix `as_str` to write `bzip2`
(matching nix). The file extension method `ext()` correctly uses
`.nar.bz2` and is unchanged.

Examples of affected narinfo files:

- https://cache.nixos.org/2crhfrb8pqy61d32jaajpw33jjmny6xz.narinfo
- https://cache.nixos.org/9fggf2bx5p29da89jrbmdh2fcd68m1mx.narinfo
- https://cache.nixos.org/0xs59grdbgz7z2ci2ff4yfwb6jha1qaa.narinfo
The architecture doc was sitting outside `src/`, presumably because it
was mostly notes and not yet cleaned up enough to be part of the
user-facing docs. We will do that clean up next, but before we do that,
move the file and expose it, so the subsequent diff is readable and not
confused by the move.
- Organize components by deployment topology: master machine vs builder
  machines vs destination store, reflecting which components share a Nix
  store and which don't
- Add `hydra-builder`, `hydra-notify`, `hydra-eval-jobset` — components
  that weren't documented before
- Update `hydra-queue-runner` language from C++ to Rust
- Add source layout and build system overview
- Add database schema section with pointers to the SQL and migration docs
- Update FAQ: both Nix and Hydra now have experimental dynamic derivations
- Fix typos (`hydra-build-producs`, `obsolute`)
Add updated architecture section to the manual
Pick up latest changes from nix-community/harmonia.
Both `hydra-builder` and `hydra-queue-runner` had near-identical
`build.rs` files that compiled the same `.proto` file and generated
the same version constant. This consolidates that into a single
`hydra-proto` crate with feature flags:

- `client` — generates tonic client stubs (used by `hydra-builder`)

- `server` — generates tonic server traits (used by `hydra-queue-runner`)

- `db` — `From` impls bridging proto enums to `db::models` types

Proto ↔ native type conversions are also consolidated here:

- `ProtoStorePath` moved from `shared::proto`, now depends on
  `harmonia-store-core` directly rather than `nix-utils`

- `Pressure`/`PressureState` — both components now use the generated
  proto types directly, eliminating duplicate structs and field-by-field
  conversion boilerplate

- `PresignedUploadOpts`, `ConfigUpdate` — queue-runner now uses the
  proto types directly instead of maintaining trivial local wrappers

- `BuildResultState` — restructured as a nested enum wrapping
  `hydra_proto::BuildResultState` with extra `Aborted`/`Cancelled`
  variants for queue-runner-internal states

- `RelativeStorePath` — new proto message type making the store path
  vs sub-path split explicit on the wire; `From`/`TryFrom` conversions
  in `hydra-proto` bridge to `store_path_utils::RelativeStorePath`

- `BuildProduct.path` changed from opaque `string` to structured
  `RelativeStorePath` message

Also:

- Groups all local workspace crates under `[workspace.dependencies]`
  with `workspace = true` references instead of relative paths

- Updates the architecture doc with descriptions of each shared crate
Extract hydra-proto crate for shared `gRPC` code generation
Introduces `store.proto` (`nix.store.v1` package) containing types that
are generic to any Nix store, not specific to Hydra:

- `StorePath`, `StorePaths`, `RelativeStorePath`

- `ValidPathInfo` — store path with NAR hash/size, references, deriver, CA

- `NarInfo` — wraps `ValidPathInfo` with cache-specific fields
  (url, compression, file hash/size)

`streaming.proto` (`runner.v1`) imports these and uses them for
`BuildMessage.drv`, `BuildProduct.path`, `PresignedUploadComplete`, etc.

`PresignedUploadComplete` is restructured to contain a `NarInfo` message
instead of flattening all fields, making the layering explicit.
Split `.proto` into generic Nix store types and Hydra-specific types
The `binary-cache` crate had its own `NarInfo` struct that duplicated
what harmonia now provides as `harmonia_store_nar_info::NarInfo`. This
replaces it with the upstream type, which also brings in
`harmonia-store-path-info` for `ValidPathInfo`/`UnkeyedValidPathInfo`.

The local struct was flat; the harmonia type is nested:
`NarInfo { path, info: UnkeyedNarInfo { info: UnkeyedValidPathInfo, url, compression, ... } }`.
All construction and field access sites are updated accordingly.

`From<harmonia_store_nar_info::NarInfo>` and `TryFrom<proto::NarInfo>`
conversions are added in the proto crate so both components can convert
between native and wire representations without manual field-by-field
marshalling. Round-trip unit tests verify the conversion is lossless.

Also adds `store_dir` field to proto `ValidPathInfo` to match the
harmonia type. The queue-runner now asserts the store dir matches
rather than silently overriding it.
Replace local `NarInfo` with `harmonia-store-nar-info`
Consumers now import `StorePath`, `StoreDir`, `OutputName`, etc.
directly from `harmonia-store-core` rather than through `nix-utils`
re-exports. This makes the actual dependency on harmonia explicit
at each use site.

`nix-utils` retains its own local functionality (`parse_store_path`,
`BaseStore`, `LocalStore`, `realise_drv`, etc.) but no longer acts as
a forwarding layer for harmonia types.
Remove harmonia re-exports from `nix-utils`
Replace stringly-typed fields in `store.proto` with proper message types:

- `Hash` — algorithm enum + raw digest bytes
- `Signature` — key name + signature string
- `ContentAddress` — method enum + hash

`UnkeyedValidPathInfo.nar_hash` is now `Hash` (not string),
`signatures` is `repeated Signature` (not `repeated string`),
`ca` is `optional ContentAddress` (not `optional string`).
`UnkeyedNarInfo.file_hash`/`file_size` renamed to
`download_hash`/`download_size` with proper `Hash` type.

All `From`/`TryFrom` impls use `&` for harmonia→proto direction
(no need to consume the source) and by-value for proto→harmonia
(proto types are consumed). Round-trip tests cover every layer.
Use structured proto types for `Hash`, `Signature`, and `ContentAddress`
Move each test from the monolithic `nixos-tests.nix` into its own file
under `nixos-tests/`, with shared VM configuration extracted to
`common.nix`. No functional changes to existing tests.
Split NixOS tests into separate files
Ericson2314 and others added 30 commits June 5, 2026 20:38
Remove `nix-utils` C++ FFI, use harmonia daemon protocol
The nix daemons (one for the queue-runner, one for the builder) were
started with bare `IPC::Run` harnesses whose stdout/stderr were never
flushed to test output. When a daemon misbehaved (e.g. connection
reset), there was no log to diagnose it.

To fix this have `start_nix_daemon` register the daemon in the caller's
`ProcessGroup`. The `ProcessGroup` machinery we made is exactly for this
purpose, and we should have had this function use it.

In particular, now that it is part of the process group, `pump_logs`
will pick it up. Also, there is no need for the separate
`$daemon_harness` return value from `start_queue_runner` — callers no
longer need to manually `kill_kill` daemon harnesses since `$pg->stop`
handles everything — so that return value is removed too.

Finally, factor out the newly improved `start_nix_daemon` into its own
module, since it should not be more strongly associated with either the
queue-runner or the daemon.

The queue-runner daemon is labelled `queue-runner daemon` and the
builder daemon `builder daemon`, making it easy to tell them apart in
test output.

Hopefully this helps us debug the spurious issues we're seeing.
hydra-tests: pump nix daemon logs via `ProcessGroup`
build(deps): bump treefmt-nix from `790751f` to `db94781`
The builder was creating a new `ConnectionPool` in every
`process_build` call. Under concurrent load each pool opened its own
set of daemon connections, so the total could reach
`num_builds × (CPU_cores + 1)` — all hitting the same nix daemon and
its SQLite database.

CI logs (now visible thanks to the daemon log pumping) showed
`SQLite database is busy` errors from the builder daemon, which
surfaced as `Connection reset by peer` on the builder side.

Fix by creating the pool once in `State::new` and storing it as
`self.pool`.
Ship build logs to queue-runner via gRPC

Capture log messages from the daemon's `build_paths_with_results`
`ResultLog` by polling it as a Stream alongside the Future, then
zstd-compress the collected output and ship via the existing
`BuildLog` gRPC method so the queue-runner can write it to
`/nix/var/log/nix/drvs/`.

Co-authored-by: amaanq <amaanq@users.noreply.github.com>
hydra-builder: share a single `ConnectionPool` across builds
Use daemon protocol rather than shelling out to build derivation
The old C++ queue-runner in `build-remote.cc` resolved input derivations
before sending a build to the remote: `sendInputs()` iterated
`inputDrvs`, looked up each output path via `DerivationOutput::path()`,
and stuffed them into `inputSrcs` on a `BasicDerivation`. It then copied
just `inputSrcs` to the remote and called `buildDerivation` with the
in-memory `BasicDerivation` — the `.drv` file was never sent.

The Rust rewrite lost this: the builder would call
`fetch_drv_requisites` to discover and import the full drv closure from
the queue runner's store, requiring the `.drv` file and all its
references to be accessible. While this has some advantages — notably,
trusted-user mode is needed for the force-resolved derivations (in the
input-addressed case) but not the original ones — it also has major
disadvantages:

- Sends unnecessary data to builders.

- Allows builders to (erroneously) rebuild dependencies that should
  already be built, rather than giving them no choice but to fail if
  they cannot download the pre-built outputs.

We were hitting the second problem in practice.

Restore the original design. Resolve derivation inputs on the
queue-runner side using `try_resolve_force` (which resolves `Built`
input references to concrete store paths for all derivation types),
encode the result as protobuf, and send it to the builder as
`resolved_drv` (a `nix.store.derivation.v1.Basic`) in the
`BuildMessage`.

The builder decodes the `BasicDerivation` and calls `build_derivation`
via the harmonia daemon protocol, passing it in-memory — no `.drv` file
needs to exist in the builder's store. `FetchRequisites` now takes
`StorePaths` (the resolved `inputSrcs`) rather than the drv path, so
only the actual input closure is fetched.

This also improves on the old C++ code for CA floating derivations:
`DerivationOutput::path()` returns `nullopt` for those, so the old code
couldn't resolve their inputs. `try_resolve_force` properly resolves
them via recorded realisations in the DB.

For CA floating derivations that resolve to a different drv path, the
existing two-phase build dance is preserved.

Derivations with `Deferred` outputs (input-addressed depending on a CA
derivation) also need their output paths filled in after
`try_resolve_force`, then routed through the two-phase dance like
CAFloating to detect if resolution changed the drv path. Without this,
`$out` is empty and builds fail.

Also introduces `store/derivation.proto` with protobuf definitions for
`BasicDerivation`, `DerivationOutput`, and
`ContentAddressMethodAlgorithm`, along with bidirectional conversion
impls between proto and harmonia types.

Co-authored-by: Amaan Qureshi <git@amaanq.com>
This adds color-eyre and tracing-error as workspace dependencies. hydra-tracing
installs `tracing_error::ErrorLayer` in the subscriber, and calls `color_eyre::install()`
from `init()` so both daemons get the report renderer and panic hook.
main now returns color_eyre::Result, so a fatal BuilderError (or a typed
error like TracingInitError) is rendered by the color-eyre handler the
scaffold installs, with source chain and location. State::new and the
rest of the state layer are still on anyhow, so into_report bridges those
residual sites by carrying the chain as the report message until they are
typed.
Send force-resolved `BasicDerivation` to builder via protobuf
hydra-builder: typed errors + eyre/tracing-error reporting
queue-runner: replace anyhow with eyre and typed errors
`!$project eq ""` is actually interpreted as `(!$project) eq ""`,
certainly not what was intended here.
It is unclear how bad the problem is (e.g. are we actually building
derivations twice, or just failing to mark one things cached?). So let's
just skip this part of the cache for now.
Skip the part of the early-cutoff test that fails intermittently
The `StorePath` field of a narinfo holds the full path
(`/nix/store/<hash>-<name>`); cache.nixos.org serves it that way and
`harmonia_store_nar_info`, used for our own uploads, writes it that way.
Our hand-rolled `parse_narinfo` fed the value into `StorePath::from_str`,
which only accepts the base name, so every spec-conforming narinfo failed
to parse.

The queue runner therefore could not read back narinfos from its own
binary cache: `has_narinfo` failed for every path, uploads were repeated
endlessly and `copy_paths` errors failed the affected builds. Observed on
a 120-build nixos-small evaluation against an S3 cache populated by the
runner itself: 2067 `invalid value for StorePath` errors and all 120
builds failing with status 2.

harmonia has since grown its own `parse_narinfo_txt` (mirroring nix's
`nar-info.cc`), so delegate to it instead of maintaining our own. It takes
the store directory explicitly, so carry it on `S3CacheConfig`, read from
the `store` query parameter of the cache URL (the nix S3 convention,
defaulting to `/nix/store`); both the upload and download paths now use
`self.cfg.store_dir` rather than threading it in. Besides reading
`StorePath` as a full path, this also fixes two ways our parser diverged
from nix: `Deriver: unknown-deriver` and an optional `Compression` field.
binary-cache: read narinfos with harmonia's parser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants