Skip to content

[queue runner] Fix db races in substituting steps.#1790

Open
Mic92 wants to merge 1 commit into
NixOS:masterfrom
Mic92:queue-runner-substitution-stepnr
Open

[queue runner] Fix db races in substituting steps.#1790
Mic92 wants to merge 1 commit into
NixOS:masterfrom
Mic92:queue-runner-substitution-stepnr

Conversation

@Mic92

@Mic92 Mic92 commented Jun 11, 2026

Copy link
Copy Markdown
Member

I am currently testing nixos minimal in a private staging deployment.
I found this race in journal of my builder:

  WARN queue_monitor_loop:...:create_step:substitute_output:
    create_substitution_step:insert_build_step: sqlx::query:
    slow statement: execution time exceeded alert threshold
    summary="WITH max AS (SELECT ..." elapsed=14.453825934
    drv_path=4i1pqi0za...-vm-test-run-installer-simpleUefiSystemdBoot.drv

  WARN queue_monitor_loop:...:create_step:query_known_drv_outputs:
    sqlx::pool::acquire: acquired connection, but time to acquire
    exceeded slow threshold aquired_after_secs=3.064450723
    slow_acquire_threshold_secs=2.0

Still testing the fix.
Check the commits for more details.

@Mic92 Mic92 changed the title Fix queue runner db races when substituting store paths. Fix queue runner db races in substituting steps. Jun 11, 2026
@Mic92 Mic92 force-pushed the queue-runner-substitution-stepnr branch from 1034662 to 254d37d Compare June 11, 2026 11:25
@Mic92 Mic92 changed the title Fix queue runner db races in substituting steps. [queue runner] Fix db races in substituting steps. Jun 11, 2026
@Mic92 Mic92 force-pushed the queue-runner-substitution-stepnr branch from 254d37d to 855e192 Compare June 11, 2026 12:11
@Mic92

Mic92 commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Looks good now:

run slow stmts stepnr conflicts WARNs
original 260 (inserts up to 15.5s) invisible hundreds
advisory lock 0 0 0

@Mic92 Mic92 marked this pull request as ready for review June 11, 2026 12:29
@Mic92 Mic92 force-pushed the queue-runner-substitution-stepnr branch 2 times, most recently from 5626e85 to caac547 Compare June 11, 2026 13:49
stepnr is computed as MAX(stepnr) + 1 inside the insert. Transactions
inserting steps for the same build at the same time pick the same
number: the loser blocks on the winner's uncommitted unique-index
entry, hits ON CONFLICT DO NOTHING once the winner commits and retries.

The queue monitor substitutes outputs of a build concurrently
(buffer_unordered(10), also across derivations of the same build), so
this happened constantly. Each blocked insert held a connection from
the pool. Observed while processing a 120-build nixos-small evaluation
against an empty binary cache: 260 slow statement warnings, single
inserts taking up to 15s and unrelated queries waiting 3s for a pool
connection.

Take pg_advisory_xact_lock(class, build_id) before the insert. Writers
for the same build now queue up for the duration of a short
transaction instead of fighting over the same stepnr.
@Mic92 Mic92 force-pushed the queue-runner-substitution-stepnr branch 2 times, most recently from 30d85e5 to 51a926a Compare June 11, 2026 14:05
@Ericson2314 Ericson2314 added this pull request to the merge queue Jun 11, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 11, 2026
conn
}

fn substitution_step(build_id: i32, drv_path: &StorePath) -> InsertBuildStep<'_> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only need this function for test we can mark restrict it to test

assert_eq!(result, Some(sp("final-result")));
}

async fn replica_conn(pool: &sqlx::PgPool) -> Connection {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only need this function for test we can mark restrict it to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants