Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 20, 2026

🚅 Deployed to the rivet-pr-3982 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud 😴 Sleeping (View Logs) Web Jan 26, 2026 at 9:32 pm
frontend-inspector 😴 Sleeping (View Logs) Web Jan 26, 2026 at 9:32 pm
website ❌ Build Failed (View Logs) Web Jan 26, 2026 at 9:24 pm

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3982 January 20, 2026 18:47 Destroyed
Copy link
Contributor Author

MasterPtato commented Jan 20, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 20, 2026

PR Review: fix(pb): always check actor queue on init

Overview

This PR addresses a bug where the actor queue wasn't being checked on runner initialization, potentially leaving actors in a pending state when runner capacity became available. The changes improve the Pegboard runner workflow and include related Grafana dashboard updates.

Code Quality & Best Practices ✅

Strengths:

  1. Good refactoring - Extracted check_queue() helper function eliminates code duplication across three call sites (runner2.rs:190-209)
  2. Consistent function signature - Uses &str for name parameter, matching Rust conventions
  3. Clear fix - The bug fix is straightforward: adding check_queue() call after initialization (runner2.rs:84)
  4. Improved comment accuracy - Fixed misleading comment about timestamp comparison (gasoline/src/db/kv/mod.rs:1063)

Observations:

  1. The comment at gasoline/src/db/kv/mod.rs:1187 states "More up to date now than prev var" - while functionally correct, this is documenting a delta rather than explaining why it's necessary. Per CLAUDE.md guidelines, avoid documenting deltas. Consider either removing this comment or explaining the business reason for getting a fresh timestamp here.

Potential Issues 🔍

Minor concerns:

  1. TODO in get_error.rs (line 72): The added TODO suggests querying runner pool workflows to check if dead. This is good to note, but:

    • Is this related to the current fix?
    • Should this be tracked in a separate issue?
    • Consider adding an issue number reference to the TODO
  2. Error handling completeness: The check_queue() function propagates errors with ? operator. Verify that failures here won't leave the runner in an inconsistent state during initialization. The workflow system should handle this, but worth confirming.

Performance Considerations ⚡

  1. Additional queue check on init: Adding check_queue() at line 84 means every runner initialization now performs an actor allocation check. This is the intended fix, but consider:

    • Under high runner churn, this could increase database load
    • The transaction in allocate_pending_actors has a 2.5s timeout (EARLY_TXN_TIMEOUT)
    • Metrics should be monitored post-deployment
  2. Grafana query improvements: The updated PromQL queries now filter by rivet_gasoline_worker_last_metrics_publish to exclude stale metrics from dead workers. This is excellent for accuracy but adds query complexity. Performance should be acceptable given Prometheus's optimization.

Security Concerns 🔒

No security issues identified. The changes maintain existing authentication and authorization patterns.

Test Coverage 📋

Recommendation: Consider adding integration tests for:

  1. Runner initialization with pending actors in queue
  2. Verification that actors are allocated when new runner capacity appears
  3. Edge case: runner initializes, crashes before queue check, then reinitializes

Testing can be done manually if integration test infrastructure isn't in place, but this scenario (actors stuck in pending state) suggests gaps in test coverage.

Grafana Dashboard Changes

The dashboard changes look solid:

  • Added sishort units for better metric formatting
  • Fixed queries to only count active workers
  • Removed unnecessary series overrides
  • Changed default time range from 15m to 5m (better for debugging)

These are all improvements that enhance observability.

Recommendations

  1. Clarify or remove the delta-documenting comment at gasoline/src/db/kv/mod.rs:1187
  2. Consider linking the TODO to an issue tracker reference
  3. Monitor metrics after deployment for any performance impact from the additional queue check
  4. Add integration tests if infrastructure permits

Verdict

LGTM ✅ - This is a solid bug fix with good code hygiene. The refactoring improves maintainability, and the fix addresses the root cause effectively. The minor suggestions above are for polish, not blockers.


Review generated using codebase context from CLAUDE.md guidelines

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 20, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3982

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3982

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3982

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3982

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3982

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3982

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3982

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3982

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3982

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3982

commit: d055795

@graphite-app
Copy link
Contributor

graphite-app bot commented Jan 27, 2026

Merge activity

  • Jan 27, 2:49 AM UTC: MasterPtato added this pull request to the Graphite merge queue.
  • Jan 27, 2:50 AM UTC: The Graphite merge queue couldn't merge this PR because it had merge conflicts.
  • Jan 27, 2:58 AM UTC: MasterPtato added this pull request to the Graphite merge queue.
  • Jan 27, 2:59 AM UTC: CI is running for this pull request on a draft pull request (#4045) due to your merge queue CI optimization settings.
  • Jan 27, 3:00 AM UTC: Merged by the Graphite merge queue via draft PR: #4045.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants