Skip to content

Conversation

@MasterPtato
Copy link
Contributor

@MasterPtato MasterPtato commented Jan 17, 2026

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 17, 2026

🚅 Deployed to the rivet-pr-3952 environment in rivet-frontend

Service Status Web Updated (UTC)
website 🕒 Building (View Logs) Web Jan 27, 2026 at 3:17 am
frontend-inspector 🕒 Building (View Logs) Web Jan 27, 2026 at 3:17 am
frontend-cloud 🕒 Building (View Logs) Web Jan 27, 2026 at 3:17 am

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 17, 2026 00:30 Destroyed
Copy link
Contributor Author

MasterPtato commented Jan 17, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@MasterPtato MasterPtato changed the title fix: day 1 patch, add dashboards feat: add actor and kv metrics Jan 17, 2026
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 67c4a86 to 8b7ad2b Compare January 20, 2026 18:48
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 18:48 Destroyed
@claude
Copy link

claude bot commented Jan 20, 2026

Pull Request Review: Add Actor and KV Metrics

Summary

This PR adds comprehensive metrics tracking for Rivet Actors and KV operations, including actor uptime, KV storage usage, network traffic, and request counts. The implementation includes database keys, metrics recording in workflows, and Grafana dashboards for visualization.

Code Quality & Best Practices

✅ Strengths

  1. Well-structured metric system: The new keys::ns::metric module provides a clean enum-based metric system with proper serialization/deserialization.

  2. Consistent error handling: Uses the repository's custom error system correctly with anyhow::Result.

  3. Proper use of atomic operations: Uses MutationType::Add for incrementing metrics, which is appropriate for concurrent updates.

  4. Comprehensive test coverage: Added test files kv_list_edge_cases.rs and kv_operations.rs.

  5. Documentation updates: Added Grafana dashboards and updated documentation in docs/engine/GASOLINE/.

⚠️ Issues & Concerns

1. Inconsistent Byte Encoding (Critical)

Location: engine/packages/pegboard/src/keys/ns/metric.rs:85-90

Issue: This file uses little-endian (LE) encoding for metrics, while engine/packages/pegboard/src/keys/actor.rs:36-41 uses big-endian (BE) encoding.

// In metric.rs - Uses LE
fn deserialize(&self, raw: &[u8]) -> Result<Self::Value> {
    Ok(i64::from_le_bytes(raw.try_into()?))
}

// In actor.rs - Uses BE
fn deserialize(&self, raw: &[u8]) -> Result<Self::Value> {
    Ok(i64::from_be_bytes(raw.try_into()?))
}

Why this matters: Mixing endianness can lead to difficult debugging, potential bugs, and confusion for future developers.

Recommendation:

  1. Document WHY LE encoding was chosen for metrics (is it for atomic operations compatibility?)
  2. Consider standardizing on one endianness across the entire codebase
  3. Add a comment explaining the trade-offs

2. Potential Integer Overflow (Medium)

Location: engine/packages/pegboard/src/actor_kv/mod.rs:124-130, 236-242, 276-283, 347-354

Issue: Multiple locations round up byte sizes without overflow protection:

let total_size_chunked = (total_size as u64).div_ceil(util::metric::KV_BILLABLE_CHUNK)
    * util::metric::KV_BILLABLE_CHUNK;
keys::ns::metric::inc(
    &tx.with_subspace(keys::subspace()),
    recipient.namespace_id,
    keys::ns::metric::Metric::KvRead(recipient.name.clone()),
    total_size_chunked.try_into().unwrap_or_default(),
);

Problem:

  • total_size_chunked could theoretically overflow when multiplied back
  • try_into().unwrap_or_default() silently converts overflow to 0, which would undercount metrics

Recommendation: Use saturating arithmetic:

let total_size_chunked = (total_size as u64)
    .div_ceil(util::metric::KV_BILLABLE_CHUNK)
    .saturating_mul(util::metric::KV_BILLABLE_CHUNK);
let metric_value = total_size_chunked.try_into().unwrap_or(i64::MAX);

3. Unbounded Metrics Growth (Critical)

Location: engine/packages/pegboard/src/keys/ns/metric.rs

Issue: Metrics are stored as database keys per actor name, with no apparent cleanup mechanism. This could lead to unbounded database growth if:

  • Actor names are dynamically generated
  • Old actor names are never cleaned up
  • Metrics accumulate indefinitely

Recommendation:

  1. Implement a metric cleanup/archival strategy
  2. Add documentation about metric lifecycle
  3. Consider TTL or periodic aggregation

4. Atomic Load Ordering (Performance)

Location: engine/packages/pegboard-gateway/src/metrics_task.rs:61-62

Issue: Uses Acquire ordering for atomic loads:

let new_ingress_bytes = ingress_bytes.load(std::sync::atomic::Ordering::Acquire);
let new_egress_bytes = egress_bytes.load(std::sync::atomic::Ordering::Acquire);

Analysis: Acquire ordering is stronger than necessary for simple counter reads. Relaxed would suffice and be more efficient.

Recommendation:

let new_ingress_bytes = ingress_bytes.load(std::sync::atomic::Ordering::Relaxed);
let new_egress_bytes = egress_bytes.load(std::sync::atomic::Ordering::Relaxed);

5. Missing Error Context (Minor)

Location: engine/packages/pegboard/src/actor_kv/entry.rs:39, 44

Issue: Errors lack context about which key failed:

ensure\!(\!self.value.is_empty(), "empty value at key");
self.metadata.context("no metadata for key")?

Recommendation: Include the key in error messages:

ensure\!(\!self.value.is_empty(), "empty value at key {:?}", self.key.0);
self.metadata.with_context(|| format\!("no metadata for key {:?}", self.key.0))?

Security Concerns

✅ No Critical Security Issues Found

  1. Input validation: KV operations properly validate keys and sizes
  2. CORS handling: Proper CORS implementation with origin echoing and credentials support
  3. No SQL injection: Uses parameterized database operations

Performance Considerations

⚠️ Potential Performance Issues

  1. Frequent Metric Updates

    • Metrics are updated every 15 seconds via METRICS_INTERVAL_MS
    • Each update performs a full KV size estimation
    • Consider batching or reducing frequency for high-traffic scenarios
  2. KV Size Estimation (engine/packages/pegboard/src/actor_kv/mod.rs:32-37)

    • Called during deallocation and periodic metrics recording
    • Could be expensive for large KV stores
    • Recommendation: Cache estimates between intervals or use incremental tracking
  3. Stream Processing (engine/packages/pegboard/src/actor_kv/mod.rs:286-327)

    • Uses .buffer_unordered(32) for parallel processing
    • Could cause memory pressure with large batches
    • Recommendation: Consider reducing concurrency for memory-constrained environments

Test Coverage

✅ Good Test Infrastructure

  • Added test files for KV operations
  • Tests cover edge cases for listing operations

⚠️ Missing Test Coverage

  1. No tests for metric recording accuracy
  2. No tests for overflow scenarios in metric calculations
  3. No tests for concurrent metric updates
  4. No tests for metric cleanup/archival

Recommendation: Add unit tests for edge cases and concurrent operations.

Documentation

✅ Strengths

  1. Updated Grafana dashboards for all Docker environments
  2. Added documentation in docs/engine/GASOLINE/

⚠️ Areas for Improvement

  1. No inline documentation explaining metric types and their meaning
  2. Missing documentation on metric retention/cleanup policies
  3. No examples of querying metrics

Recommendation: Add module-level documentation explaining metric types, lifecycle, and usage patterns.

Recommendations Summary

Must Fix (Before Merge)

  1. ⚠️ Document or fix LE/BE encoding inconsistency - This could cause subtle bugs
  2. ⚠️ Add overflow protection for metric calculations - Prevents silent data loss
  3. ⚠️ Add metric lifecycle documentation and cleanup strategy - Prevents unbounded growth

Should Fix (Soon After Merge)

  1. 🔧 Improve error messages with key context
  2. ⚡ Optimize atomic operations (Acquire → Relaxed)
  3. 🧪 Add comprehensive metric tests

Nice to Have

  1. 📚 Add module-level documentation with examples
  2. ⚡ Consider caching KV size estimates

Overall Assessment

Rating: 🟡 Approve with Minor Changes

This PR adds valuable observability to Rivet Actors with a well-structured metrics system. The implementation follows most repository conventions and includes proper Grafana dashboards. However, there are a few concerns that should be addressed:

  1. Critical: The LE/BE encoding inconsistency should be documented or resolved
  2. Important: Integer overflow handling needs improvement
  3. Important: Missing strategy for metric cleanup/growth management

The code quality is generally good, and the issues identified are addressable with minor changes. The PR significantly improves observability without introducing major security vulnerabilities.


Reviewed with attention to Rivet conventions from CLAUDE.md

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 20, 2026

More templates

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3952

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3952

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3952

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3952

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3952

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3952

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3952

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3952

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3952

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3952

commit: 46b43db

@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 8b7ad2b to b47e44c Compare January 20, 2026 20:53
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 20:53 Destroyed
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from b47e44c to fc05f9d Compare January 21, 2026 21:33
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 21, 2026 21:33 Destroyed
@MasterPtato MasterPtato marked this pull request as ready for review January 21, 2026 21:33
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from fc05f9d to 8246e45 Compare January 23, 2026 01:09
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 23, 2026 01:09 Destroyed
@MasterPtato MasterPtato changed the base branch from main to graphite-base/3952 January 26, 2026 21:23
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 8246e45 to d1cdcc5 Compare January 26, 2026 21:23
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 26, 2026 21:23 Destroyed
@MasterPtato MasterPtato changed the base branch from graphite-base/3952 to 01-26-fix_remove_attributes_from_last_metric_push_metric January 26, 2026 21:23
@graphite-app
Copy link
Contributor

graphite-app bot commented Jan 27, 2026

Merge activity

@graphite-app graphite-app bot force-pushed the 01-26-fix_remove_attributes_from_last_metric_push_metric branch 2 times, most recently from 0bd45c1 to e0d8017 Compare January 27, 2026 03:02
@graphite-app graphite-app bot changed the base branch from 01-26-fix_remove_attributes_from_last_metric_push_metric to graphite-base/3952 January 27, 2026 03:14
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from d1cdcc5 to 46b43db Compare January 27, 2026 03:16
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 27, 2026 03:16 Destroyed
@MasterPtato MasterPtato changed the base branch from graphite-base/3952 to main January 27, 2026 03:16
graphite-app bot pushed a commit that referenced this pull request Jan 27, 2026
@graphite-app graphite-app bot closed this Jan 27, 2026
@graphite-app graphite-app bot deleted the 01-15-feat_add_actor_and_kv_metrics2 branch January 27, 2026 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants