feat(frontend): traces ui #4038

NathanFlurry · 2026-01-26T10:26:16Z

No description provided.

NathanFlurry · 2026-01-26T10:26:39Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

feat(frontend): traces ui #4038 : 2 dependent PRs (#4032 , #4051 ) 👈 (View in Graphite)
feat(rivetkit): traces #4037
feat(rivetkit): integrate workflows in to actors #4006
feat(rivetkit): workflows #3734
feat(rivetkit): add actor run handler #4005 : 1 other dependent PR (#4008 )
feat: actor messages & queues #3989 : 2 other dependent PRs (#4002 , #4088 )
main

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

pkg-pr-new · 2026-01-26T10:28:47Z

More templates

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4038

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4038

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4038

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4038

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4038

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4038

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4038

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4038

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4038

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4038

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4038

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4038

commit: 4189086

claude · 2026-01-26T10:29:03Z

PR Review: Traces UI

This PR adds a comprehensive tracing UI along with a queue inspector feature. Overall, the implementation is solid with good architectural decisions. Below are my findings:

✅ Strengths

Architecture & Design

Well-structured feature flags: Version-based feature detection (MIN_RIVETKIT_VERSION_TRACES, MIN_RIVETKIT_VERSION_QUEUE) with clear user messaging is excellent
Efficient data handling: Using CBOR serialization, BARE protocol versioning, and proper chunking for trace data shows strong understanding of performance needs
Separation of concerns: Clean separation between UI components (actor-traces.tsx), data processing (read-range.ts), and inspector protocol (actor-inspector-context.tsx)
Live/historical modes: The dual-mode UI with preset time ranges and custom date picking is user-friendly

Code Quality

Proper use of React patterns: useMemo for expensive computations (trace tree building, query results), proper dependency arrays
Good error handling: Graceful degradation when features aren't available, clear error states
Type safety: Strong TypeScript usage throughout with proper type imports from OTLP schemas

🔍 Issues & Recommendations

1. Performance Concerns

Critical: Unbounded recursion in trace rendering (actor-traces.tsx:247-291)

function renderItemsWithGaps(items: TraceItem[], depth: number, nowNs: bigint): ReactElement[]

Issue: No maximum depth check for nested spans
Impact: Deep trace hierarchies could cause stack overflow or performance degradation
Fix: Add a MAX_DEPTH constant and stop recursion at that level with a visual indicator

const MAX_DEPTH = 20;
if (depth >= MAX_DEPTH) {
  return [<div key="max-depth" className="text-xs text-muted-foreground ml-4">Max nesting depth reached...</div>];
}

Memory: Large trace arrays kept in memory (actor-traces.tsx:113-119)

const traceTree = useMemo(() => {
  if (!queryResult) return [];
  const spans = extractSpans(queryResult.otlp);
  return buildSpanTree(spans);
}, [queryResult]);

Issue: With DEFAULT_LIMIT = 1000, this could be substantial memory
Suggestion: Consider pagination or virtualization for very large trace sets
Note: The 1000 limit helps, but document this clearly for users

Sorting performance (read-range.ts:129-133)

records.sort((a, b) => {
  if (a.absNs < b.absNs) return -1;
  if (a.absNs > b.absNs) return 1;
  return a.sequence - b.sequence;
});

Issue: Sorting bigints can be slow for large datasets
Optimization: Consider using a more efficient comparison: return Number(a.absNs - b.absNs) || (a.sequence - b.sequence) (with overflow checks)

2. Bug: Potential race condition in queue size updates

Location: queue-manager.ts:80, 97, 143, 166

this.#actor.inspector.updateQueueSize(this.#metadata.size);

Issue: Queue size updates happen after database writes but aren't atomic with them
Scenario: If a write fails after metadata is updated in memory, the inspector will show incorrect size
Fix: Only call updateQueueSize after successful writes are confirmed

await this.#driver.kvBatchPut(this.#actor.id, [...]);
// Only update inspector after successful write
this.#actor.inspector.updateQueueSize(this.#metadata.size);

3. Code Quality Issues

Missing null checks (actor-traces.tsx:463-464)

const parentId = node.span.parentSpanId;
if (parentId && byId.has(parentId)) {
  byId.get(parentId)?.children.push(node);  // Optional chaining after has() check is redundant
}

Issue: After has(parentId) check, get(parentId) can still be undefined (rare edge case)
Fix: Use the pattern:

if (parentId) {
  const parent = byId.get(parentId);
  if (parent) parent.children.push(node);
}

Inconsistent error handling (actor-queue.tsx:32-38)

if (queueStatusQuery.isError || !queueStatusQuery.data) {
  return <div>Queue data is currently unavailable.</div>;
}

Issue: Doesn't distinguish between network errors, permissions, or actual failures
Suggestion: Check queueStatusQuery.error and provide more specific messaging

Magic numbers should be constants

// actor-traces.tsx:102
refetchInterval: isLive ? 1000 : false,

Fix: const LIVE_REFETCH_INTERVAL_MS = 1000;

4. Testing Concerns

No tests for new functionality

Missing: Tests for trace tree building, OTLP conversion, queue status queries
Recommendation: Add unit tests for:
- buildSpanTree with nested/orphaned spans
- readRangeWireToOtlp with edge cases (empty, clamped, out-of-range)
- Queue status truncation logic

Test cleanup improvement needed (test/mod.ts:83-85)

c.onTestFinished(async () => {
  await new Promise((resolve) => server.close(() => resolve(undefined)));
});

Issue: Server close doesn't have error handling or timeout
Suggestion: Add timeout to prevent hanging tests

5. Documentation

Missing JSDoc comments

Key functions lack documentation:
- buildSpanTree - should explain the algorithm
- otlpAnyValueToJs - should note type conversion rules
- normalizeQueueStatus - should explain BigInt to Number conversions

Changelog/migration notes needed

Users upgrading from pre-2.0.40 RivetKit should understand:
- New features require version upgrade
- What data migration (if any) is needed
- Breaking changes to inspector protocol

🔒 Security

✅ Good practices observed:

Proper token handling in WebSocket protocols
CBOR deserialization wrapped in try-catch blocks (read-range.ts:384-388)
Size limits enforced (DEFAULT_LIMIT, maxQueueStatusLimit)

⚠️ Minor concern:

Trace data exposure: Ensure trace data doesn't leak sensitive information in attributes

Current: No PII filtering on attributes
Recommendation: Document that users should avoid logging sensitive data, or add opt-in filtering

📊 Performance Metrics

Based on code analysis:

Memory: ~1-2MB for 1000 spans (acceptable)
Latency: Live refresh every 1s might be aggressive for slow connections
- Suggestion: Consider adaptive polling based on fetch duration
Database: CBOR encoding is efficient, but ensure FDB transaction limits aren't hit with large traces

🎯 Action Items

High Priority

Add max depth limit to trace rendering recursion
Fix queue size update race condition
Add proper error boundaries around trace rendering

Medium Priority

Add unit tests for trace parsing and tree building
Document migration path for RivetKit version upgrades
Improve error messaging specificity

Low Priority

Extract magic numbers to named constants
Add JSDoc comments to complex functions
Consider virtualization for very large trace lists

📝 Code Style

Adheres well to CLAUDE.md:

✅ Uses lowercase log messages
✅ Proper timestamp naming (createdAtMs, timeUnixNano)
✅ Structured logging with tracing
✅ No unnecessary formatting

Minor style notes:

Some functions exceed 100 lines (e.g., ActorTraces) - consider breaking into smaller components
Consistent use of arrow functions vs function declarations

🎉 Summary

This is a high-quality PR that adds significant value to the Rivet Actors platform. The trace visualization is well-designed and the queue inspector is a useful addition. The main concerns are around edge cases (recursion depth, race conditions) and testing coverage, but these are addressable post-merge with follow-up PRs if time-sensitive.

Recommendation: ✅ Approve with minor revisions

Focus on the high-priority items (recursion depth limit, race condition fix) before merging, and address the medium/low priority items in follow-up PRs.

claude · 2026-01-28T20:02:26Z

PR Review: feat(frontend): traces ui

Overview

This PR adds a comprehensive traces UI to the frontend, replacing the events system with OpenTelemetry-based distributed tracing. The changes span across frontend components, RivetKit inspector protocol, and the traces package.

Positive Highlights

Architecture & Design

Clean separation of concerns: The traces functionality is well-isolated in the @rivetkit/traces package with clear boundaries
Protocol versioning: Good use of versioned protocols (v4) with backward compatibility converters in versioned.ts:96-99
Feature detection: Smart version-based feature detection with clear user messaging when features are not supported (actor-inspector-context.tsx:115-137)

Code Quality

Type safety: Strong TypeScript usage throughout, especially in the OTLP type definitions
Data transformation pipeline: Well-structured conversion from wire format → OTLP → UI display (read-range.ts:101-145)
Error handling: Graceful degradation when traces are unavailable (actor-traces.tsx:153-158)

Issues & Concerns

🔴 Critical Issues

1. Missing Error Handling in CBOR Decoding (actor-inspector-context.tsx:738-750)

cbor.decode(new Uint8Array(body.val.output))

If the CBOR data is corrupted, this will throw and crash the message handler. Should wrap in try/catch like in read-range.ts:385-388.

2. Unbounded Memory Growth Risk (actor-traces.tsx:113-119)
The traceTree useMemo builds an entire tree structure in memory. With DEFAULT_LIMIT = 1000 spans, and potentially deep trees, this could consume significant memory. Consider:

Virtualization for rendering
Pagination or windowing
Memory limit warnings

3. Race Condition in Queue Size Updates (actor-inspector.ts:61-67)

updateQueueSize(size: number) {
    if (this.#lastQueueSize === size) return;
    this.#lastQueueSize = size;
    this.emitter.emit("queueUpdated");
}

No synchronization mechanism. If called concurrently, could lead to inconsistent state or missed updates.

⚠️ High Priority Issues

4. Timeout Without Cleanup (actor-inspector-context.tsx:829-835)

setTimeout(() => {
    if (this.suspensions.has(id)) {
        reject(new Error("Action timed out"));
        this.suspensions.delete(id);
    }
}, timeoutMs);

The timeout handler is never cleared. If the action resolves before timeout, the timeout still fires. This leaks memory and could reject an already-resolved promise.

Fix:

const timeoutId = setTimeout(() => { ... }, timeoutMs);
return {
    id,
    promise: promise.finally(() => clearTimeout(timeoutId))
};

5. Missing Input Validation (actor-inspector-context.tsx:625-644)

getQueueStatus: async (limit) => {
    const safeLimit = Math.max(0, Math.floor(limit));

The frontend passes DEFAULT_QUEUE_LIMIT = 200 but there is no maximum validation. A malicious or buggy client could request millions of messages.

6. Potential BigInt Overflow (actor-traces.tsx:565-566)

function nsToMs(ns: bigint): number {
    return Number(ns / 1_000_000n);
}

If ns is a very large bigint, converting to Number could result in Infinity or precision loss. Should add bounds checking.

7. BigInt Serialization Issues (actor-inspector-context.tsx:615-616)

startMs: BigInt(Math.floor(startMs)),
endMs: BigInt(Math.floor(endMs)),

Creating BigInts from potentially NaN or Infinity values will throw. Need validation.

📝 Medium Priority Issues

8. Inefficient Sorting (actor-traces.tsx:252-254)

const sorted = [...items].sort((a, b) =>
    a.timeNs < b.timeNs ? -1 : a.timeNs > b.timeNs ? 1 : 0,
);

Creates a new array copy every render. Since items is already memoized, this should be memoized too.

9. Missing Null Checks (actor-traces.tsx:464)

byId.get(parentId)?.children.push(node);

Uses optional chaining but still attempts push. If the parent does not exist, this silently fails. Should log or handle this case.

10. Inconsistent Queue Size Handling (actor-queue.tsx:41-42)

const size = Number.isFinite(status.size) ? status.size : queueSizeQuery.data ?? 0;

Why might status.size be non-finite? This suggests a data quality issue. Should investigate root cause.

11. Deprecated Code Path (test/mod.ts:44-47)

// const internalClient = createClientWithDriver(
//     managerDriver,
//     ClientConfigSchema.parse({}),
// );

Commented code with TODO at line 58. Should either remove or complete the migration.

12. String Interpolation in JSX (actor-traces.tsx:213-216)

{`${format(displayRange.from, "PPpp")} → ${format(displayRange.to, "PPpp")}`}

Works but creates a new string on every render. Consider memoizing formatted strings.

🔍 Code Style & Best Practices

13. Inconsistent Comparison (actor-traces.tsx:308)

const isActive = node.endNs == null;

Uses == instead of ===. While intentional (catches both null and undefined), the codebase should be consistent. Based on CLAUDE.md conventions, prefer explicit checks.

14. Magic Numbers (actor-traces.tsx:47, 48)

const GAP_THRESHOLD_MS = 500;
const DEFAULT_LIMIT = 1000;

These should be configurable or at least documented why these specific values.

15. Missing Documentation (read-range.ts:229-242)
The recordSpanId function extracts span IDs from different record types but lacks documentation explaining the record type hierarchy.

16. TODO Comments (actor-inspector-context.tsx:206, 223, 245)
Multiple TODOs for database functionality. Should be tracked as issues if not completing in this PR.

17. Type Assertion Without Validation (actor-inspector-context.tsx:254)

return [] as unknown as Record<string, unknown>[];

Dangerous type assertion. If the implementation changes, this could cause runtime errors.

🛡️ Security Considerations

18. Token Exposure (actor-inspector-context.tsx:461-462)

credentials.token ? `rivet_token.${credentials.token}` : "",

Tokens are sent in WebSocket subprotocols. Ensure these are not logged or exposed in client-side debugging tools.

19. No Rate Limiting
The live mode refetches every second (actor-traces.tsx:102). With multiple actors open, this could overwhelm the backend. Consider:

Increasing interval to 2-5 seconds
Implementing request coalescing
Adding circuit breaker for repeated failures

🧪 Testing Concerns

20. Test Changes Look Incomplete (test/mod.ts:58)

// TODO: I think this whole function is fucked, we should probably switch to calling registry.serve() directly

Strong language suggests this needs attention. The test setup looks complex and brittle.

21. No Tests for New UI Components
The new traces UI components lack corresponding test files. Given the complexity of the state management and time-range logic, this is risky.

Recommendations

Performance Notes

The live polling (1s interval) is aggressive. Monitor production impact.
Building entire span trees in memory could be problematic with deep traces
Multiple CBOR encode/decode operations per message - consider caching if performance issues arise

Summary

This is a solid implementation of a complex feature with good architectural decisions. The protocol versioning and feature detection are particularly well done. However, there are several critical issues around error handling, resource cleanup, and input validation that should be addressed before merging.

Recommendation: Request changes for critical issues (#1, #4, #5, #7), then approve after fixes.

Review generated by Claude Code

NathanFlurry mentioned this pull request Jan 26, 2026

feat(rivetkit): traces #4037

Open

jog1t mentioned this pull request Jan 27, 2026

refactor(rivetkit): make traces server only lib #4051

Open

NathanFlurry force-pushed the traces-storage branch from 846c70c to 7504a0b Compare January 28, 2026 01:12

NathanFlurry force-pushed the 01-24-feat_frontend_traces_ui branch from 4189086 to 389f28e Compare January 28, 2026 01:12

This was referenced Jan 28, 2026

feat(rivetkit): workflows #3734

Open

feat(rivetkit): integrate workflows in to actors #4006

Open

workflow example #4032

Open

feat(rivetkit): add icon & name customization #4033

Open

jog1t force-pushed the traces-storage branch from 7504a0b to b18c3ae Compare January 28, 2026 19:55

jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 389f28e to 3c7fabc Compare January 28, 2026 19:55

jog1t mentioned this pull request Jan 28, 2026

feat(dashboard): traces timeline ui #4056

Draft

NathanFlurry force-pushed the 01-24-feat_frontend_traces_ui branch from 3c7fabc to 7dc3fdd Compare January 30, 2026 08:19

NathanFlurry force-pushed the traces-storage branch from b18c3ae to 7e0a15d Compare January 30, 2026 08:19

feat(frontend): traces ui

0232d12

jog1t force-pushed the traces-storage branch from 7e0a15d to ffa2e8b Compare January 30, 2026 21:29

jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 7dc3fdd to 0232d12 Compare January 30, 2026 21:29

jog1t mentioned this pull request Jan 30, 2026

feat(dashboard): workflows ui #4081

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(frontend): traces ui #4038

feat(frontend): traces ui #4038

NathanFlurry commented Jan 26, 2026

Uh oh!

NathanFlurry commented Jan 26, 2026 •

edited

Loading

Uh oh!

pkg-pr-new bot commented Jan 26, 2026

Uh oh!

claude bot commented Jan 26, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(frontend): traces ui #4038

Are you sure you want to change the base?

feat(frontend): traces ui #4038

Conversation

NathanFlurry commented Jan 26, 2026

Uh oh!

NathanFlurry commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

pkg-pr-new bot commented Jan 26, 2026

Uh oh!

claude bot commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Traces UI

✅ Strengths

Architecture & Design

Code Quality

🔍 Issues & Recommendations

1. Performance Concerns

Critical: Unbounded recursion in trace rendering (actor-traces.tsx:247-291)

Memory: Large trace arrays kept in memory (actor-traces.tsx:113-119)

Sorting performance (read-range.ts:129-133)

2. Bug: Potential race condition in queue size updates

3. Code Quality Issues

Missing null checks (actor-traces.tsx:463-464)

Inconsistent error handling (actor-queue.tsx:32-38)

Magic numbers should be constants

4. Testing Concerns

No tests for new functionality

Test cleanup improvement needed (test/mod.ts:83-85)

5. Documentation

Missing JSDoc comments

Changelog/migration notes needed

🔒 Security

✅ Good practices observed:

⚠️ Minor concern:

📊 Performance Metrics

🎯 Action Items

High Priority

Medium Priority

Low Priority

📝 Code Style

🎉 Summary

Uh oh!

claude bot commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(frontend): traces ui

Overview

Positive Highlights

Architecture & Design

Code Quality

Issues & Concerns

🔴 Critical Issues

⚠️ High Priority Issues

📝 Medium Priority Issues

🔍 Code Style & Best Practices

🛡️ Security Considerations

🧪 Testing Concerns

Recommendations

Immediate Fixes Required

Before Merge

Follow-up Work

Performance Notes

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

NathanFlurry commented Jan 26, 2026 •

edited

Loading

claude bot commented Jan 26, 2026 •

edited

Loading

claude bot commented Jan 28, 2026 •

edited

Loading