Skip to content

feat(v2:telemetry): more metrics & grafana dashboards#1442

Open
InKryption wants to merge 14 commits into
mainfrom
ink/more-metrics-and-grafana
Open

feat(v2:telemetry): more metrics & grafana dashboards#1442
InKryption wants to merge 14 commits into
mainfrom
ink/more-metrics-and-grafana

Conversation

@InKryption

Copy link
Copy Markdown
Contributor

No description provided.

@github-project-automation github-project-automation Bot moved this to 🏗 In progress in Sig May 11, 2026
@InKryption InKryption changed the base branch from main to ink/telemetry-log-filters May 11, 2026 04:33
@InKryption InKryption force-pushed the ink/telemetry-log-filters branch from 93d1755 to ceafbbe Compare May 11, 2026 04:36
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from 3b8edf7 to a88dab6 Compare May 11, 2026 04:36
@InKryption InKryption changed the title feat(v2,telemetry): more metrics & grafana dashboards feat(v2:telemetry): more metrics & grafana dashboards May 11, 2026
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from a88dab6 to 3942c1b Compare May 11, 2026 04:49
@codecov

codecov Bot commented May 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 201 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
v2/lib/gossip/node.zig 0.00% 88 Missing ⚠️
v2/lib/time.zig 0.00% 28 Missing ⚠️
v2/lib/snapshot/download.zig 0.00% 27 Missing ⚠️
v2/lib/telemetry/metric.zig 0.00% 23 Missing ⚠️
v2/init/main.zig 0.00% 16 Missing ⚠️
v2/lib/telemetry.zig 0.00% 11 Missing ⚠️
v2/services/net.zig 0.00% 6 Missing ⚠️
v2/services/gossip.zig 0.00% 2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines Coverage Δ
v2/lib/gossip.zig 0.00% <ø> (ø)
v2/services/gossip.zig 0.00% <0.00%> (ø)
v2/services/net.zig 0.00% <0.00%> (ø)
v2/lib/telemetry.zig 57.83% <0.00%> (-2.75%) ⬇️
v2/init/main.zig 0.00% <0.00%> (ø)
v2/lib/telemetry/metric.zig 0.00% <0.00%> (ø)
v2/lib/snapshot/download.zig 10.46% <0.00%> (ø)
v2/lib/time.zig 0.00% <0.00%> (ø)
v2/lib/gossip/node.zig 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from 3942c1b to 532c76e Compare May 11, 2026 05:38
@InKryption InKryption force-pushed the ink/telemetry-log-filters branch from ceafbbe to 48c4947 Compare May 13, 2026 17:02
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch 2 times, most recently from dbe1c5a to a6c99dc Compare May 14, 2026 11:43
@InKryption InKryption force-pushed the ink/telemetry-log-filters branch from 48c4947 to 32475ad Compare May 14, 2026 11:43
@InKryption InKryption requested review from dnut and kprotty May 19, 2026 00:24
@prestonsn prestonsn self-requested a review May 19, 2026 13:04
Base automatically changed from ink/telemetry-log-filters to main May 19, 2026 16:38
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from a6c99dc to dce1e5d Compare May 20, 2026 16:00
@InKryption InKryption changed the base branch from main to ink/fix-zig-build-for-macos May 20, 2026 16:04
@InKryption InKryption changed the base branch from ink/fix-zig-build-for-macos to main May 20, 2026 16:05
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch 5 times, most recently from d0d11ab to 8407233 Compare May 26, 2026 14:55
@dnut dnut linked an issue May 29, 2026 that may be closed by this pull request
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch 4 times, most recently from 96c8137 to 24b61de Compare June 3, 2026 17:36
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch 3 times, most recently from 563c1e6 to 08f7192 Compare June 9, 2026 20:42
Comment thread v2/lib/telemetry.zig Outdated
Comment on lines +280 to +284
/// Equivalent to `Logger`, but with a runtime `scope`.
/// This is intended as the parameter of "leaf" or "branch" functions that want to inherit the
/// caller's scope; it is for this reason that it can only convert to and from an actual scoped
/// logger, and not another `AnyLogger`.
pub const AnyLogger = struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this abstraction needed yet? I don't see any usages of AnyLogger outside this file.

My preference would be for a log message's scope to reflect the location where it is emitted. That matches how loggers commonly behave across languages, and it gives readers a stable, predictable signal about where the log originated.

I agree there can be value in adding tracing context, especially for general-purpose functions called from many places. I just would not overload scope to serve both purposes, since that makes the field ambiguous for people reading the logs. I would rather keep scope as the source location and add a separate contextual field for tracing or caller-specific information. That preserves a conventional and precise meaning for scope while still supporting the additional context this is aiming for.

Given that, I think it would be simpler to remove this abstraction for now and keep scope static. Supporting both static and dynamic scope adds another layer of complexity that callers and readers need to reason about. We can add dynamic context fields once there is a clear need for them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did have a change that used this, but it was swept away after a rebase; the main reason I introduced it was because there was a general-purpose function in gossip which was called from many contexts, and in fact its behaviour depended explicitly on who the caller was (it had an enum that directly said who the caller was).

But yeah, given it isn't needed, I have no issue removing it, long live YAGNI.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the commit

@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch 2 times, most recently from 301146b to 004ff44 Compare June 11, 2026 16:02
@prestonsn

prestonsn commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Not related to the changes in this PR directly, but while testing the new grafana setup locally I found that prometheus was rejecting/not decoding the histogram metrics (Explore tab was not showing any data for histogram metrics)

I think this has to do with a formatting issue here:

try w.print("le={f}", .{numberFmt(bucket.upper_bound)});

Changing that line to include the outer double-quotes in the payload seems to fix this:

try w.print("le=\"{f}\"", .{numberFmt(bucket.upper_bound)});

versions i'm using in case it has something to do with it:

Grafana: 13.0.1 
Prometheus: 3.12.0

Update: it seems like my prometheus setup can't scrape any metrics unless I make the above code change

Another update: I did sidestep the docker setup and just ran prometheus and grafana services directly for testing things on my end. Maybe this isn't an issue in proper setups, not something that is blocking for me 🫡

Another update after the other update: it seems like prometheus scrape fails without the double quote change. From the spec it seems like le=x is actually treated like a label, so it has to be in double quotes (le="x") in the payload

@prestonsn prestonsn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just a few minor things I noticed

Comment thread v2/lib/time.zig
Comment on lines +13 to +23
/// Queries the system for the current moment of time as an Instant.
/// This is not guaranteed to be monotonic or steadily increasing, but for
/// most implementations it is.
/// Returns `error.Unsupported` when a suitable clock is not detected.
pub fn now() Instant {
const clock_id: std.posix.CLOCK = .BOOTTIME;
const ts = std.posix.clock_gettime(clock_id) catch |err| {
std.debug.panic("clock_gettime unsupported: {}", .{err});
};
return .{ .timestamp = ts };
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the Instant and Timer abstractions a lot. I've got a couple questions tho:

  1. Is there a reason why we don't use .MONOTONIC here? last i checked, the std lib uses it too (but perhaps that's part of the reason why its getting overhauled?)
  2. Could we integrate/use lib.clock here in some way? It has the warmup path already setup. Wondering if this separation was intended

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On point 1, not sure iiuc. Are you saying stdlib uses MONOTONIC? Because that may be true for something else, but the Instant abstraction uses BOOTTIME on linux. If you're suggesting replacing BOOTTIME to stray from what stdlib was using, then sure I suppose, I think it probably doesn't matter a huge amount in our case.
On point 2, yeah, I implemented this before lib.clock got implemented. I'll give this another pass and probably move this into there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 1, I see your point actually. I was mostly curious as why we chose BOOTTIME here. I checked the zig std lib and was totally wrong on stating it uses MONOTONIC, I thought it did for some reason. I'm cool with sticking with BOOTTIME for this

Comment thread v2/lib/gossip/node.zig
Comment thread v2/lib/snapshot/download.zig
Comment thread v2/lib/snapshot/download.zig
Comment thread v2/metrics/grafana/dashboards/gossip.json
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from 004ff44 to 9f965e6 Compare June 12, 2026 04:22
@InKryption InKryption force-pushed the ink/more-metrics-and-grafana branch from 9f965e6 to 4cf6876 Compare June 12, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

feat(v2): dashboards from telemetry data

3 participants