Skip to content

feat: Readiness endpoint#2188

Open
sergerad wants to merge 17 commits into
nextfrom
sergerad-readiness-endpoint
Open

feat: Readiness endpoint#2188
sergerad wants to merge 17 commits into
nextfrom
sergerad-readiness-endpoint

Conversation

@sergerad
Copy link
Copy Markdown
Collaborator

@sergerad sergerad commented Jun 3, 2026

Relates to #2176.

Adds support for grpc.health.v1.Health/Check in RPC server.

Results locally:

miden-node/proto/proto on  sergerad-readiness-endpoint [$] 
❯ grpcurl -plaintext -d '{"service": "rpc.Api"}' "localhost:57291" grpc.health.v1.Health/Check

{
  "status": "SERVING"
}

miden-node/proto/proto on  sergerad-readiness-endpoint [$] 
❮ grpcurl -plaintext -d '{"service": "rpc.Api"}' "localhost:57292" grpc.health.v1.Health/Check

{
  "status": "SERVING"
}

miden-node/proto/proto on  sergerad-readiness-endpoint [$] 
❮ grpcurl -plaintext -d '{"service": "rpc.Api"}' "localhost:57293" grpc.health.v1.Health/Check

{
  "status": "SERVING"
}

@sergerad sergerad changed the title Readiness endpoint feat: Readiness endpoint Jun 3, 2026
Comment thread proto/proto/rpc.proto
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that bothers me here though, is that the load-balancer must also serve this endpoint 🤔

But I suppose that they would just forward it, and if anything is available it returns Ok, and if nothing is available they wouldn't have a service. So maybe it just works :)

@Mirko-von-Leipzig
Copy link
Copy Markdown
Collaborator

After some digging, it looks like gRPC has its own health check service defined. We should probably consider that instead since it probably is supported by load-balancers out the box (..maybe).

https://github.com/grpc/grpc/blob/master/doc/health-checking.md

@Mirko-von-Leipzig
Copy link
Copy Markdown
Collaborator

Mirko-von-Leipzig commented Jun 4, 2026

I think this approach adds some complexity that we would never leverage (service name, additional status codes). Happy to do it if we think we will use it. But if not, prefer to keep simple with just 0 and 14 codes and no service names.

The way I understand it, its still just the same except we return an enum instead of the status code.

And since each server effectively only has a single gRPC service (+ health service I guess), we only need to support the empty, health (always returns healthy), and the specific gRPC service the server implements? If we do this only for nodes then its always:

"" - overall server health
"rpc.Api"
"Health"

@sergerad sergerad marked this pull request as ready for review June 5, 2026 01:09
Comment thread crates/rpc/src/server/mod.rs Outdated
Comment on lines +188 to +207
const POLL_INTERVAL: Duration = Duration::from_secs(30);

loop {
let is_ready = match source_rpc.status(Request::new(())).await {
Ok(response) => {
let upstream_tip =
response.into_inner().block_producer.map_or(u32::MAX, |b| b.chain_tip);
let local_tip = store.chain_tip(Finality::Committed).await.as_u32();
upstream_tip.saturating_sub(local_tip) <= readiness_threshold
},
Err(_) => false,
};

if is_ready {
reporter.set_serving::<api_server::ApiServer<api::RpcService>>().await;
} else {
reporter.set_not_serving::<api_server::ApiServer<api::RpcService>>().await;
}

tokio::time::sleep(POLL_INTERVAL).await;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, this is why I don't want to use rocksdb as the source of truth.

This would be much easier if we had a chain tip watch channel to monitor instead.

Comment thread crates/rpc/src/server/mod.rs Outdated
Comment thread bin/node/src/commands/rpc.rs
Comment thread crates/rpc/src/server/mod.rs Outdated
Comment thread crates/rpc/src/server/mod.rs Outdated
let local_tip = store.chain_tip(Finality::Committed).await.as_u32();
upstream_tip.saturating_sub(local_tip) <= readiness_threshold
},
Err(_) => false,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a problem? This means a single request failure from upstream removes this node entirely.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a threshold of const 3. LMK thoughts

Comment thread CHANGELOG.md Outdated
};
self.reporter.set_service_status(Self::SERVICE_NAME, status).await;
}
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mirko-von-Leipzig Wondering why the rpc sync code is in block producer crate. Maybe in followup I can look at moving the sync related stuff out? Unsure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants