Parallelize ChannelMonitor loading from async KVStores
#4147
base: main
Conversation
👋 Thanks for assigning @jkczyz as a reviewer!
Codecov Report: ❌ Patch coverage is …

@@            Coverage Diff             @@
##             main    #4147      +/-   ##
==========================================
- Coverage   89.36%   86.50%    -2.87%
==========================================
  Files         180      158       -22
  Lines      139366   101937    -37429
  Branches   139366   101937    -37429
==========================================
- Hits       124547    88177    -36370
+ Misses      12227    11336      -891
+ Partials     2592     2424      -168
Force-pushed 4a773c9 to 31fbd16
This doesn't actually work. Monitor loading is (at least for a semi-local disk) CPU-bound, so we really need to spawn each monitor load as its own task rather than having one task.
Force-pushed 64412a5 to 2c807b9
Force-pushed 2c807b9 to 0a2547a
This requires GATs, which in turn requires an MSRV bump (to 1.64), but we're planning on doing that soon anyway.
Force-pushed 0a2547a to a4a778c
Force-pushed a4a778c to 3346cba
Force-pushed 15706ef to d607625
log_error!(
	self.logger,
	"Monitor update failed. monitor: {} update: {} reason: {:?}",
	monitor_key,
	update_name.as_str(),
	e
);
Indentation looks off.
Take it up with rustfmt 🤷‍♂️
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer?
Force-pushed f2b14fb to fa48356
Tagging 0.3 since we've had complaints about startup time with remote persistence (which appears to be becoming more common).
jkczyz left a comment:
LGTM. Please squash.
Force-pushed 82cbb76 to 8873d1c
Squashed and pushed with only reverting the change to make …
Force-pushed 8873d1c to 5444e76
Oops, no, it actually has to be …
Force-pushed 5444e76 to b47ea70
Also silenced clippy:

$ git diff-tree -U1 5444e7613 986a3caa8
diff --git a/lightning/src/util/native_async.rs b/lightning/src/util/native_async.rs
index 32c6049f3b..0c380f2b1d 100644
--- a/lightning/src/util/native_async.rs
+++ b/lightning/src/util/native_async.rs
@@ -82,3 +82,10 @@ impl<O> Clone for FutureQueueCompletion<O> {
 	fn clone(&self) -> Self {
-		Self(self.0.clone())
+		#[cfg(all(test, feature = "std"))]
+		{
+			Self(Arc::clone(&self.0))
+		}
+		#[cfg(all(test, not(feature = "std")))]
+		{
+			Self(Rc::clone(&self.0))
+		}
 	}
`tokio::spawn` can be used either to spawn a forever-running background task or to spawn a task which gets `poll`ed independently and eventually returns a result which the callsite wants. In LDK, we have only ever needed the first, and thus didn't bother defining a return type for `FutureSpawner::spawn`. However, in the next commit we'll start using `FutureSpawner` in a context where we actually do want the spawned future's result. Thus, here, we add a result output to `FutureSpawner::spawn`, mirroring the `tokio::spawn` API.
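As an illustration of the shape of such an API, here is a minimal sketch. The trait name `SpawnWithResult`, the exact bounds, the boxed handle type, and the tokio-backed `TokioSpawner` are all assumptions for this example (and it assumes a tokio dependency and runtime), not LDK's actual definitions. The point is that a generic associated type lets `spawn` hand back a handle that resolves to the task's output, mirroring `tokio::spawn` returning a `JoinHandle<T>`.

use core::future::Future;
use core::pin::Pin;

// Illustrative stand-in for a spawner whose `spawn` returns the task's result.
pub trait SpawnWithResult {
	// The handle type is generic over the task's output; this is where GATs come in.
	type SpawnedFuture<O: Send + 'static>: Future<Output = O> + Send;

	fn spawn<O: Send + 'static, F: Future<Output = O> + Send + 'static>(
		&self, future: F,
	) -> Self::SpawnedFuture<O>;
}

// Hypothetical tokio-backed implementation, for the example only.
pub struct TokioSpawner;

impl SpawnWithResult for TokioSpawner {
	type SpawnedFuture<O: Send + 'static> = Pin<Box<dyn Future<Output = O> + Send>>;

	fn spawn<O: Send + 'static, F: Future<Output = O> + Send + 'static>(
		&self, future: F,
	) -> Self::SpawnedFuture<O> {
		// `tokio::spawn` starts polling the task on the runtime immediately;
		// awaiting the JoinHandle yields the task's output once it completes.
		let handle = tokio::spawn(future);
		Box::pin(async move { handle.await.expect("spawned task panicked") })
	}
}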
`MonitorUpdatingPersister::read_all_channel_monitors_with_updates` was made to do the IO operations in parallel in a previous commit; however, in practice this doesn't provide material parallelism for large routing nodes. Because deserializing `ChannelMonitor`s is the bulk of the work (when IO operations are sufficiently fast), we end up blocked on single-threaded work nearly the entire time. Here, we add an alternative option: a new `read_all_channel_monitors_with_updates_parallel` method which uses the `FutureSpawner` to cause the deserialization operations to proceed in parallel.
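Roughly, the parallel path fans each monitor's deserialization out as its own spawned task and then awaits the results. The sketch below reuses the hypothetical `SpawnWithResult` trait from above; `DecodedMonitor` and `decode_monitor` are placeholders standing in for the real `ChannelMonitor` deserialization, not LDK types.

// Placeholder result type and decoder for illustration only.
pub struct DecodedMonitor(pub Vec<u8>);
fn decode_monitor(bytes: &[u8]) -> DecodedMonitor {
	DecodedMonitor(bytes.to_vec())
}

// Fan monitor deserialization out across spawned tasks, then collect the results.
async fn load_monitors_parallel<S: SpawnWithResult>(
	spawner: &S, serialized_monitors: Vec<Vec<u8>>,
) -> Vec<DecodedMonitor> {
	let mut handles = Vec::with_capacity(serialized_monitors.len());
	for bytes in serialized_monitors {
		// Each (CPU-bound) decode runs as its own task so a multi-threaded
		// runtime can make progress on several monitors at once.
		handles.push(spawner.spawn(async move { decode_monitor(&bytes) }));
	}
	let mut monitors = Vec::with_capacity(handles.len());
	for handle in handles {
		monitors.push(handle.await);
	}
	monitors
}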
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. For users of async persistence who don't have any `ChannelMonitorUpdate`s (e.g. because they set `maximum_pending_updates` to 0 or, in the future, we avoid persisting updates for small `ChannelMonitor`s), this means two round-trips to the storage backend: one to load the `ChannelMonitor` and one to try to read the next `ChannelMonitorUpdate` only to have it fail. Instead, here, we use `KVStore::list` to fetch the list of stored `ChannelMonitorUpdate`s, which for async `KVStore` users allows us to parallelize the update-list fetch and the `ChannelMonitor` load itself. We then know exactly when to stop reading `ChannelMonitorUpdate`s, including reading none if there are none to read. This also avoids relying on `KVStore::read` correctly returning `NotFound` in order to discover when to stop reading `ChannelMonitorUpdate`s.
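Conceptually, the monitor read and the update-key listing are issued together so the two round-trips overlap. The sketch below uses a deliberately simplified, hypothetical store interface and the futures crate's `join!` macro; LDK's real `KVStore` methods take primary/secondary namespaces and return their own future types, so the signatures here are assumptions for illustration only.

use core::future::Future;

// Simplified, hypothetical async store for illustration only.
trait SimpleAsyncStore {
	fn read(&self, key: &str) -> impl Future<Output = Option<Vec<u8>>> + Send;
	fn list(&self, prefix: &str) -> impl Future<Output = Vec<String>> + Send;
}

async fn load_monitor_and_update_keys<S: SimpleAsyncStore>(
	store: &S, monitor_key: &str,
) -> (Option<Vec<u8>>, Vec<String>) {
	// Start both requests before awaiting either, so the monitor fetch and the
	// update listing share one round-trip of latency instead of two. Updates
	// are assumed to be stored under the monitor's key as a prefix.
	let monitor_fut = store.read(monitor_key);
	let update_keys_fut = store.list(monitor_key);
	futures::join!(monitor_fut, update_keys_fut)
}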
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. Now that we know which `ChannelMonitorUpdate`s to load from `list`ing the entries from the `KVStore`, we can parallelize the reads themselves, which we do here. Now, loading all `ChannelMonitor`s from an async `KVStore` requires only three full RTTs: one to list the set of `ChannelMonitor`s, one to both fetch the `ChannelMonitor` and list the set of `ChannelMonitorUpdate`s, and one to fetch all the `ChannelMonitorUpdate`s (with the last skipped when there are no `ChannelMonitorUpdate`s to read).
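For that final round-trip, all the listed updates can be fetched as one concurrent batch, for example with the futures crate's `join_all` (again written against the hypothetical `SimpleAsyncStore` above, so this is a sketch rather than LDK's implementation), skipping the step entirely when no updates were listed.

async fn read_updates_concurrently<S: SimpleAsyncStore>(
	store: &S, update_keys: &[String],
) -> Vec<Option<Vec<u8>>> {
	if update_keys.is_empty() {
		// Nothing was listed, so the third round-trip is skipped entirely.
		return Vec::new();
	}
	// Issue every update read at once and await them as a single batch.
	futures::future::join_all(update_keys.iter().map(|key| store.read(key))).await
}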
This is really dumb: `assert!(cfg!(fuzzing))` is a perfectly reasonable thing to write!
Force-pushed b47ea70 to 986a3ca
✅ Added second reviewer: @wpaulino
Based on #4146 and #4175, just because I don't want to resolve conflicts.