http: fix potential FD leak when zombie streams prevent connection close #42859
I am chasing a case where a client sends POST messages on an HTTP/1.1 connection but is systematically denied. The connection is immediately reused by the client. In some cases we seem to trigger a racy condition where the number of open file descriptors explodes.

When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed because checkForDeferredClose() is not called after the zombie stream is finally destroyed.

This commonly occurs with HTTP/1.1 connections when:
- ext_authz (or a similar filter) denies a request early, sending a 403 response before the full request body is received
- The connection is (potentially) marked DrainState::Closing
- The stream becomes a zombie waiting for the codec to finish encoding
- When onCodecEncodeComplete() fires, doDeferredStreamDestroy() is called but checkForDeferredClose() is not, leaving the connection open

In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection.

The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset().

Signed-off-by: William Dauchy <[email protected]>
@KBaichoo, Kevin, do you also want to take a look at this?
botengyao
left a comment
Thanks for fixing this; I left a high-level question.
/wait
// closed. doEndStream() call that created the zombie may have set
// drain_state_ to Closing, but checkForDeferredClose() couldn't close the
// connection at that time because streams_ wasn't empty yet.
connection_manager_.checkForDeferredClose(false);
Then, should we distinguish all the conditions like in doEndStream
envoy/source/common/http/conn_manager_impl.cc
Lines 314 to 328 in 25fedad
if (check_for_deferred_close) {
  // If HTTP/1.0 has no content length, it is framed by close and won't consider
  // the request complete until the FIN is read. Don't delay close in this case.
  const bool http_10_sans_cl =
      (codec_->protocol() == Protocol::Http10) &&
      (!stream.response_headers_ || !stream.response_headers_->ContentLength());
  // We also don't delay-close in the case of HTTP/1.1 where the request is
  // fully read, as there's no race condition to avoid.
  const bool connection_close =
      stream.filter_manager_.streamInfo().shouldDrainConnectionUponCompletion();
  const bool request_complete = stream.filter_manager_.hasLastDownstreamByteReceived();
  // Don't do delay close for HTTP/1.0 or if the request is complete.
  checkForDeferredClose(connection_close && (request_complete || http_10_sans_cl));
}
rather than calling checkForDeferredClose(false) here as well?
Good point. When doEndStream() creates a zombie stream, it does call checkForDeferredClose() with the computed skip_delay_close value, but that call is a no-op because streams_ is possibly not empty yet, so the computed value is effectively lost.
I chose checkForDeferredClose(false) to avoid duplicating that conditional logic, accepting the tradeoff that we always use the delay close path even when it's not strictly necessary.
If you prefer consistency with doEndStream(), I can add the same conditional logic here but I am not sure this is 100% needed in the zombie case. Alternatively, we could store the computed skip_delay_close value in stream state when it becomes a zombie and reuse it later. Let me know which approach you'd prefer.
Yes, we would need to keep the logic consistent with the existing logic, and maybe refactor it a bit. There is a long history behind the deferred close, as these conditions were added over time.
Thanks for the feedback. I updated the PR accordingly.
Force-pushed from 495f026 to c96a9be.
/retest transients
botengyao
left a comment
/wait
if (state_.is_zombie_stream_) {
  const bool skip_delay = shouldSkipDeferredCloseDelay();
  connection_manager_.doDeferredStreamDestroy(*this);
  // After destroying a zombie stream, check if the connection should be
Do you mind adding a runtime guard and a release note? I think this is a high-risk change and the release will be next week; does merging it early next week, after the release, work for you?