
Conversation

@wdauchy
Contributor

@wdauchy wdauchy commented Jan 5, 2026

Commit Message:
I am chasing a case where a client sends POST requests on an HTTP/1.1 connection but is systematically denied. The connection is immediately reused by the client. In some cases we seem to trigger a race where the number of open file descriptors explodes.

When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed, because checkForDeferredClose() is not called after the zombie stream is finally destroyed.

This commonly occurs with HTTP/1.1 connections when:

  • ext_authz (or similar filter) denies a request early, sending a 403 response before the full request body is received
  • The connection is (potentially) marked DrainState::Closing
  • The stream becomes a zombie waiting for the codec to finish encoding
  • When onCodecEncodeComplete() fires, doDeferredStreamDestroy() is called but checkForDeferredClose() is not, leaving the connection open

In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection.

The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset().
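The state machine described above can be modeled with a minimal, self-contained sketch (hypothetical simplified types, not Envoy's actual classes): without the added checkForDeferredClose() call after zombie destruction, a draining connection whose last stream dies as a zombie is never closed.

```cpp
#include <cassert>

// Hypothetical, simplified model of the bug and fix described above;
// names mirror Envoy's but the types are illustrative only.
enum class DrainState { NotDraining, Closing };

struct Connection {
  DrainState drain_state_ = DrainState::NotDraining;
  int active_streams_ = 0;
  bool closed_ = false;

  // Stands in for checkForDeferredClose(): close only when the connection
  // is draining and no streams remain.
  void checkForDeferredClose() {
    if (drain_state_ == DrainState::Closing && active_streams_ == 0) {
      closed_ = true;
    }
  }

  // Stands in for the fixed onCodecEncodeComplete(): destroy the zombie
  // stream, then re-check whether the connection can now be closed.
  void onCodecEncodeComplete() {
    --active_streams_;        // doDeferredStreamDestroy()
    checkForDeferredClose();  // the call this PR adds
  }
};
```

In this model, the checkForDeferredClose() performed while the zombie stream is still alive is a no-op (streams_ is not empty), so the re-check after destruction is what finally closes the connection.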
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #42859 was opened by wdauchy.


Signed-off-by: William Dauchy <[email protected]>
@botengyao
Member

botengyao commented Jan 6, 2026

@KBaichoo, Kevin, do you also want to take a look at this?

Member

@botengyao botengyao left a comment


Thanks for fixing this; I left a high-level question.

/wait

// closed. doEndStream() call that created the zombie may have set
// drain_state_ to Closing, but checkForDeferredClose() couldn't close the
// connection at that time because streams_ wasn't empty yet.
connection_manager_.checkForDeferredClose(false);
Member


then, should we distinguish all the conditions like in the doEndStream

if (check_for_deferred_close) {
  // If HTTP/1.0 has no content length, it is framed by close and won't consider
  // the request complete until the FIN is read. Don't delay close in this case.
  const bool http_10_sans_cl =
      (codec_->protocol() == Protocol::Http10) &&
      (!stream.response_headers_ || !stream.response_headers_->ContentLength());
  // We also don't delay-close in the case of HTTP/1.1 where the request is
  // fully read, as there's no race condition to avoid.
  const bool connection_close =
      stream.filter_manager_.streamInfo().shouldDrainConnectionUponCompletion();
  const bool request_complete = stream.filter_manager_.hasLastDownstreamByteReceived();
  // Don't do delay close for HTTP/1.0 or if the request is complete.
  checkForDeferredClose(connection_close && (request_complete || http_10_sans_cl));
}
to call checkForDeferredClose(false) here as well?
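One hypothetical way to keep the two call sites consistent (a sketch only, not the PR's actual code; the stand-in types and the helper name are assumptions) is to factor the quoted condition into a shared helper:

```cpp
// Illustrative stand-ins for the Envoy types involved; field names are
// assumptions that mirror the accessors in the condition quoted above.
enum class Protocol { Http10, Http11 };

struct StreamModel {
  bool has_content_length = false;   // response_headers_->ContentLength()
  bool drain_on_completion = false;  // shouldDrainConnectionUponCompletion()
  bool request_complete = false;     // hasLastDownstreamByteReceived()
};

// Hypothetical helper mirroring doEndStream()'s logic: skip the delayed
// close for HTTP/1.0 responses framed by close, or when the request was
// fully read (no race condition to avoid).
bool shouldSkipDeferredCloseDelay(Protocol protocol, const StreamModel& stream) {
  const bool http_10_sans_cl =
      protocol == Protocol::Http10 && !stream.has_content_length;
  return stream.drain_on_completion &&
         (stream.request_complete || http_10_sans_cl);
}
```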

Contributor Author


Good point. When doEndStream() creates a zombie stream, it does call checkForDeferredClose() with the computed skip_delay_close value, but that call is a no-op because streams_ is possibly not empty yet, so the computed value is effectively lost.

I chose checkForDeferredClose(false) to avoid duplicating that conditional logic, accepting the tradeoff that we always use the delay close path even when it's not strictly necessary.

If you prefer consistency with doEndStream(), I can add the same conditional logic here but I am not sure this is 100% needed in the zombie case. Alternatively, we could store the computed skip_delay_close value in stream state when it becomes a zombie and reuse it later. Let me know which approach you'd prefer.
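The "store the computed value" alternative mentioned above could look roughly like this (a hypothetical sketch; the field and method names are assumptions, not the PR's code):

```cpp
// Hypothetical sketch: when doEndStream() turns the stream into a zombie,
// cache the skip_delay_close decision so it is not lost, and replay it
// when the zombie stream is finally destroyed.
struct ActiveStream {
  bool is_zombie_stream_ = false;
  bool deferred_close_skip_delay_ = false;  // assumed new field
};

struct ConnectionManager {
  bool deferred_close_checked_ = false;
  bool last_skip_delay_ = false;

  // Stand-in for checkForDeferredClose(bool skip_delay_close).
  void checkForDeferredClose(bool skip_delay) {
    deferred_close_checked_ = true;
    last_skip_delay_ = skip_delay;
  }

  // In doEndStream(), when the stream becomes a zombie: cache the value.
  void markZombie(ActiveStream& stream, bool skip_delay_close) {
    stream.is_zombie_stream_ = true;
    stream.deferred_close_skip_delay_ = skip_delay_close;
  }

  // In onCodecEncodeComplete(), after doDeferredStreamDestroy(): replay it.
  void onZombieDestroyed(const ActiveStream& stream) {
    checkForDeferredClose(stream.deferred_close_skip_delay_);
  }
};
```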

Member

@botengyao botengyao Jan 7, 2026


Yes, we would need to keep the logic consistent with the existing logic, and maybe refactor it a bit. There is a long history behind the deferred close, as these conditions were added over time.

Contributor Author


Thanks for the feedback. I updated the PR accordingly.

@botengyao botengyao self-assigned this Jan 7, 2026
@wdauchy wdauchy marked this pull request as ready for review January 7, 2026 18:16
@wdauchy wdauchy force-pushed the fd_leak branch 6 times, most recently from 495f026 to c96a9be Compare January 8, 2026 08:42
@wdauchy wdauchy requested a review from botengyao January 8, 2026 10:34
@wdauchy
Contributor Author

wdauchy commented Jan 8, 2026

/retest transients

Member

@botengyao botengyao left a comment


/wait

if (state_.is_zombie_stream_) {
const bool skip_delay = shouldSkipDeferredCloseDelay();
connection_manager_.doDeferredStreamDestroy(*this);
// After destroying a zombie stream, check if the connection should be
Member


Do you mind adding a runtime guard and a release note? I think this will be a high risk change since the release will be next week, does merging it after early next week work for you?

