
Conversation

@jadami10 (Contributor) commented Jan 2, 2026

This is a potential bugfix addressing #16801.

I know we've been saying to use log4j, but query logs need to be correlated since we log once when the query is received and once when it completes. Also, this feature already exists, and log4j is tedious to configure and test. #15264 switched the "received" query logs to .info and made them not respect rate limiting; instead, "received" logs just consume the rate limiter's budget, which, for high-QPS clusters, means the majority of logs end up being "received" logs.

In this PR, the rate limiting decision is made up front when the query is received. We decide whether to log "received", and that same decision determines whether we log "completed". This way, you should always get both logs for a given query rather than an arbitrary mix of the two.
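Roughly the shape of the change, as a minimal sketch (the class, field, and method names below are illustrative, and I'm assuming a Guava RateLimiter; this is not the actual handler code):

```java
import com.google.common.util.concurrent.RateLimiter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: names here are illustrative, not the actual Pinot classes.
class QueryLogRateLimitSketch {
  private static final Logger LOGGER = LoggerFactory.getLogger(QueryLogRateLimitSketch.class);

  // Permits now bound queries logged per second, not log lines per second.
  private final RateLimiter _queryLogRateLimiter = RateLimiter.create(10_000);

  /** Decide once, when the query is received, whether this query gets logged at all. */
  boolean logQueryReceived(long requestId, String query) {
    boolean shouldLog = _queryLogRateLimiter.tryAcquire();
    if (shouldLog) {
      LOGGER.info("Received query (requestId={}): {}", requestId, query);
    }
    // The caller carries this flag with the request so "completed" reuses the same decision.
    return shouldLog;
  }

  /** Only log completion if the matching "received" line was logged, keeping the pair correlated. */
  void logQueryCompleted(long requestId, boolean shouldLog, long latencyMs) {
    if (shouldLog) {
      LOGGER.info("Completed query (requestId={}) in {}ms", requestId, latencyMs);
    }
  }
}
```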

This is slightly backwards incompatible. Clusters with default settings or low RPS won't notice a difference; they will continue to see all logs. Clusters where "log before query" is false will continue to work the same way.

For clusters with higher RPS, CONFIG_OF_BROKER_QUERY_LOG_MAX_RATE_PER_SECOND semantically changes to control the number of queries logged per second rather than the number of log lines per second. Clusters where the rate limit == RPS may see 2x the logs, since this change effectively makes a "completed" line show up for each received query. Clusters where RPS >> rate limit will see a reduction in logs since the "received" query logs are now rate limited; the trade-off is that they will consistently see a received/completed pair per logged query.
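To make the volume change concrete with illustrative numbers: at 1,000 QPS with a rate limit of 100, the old behavior logs roughly 1,000 "received" lines per second and almost no "completed" lines (the "received" lines exhaust the budget), while the new behavior logs roughly 100 matched received/completed pairs, i.e. about 200 lines per second. At 100 QPS with the same limit of 100, the old behavior logs about 100 lines per second (mostly "received") while the new behavior logs about 200, which is the 2x case above.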

I specifically tested this on an internal cluster that only saw intermittent rate limiting every hour.
[Screenshot: broker query log volume over time on the internal cluster, before and after the change]

After my change, you can see the log volume doubled during rate limiting, but there's no longer a mismatch between the number of "received" vs "completed" logs.

@jadami10 (Contributor, Author) commented Jan 2, 2026

cc @Jackie-Jiang @gortiz

This is hopefully a better setup for most Pinot clusters and should address the spike in log volume associated with logging "query received".

@codecov-commenter commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (668bf02) to head (dccbc73).

Files with missing lines                                 Patch %   Lines
...requesthandler/MultiStageBrokerRequestHandler.java     0.00%   3 Missing ⚠️
...sthandler/BaseSingleStageBrokerRequestHandler.java    33.33%   2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17452      +/-   ##
============================================
- Coverage     63.28%   63.27%   -0.01%     
  Complexity     1474     1474              
============================================
  Files          3161     3161              
  Lines        188588   188589       +1     
  Branches      28857    28858       +1     
============================================
- Hits         119345   119329      -16     
- Misses        59992    60018      +26     
+ Partials       9251     9242       -9     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (?)
java-11               63.24% <61.53%> (-0.01%) ⬇️
java-21               63.24% <61.53%> (+7.67%) ⬆️
temurin               63.27% <61.53%> (-0.01%) ⬇️
unittests             63.27% <61.53%> (-0.01%) ⬇️
unittests1            55.59% <ø> (-0.02%) ⬇️
unittests2            34.01% <61.53%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.



