[Core] support limit pushdown with pk table #6914
Conversation
Force-pushed from 594b49b to f9bb040.
        files = postFilterManifestEntries(files);
    }

    if (supportsLimitPushManifestEntries()) {
You need to do the performance test again.
You are right, I will do it.
Force-pushed from f9bb040 to e1f938c.
Force-pushed from e1f938c to 8d0768d.
I've conducted a performance comparison for the limit pushdown feature. The results show that it improves the speed of OLAP queries on Paimon when running on a Flink session cluster. The write-up covers: 1. Conclusion, 2. Background, 3. Append Only Table, 4. PK Table.
Force-pushed from 8d0768d to 37e87b7.
Force-pushed from 37e87b7 to b73726c.
@JingsongLi




Purpose
Background
Issue: #6847
Append table limit pushdown: #6848
This PR adds support for limit pushdown on primary key (PK) tables.
Code Logic
We try our best to filter manifest entries before reading data.
Limit pushdown is only enabled for the DEDUPLICATE / FIRST_ROW merge engines without deletion vectors, because accurate counting from file metadata requires that no merge operations and no deleted rows are involved.
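To make the eligibility condition concrete, here is a minimal sketch assuming a simplified `MergeEngine` enum; the class and method names are hypothetical stand-ins, not the actual Paimon APIs:

```java
/** Hypothetical sketch of the eligibility check; names are illustrative, not Paimon's real classes. */
public final class LimitPushdownEligibility {

    /** Simplified stand-in for the table's merge-engine option. */
    enum MergeEngine { DEDUPLICATE, FIRST_ROW, PARTIAL_UPDATE, AGGREGATE }

    /**
     * Limit pushdown is considered only for DEDUPLICATE / FIRST_ROW merge engines
     * and only when deletion vectors are disabled, so that row counts taken from
     * file metadata can be exact.
     */
    static boolean supportsLimitPushdown(MergeEngine mergeEngine, boolean deletionVectorsEnabled) {
        boolean eligibleMergeEngine =
                mergeEngine == MergeEngine.DEDUPLICATE || mergeEngine == MergeEngine.FIRST_ROW;
        return eligibleMergeEngine && !deletionVectorsEnabled;
    }
}
```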
We group files by (partition, bucket) pairs and process buckets sequentially. For each bucket, the algorithm checks whether safe pushdown is possible: the files must not overlap (all on the same LSM level, excluding level 0) and must contain no delete rows.
1. If safe, it accumulates row counts from file metadata until the limit is reached, then stops processing the remaining buckets (see the sketch below).
2. If unsafe (overlapping files or delete rows exist), all files in that bucket are included.
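Below is a minimal, self-contained sketch of the bucket-by-bucket accumulation described above. `FileEntry`, `applyLimit`, and `isSafeForPushdown` are simplified stand-ins for the real manifest entry types and scan logic, so treat it as an illustration of the idea rather than the PR's actual implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of limit pushdown over (partition, bucket) groups. */
public class LimitPushdownSketch {

    /** Simplified stand-in for a manifest entry. */
    record FileEntry(String partition, int bucket, int level, long rowCount, long deleteRowCount) {}

    static List<FileEntry> applyLimit(List<FileEntry> files, long limit) {
        // Group files by (partition, bucket), preserving encounter order.
        Map<String, List<FileEntry>> buckets = new LinkedHashMap<>();
        for (FileEntry f : files) {
            buckets.computeIfAbsent(f.partition() + "/" + f.bucket(), k -> new ArrayList<>()).add(f);
        }

        List<FileEntry> selected = new ArrayList<>();
        long accumulated = 0;
        for (List<FileEntry> bucketFiles : buckets.values()) {
            if (isSafeForPushdown(bucketFiles)) {
                // Safe bucket: metadata row counts are exact, so accumulate until the limit is reached.
                for (FileEntry f : bucketFiles) {
                    selected.add(f);
                    accumulated += f.rowCount();
                    if (accumulated >= limit) {
                        return selected; // enough rows collected, skip all remaining buckets
                    }
                }
            } else {
                // Unsafe bucket: overlapping files or delete rows exist, keep every file in it.
                selected.addAll(bucketFiles);
            }
        }
        return selected;
    }

    /** Safe when all files sit on one non-zero LSM level (no overlap) and carry no delete rows. */
    static boolean isSafeForPushdown(List<FileEntry> bucketFiles) {
        int level = bucketFiles.get(0).level();
        for (FileEntry f : bucketFiles) {
            if (f.level() == 0 || f.level() != level || f.deleteRowCount() > 0) {
                return false;
            }
        }
        return true;
    }
}
```

Stopping as soon as the accumulated count reaches the limit keeps the scan plan small while still guaranteeing at least `limit` rows whenever the counted buckets are safe.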
Linked issue: close #xxx
Tests
API and Format
Documentation