[Core] support limit pushdown with pk table #6914
Conversation
Force-pushed from 594b49b to f9bb040.
        files = postFilterManifestEntries(files);
    }

    if (supportsLimitPushManifestEntries()) {
You need to do the performance test again.
You are right, I will do it.
Force-pushed from f9bb040 to e1f938c.
Force-pushed from e1f938c to 8d0768d.
I've conducted a performance comparison for the limit pushdown feature. The results show that it improves the speed of OLAP queries on Paimon when running on a Flink session cluster. The write-up covers: 1. Conclusion, 2. Background, 3. Append Only Table, 4. PK Table.
Force-pushed from 8d0768d to 37e87b7.
Force-pushed from 37e87b7 to b73726c.
@JingsongLi




Purpose
Background
Issue: #6847
Append table limit pushdown: #6848
This PR adds support for limit pushdown on primary key (PK) tables.
Code Logic
We try our best to filter manifest entries before reading data.
Limit pushdown is only enabled for the DEDUPLICATE / FIRST_ROW merge engines without deletion vectors, because accurate counting from file metadata requires that no merge operations and no deleted rows are involved.
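To make the eligibility condition concrete, here is a minimal sketch assuming a simplified `MergeEngine` enum; the class and method names are hypothetical stand-ins, not the actual Paimon APIs:

```java
/** Hypothetical sketch of the eligibility check; names are illustrative, not Paimon's real classes. */
public final class LimitPushdownEligibility {

    /** Simplified stand-in for the table's merge-engine option. */
    enum MergeEngine { DEDUPLICATE, FIRST_ROW, PARTIAL_UPDATE, AGGREGATE }

    /**
     * Limit pushdown is considered only for DEDUPLICATE / FIRST_ROW merge engines
     * and only when deletion vectors are disabled, so that row counts taken from
     * file metadata can be exact.
     */
    static boolean supportsLimitPushdown(MergeEngine mergeEngine, boolean deletionVectorsEnabled) {
        boolean eligibleMergeEngine =
                mergeEngine == MergeEngine.DEDUPLICATE || mergeEngine == MergeEngine.FIRST_ROW;
        return eligibleMergeEngine && !deletionVectorsEnabled;
    }
}
```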
We group files by (partition, bucket) pairs and process buckets sequentially. For each bucket, the algorithm checks whether safe pushdown is possible: the files must not overlap (all on the same LSM level, excluding level 0) and must contain no delete rows.
1. If safe, it accumulates row counts from file metadata until the limit is reached, then stops processing the remaining buckets (see the sketch below).
2. If unsafe (overlapping files or delete rows exist), all files in that bucket are included.
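Below is a minimal, self-contained sketch of the bucket-by-bucket accumulation described above. `FileEntry`, `applyLimit`, and `isSafeForPushdown` are simplified stand-ins for the real manifest entry types and scan logic, so treat it as an illustration of the idea rather than the PR's actual implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of limit pushdown over (partition, bucket) groups. */
public class LimitPushdownSketch {

    /** Simplified stand-in for a manifest entry. */
    record FileEntry(String partition, int bucket, int level, long rowCount, long deleteRowCount) {}

    static List<FileEntry> applyLimit(List<FileEntry> files, long limit) {
        // Group files by (partition, bucket), preserving encounter order.
        Map<String, List<FileEntry>> buckets = new LinkedHashMap<>();
        for (FileEntry f : files) {
            buckets.computeIfAbsent(f.partition() + "/" + f.bucket(), k -> new ArrayList<>()).add(f);
        }

        List<FileEntry> selected = new ArrayList<>();
        long accumulated = 0;
        for (List<FileEntry> bucketFiles : buckets.values()) {
            if (isSafeForPushdown(bucketFiles)) {
                // Safe bucket: metadata row counts are exact, so accumulate until the limit is reached.
                for (FileEntry f : bucketFiles) {
                    selected.add(f);
                    accumulated += f.rowCount();
                    if (accumulated >= limit) {
                        return selected; // enough rows collected, skip all remaining buckets
                    }
                }
            } else {
                // Unsafe bucket: overlapping files or delete rows exist, keep every file in it.
                selected.addAll(bucketFiles);
            }
        }
        return selected;
    }

    /** Safe when all files sit on one non-zero LSM level (no overlap) and carry no delete rows. */
    static boolean isSafeForPushdown(List<FileEntry> bucketFiles) {
        int level = bucketFiles.get(0).level();
        for (FileEntry f : bucketFiles) {
            if (f.level() == 0 || f.level() != level || f.deleteRowCount() > 0) {
                return false;
            }
        }
        return true;
    }
}
```

Stopping as soon as the accumulated count reaches the limit keeps the scan plan small while still guaranteeing at least `limit` rows whenever the counted buckets are safe.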
Linked issue: close #xxx
Tests
API and Format
Documentation