@wwj6591812 wwj6591812 commented Dec 28, 2025

Purpose

I. Background
Issue: #6847
Append table limit pushdown: #6848
This PR adds limit pushdown support for primary-key (PK) tables.

II. Code Logic
We filter manifest entries as early as possible, before any data is read.
Limit pushdown is only enabled for the DEDUPLICATE and FIRST_ROW merge engines without deletion vectors, because an accurate row count requires that no merge operations are needed and no rows are deleted.
Files are grouped by (partition, bucket) and the buckets are processed sequentially. For each bucket, the algorithm checks whether pushdown is safe: the files must not overlap (all on the same LSM level, excluding level 0) and must contain no delete rows.
1. If the bucket is safe, row counts from file metadata are accumulated until the limit is reached, and the remaining buckets are then skipped.
2. If the bucket is unsafe (overlapping files or delete rows exist), all files in that bucket are included.
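
The bucket-by-bucket logic above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `FileMeta`, `isSafe`, and `applyLimit` are hypothetical names, the string grouping key is a simplification, and stopping at file granularity inside a safe bucket is an assumption about how "until reaching the limit" is applied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these types are Paimon's real classes. */
final class LimitPushdownSketch {

    /** Hypothetical stand-in for the per-file metadata found in a manifest entry. */
    static final class FileMeta {
        final String partition;
        final int bucket;
        final int level;
        final long rowCount;
        final long deleteRowCount;

        FileMeta(String partition, int bucket, int level, long rowCount, long deleteRowCount) {
            this.partition = partition;
            this.bucket = bucket;
            this.level = level;
            this.rowCount = rowCount;
            this.deleteRowCount = deleteRowCount;
        }
    }

    /** Safe: every file sits on the same LSM level, that level is not 0, and no file has delete rows. */
    static boolean isSafe(List<FileMeta> bucketFiles) {
        int level = bucketFiles.get(0).level;
        for (FileMeta f : bucketFiles) {
            if (f.level == 0 || f.level != level || f.deleteRowCount > 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the files that still have to be read to satisfy the given limit. */
    static List<FileMeta> applyLimit(List<FileMeta> files, long limit) {
        // Group by (partition, bucket), keeping the order in which buckets first appear.
        Map<String, List<FileMeta>> buckets = new LinkedHashMap<>();
        for (FileMeta f : files) {
            String key = f.partition + "/" + f.bucket;
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        }

        List<FileMeta> result = new ArrayList<>();
        long counted = 0;
        for (List<FileMeta> bucket : buckets.values()) {
            if (!isSafe(bucket)) {
                // Row counts are unreliable here, so keep every file and count nothing.
                result.addAll(bucket);
                continue;
            }
            for (FileMeta f : bucket) {
                result.add(f);
                counted += f.rowCount;
                if (counted >= limit) {
                    return result; // limit reached: remaining buckets are skipped
                }
            }
        }
        return result;
    }
}
```

In this sketch, unsafe buckets never contribute to the running count, so their files are always kept and the scan only stops early once safe buckets alone can cover the limit.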

Linked issue: close #xxx

Tests

API and Format

Documentation

@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from 594b49b to f9bb040 Compare December 29, 2025 06:48
files = postFilterManifestEntries(files);
}

if (supportsLimitPushManifestEntries()) {
Contributor

You need to run the performance test again.

Contributor Author

You are right, I will do it.

@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from f9bb040 to e1f938c Compare December 30, 2025 00:56
@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from e1f938c to 8d0768d Compare January 9, 2026 10:46
wwj6591812 commented Jan 9, 2026

I ran a performance comparison for the limit pushdown feature. The results show that it speeds up OLAP queries on Paimon when running on a Flink session cluster.

I. Conclusion
1. Limit pushdown provides clear benefits for both PK and append-only tables.
2. Because the implementation for PK tables is complex, the feature is controlled by a configuration option.

II. Background
#6847

III. Append-Only Table
1. Table Info and SQL

Paimon Table Location: dfs://na61dfsalake1--cn-zhangjiakou/PanguVolume1/alake/omega_sample/s_holo_mainse_rank_xfc_all_features_swift_parsed
SQL (No Limit Pushdown): /*+ config(cluster=hongli-duibi-0528, job_name=hl_006) */ select * from alake.omega_sample.s_holo_mainse_rank_xfc_all_features_swift_parsed /*+ OPTIONS('scan.dedicated-split-generation' = 'true', 'limit-pushdown.with.pk.table.enabled' = 'false') */ limit 10;
SQL (With Limit Pushdown): /*+ config(cluster=hongli-duibi-0528, job_name=hl_007) */ select * from alake.omega_sample.s_holo_mainse_rank_xfc_all_features_swift_parsed /*+ OPTIONS('scan.dedicated-split-generation' = 'true', 'limit-pushdown.with.pk.table.enabled' = 'true') */ limit 10;

2. The First Test
image

3. The Second Test
image

IV. PK Table
1. Table Info and SQL

Paimon Table Location: dfs://ea119dfsalake1--cn-shanghai/PanguVolume2/alake/datalake_dt_rtcdm/s_atplog_base_hour_fi
SQL (No Limit Pushdown): /*+ config(cluster=hongli-duibi-0528, job_name=hl_008) */ select * from alake.datalake_dt_rtcdm.s_atplog_base_hour_fi /*+ OPTIONS('scan.dedicated-split-generation' = 'true', 'limit-pushdown.with.pk.table.enabled' = 'false') */ limit 10;
SQL (With Limit Pushdown): /*+ config(cluster=hongli-duibi-0528, job_name=hl_009) */ select * from alake.datalake_dt_rtcdm.s_atplog_base_hour_fi /*+ OPTIONS('scan.dedicated-split-generation' = 'true', 'limit-pushdown.with.pk.table.enabled' = 'true') */ limit 10;

2. The First Test
image

3. The Second Test
image

@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from 8d0768d to 37e87b7 Compare January 9, 2026 11:51
@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from 37e87b7 to b73726c Compare January 10, 2026 04:28
@wwj6591812
Contributor Author

@JingsongLi
Hi, please CC. Thanks.
