Skip to content

Conversation

@rluvaton
Copy link
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@rluvaton
Copy link
Member Author

run benchmark aggregate_query_sql

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Dec 15, 2025
@rluvaton
Copy link
Member Author

show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @rluvaton, you asked to view the benchmark queue (#19346 (comment)).

Job User Benchmarks Comment
19287_3657520190.sh alamb external_aggr https://github.com/apache/datafusion/pull/19287#issuecomment-3657520190
19344_3657763461.sh alamb default https://github.com/apache/datafusion/pull/19344#issuecomment-3657763461
19346_3657820674.sh rluvaton aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3657820674

@Dandandan
Copy link
Contributor

run benchmark aggregate_query_sql

@Dandandan
Copy link
Contributor

show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @Dandandan, you asked to view the benchmark queue (#19346 (comment)).

Job User Benchmarks Comment
19287_3657520190.sh alamb external_aggr https://github.com/apache/datafusion/pull/19287#issuecomment-3657520190
19344_3657763461.sh alamb default https://github.com/apache/datafusion/pull/19344#issuecomment-3657763461
19346_3657820674.sh rluvaton aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3657820674
19346_3659197246.sh Dandandan aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3659197246

@rluvaton
Copy link
Member Author

@alamb any idea why it's not working?

@alamb
Copy link
Contributor

alamb commented Dec 16, 2025

@alamb any idea why it's not working?

My script runner bails in error and there was something wrong with one of the jobs. I need to make it more resilent to errors and better error reporting

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing check-cost-of-hashing (3aa62b0) to 58377bf diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --all-features --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=check-cost-of-hashing
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Dec 16, 2025

show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @alamb, you asked to view the benchmark queue (#19346 (comment)).

Job User Benchmarks Comment
19346_3657820674.sh rluvaton aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3657820674
19346_3659197246.sh Dandandan aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3659197246
19344_3660516858.sh Dandandan tpch https://github.com/apache/datafusion/pull/19344#issuecomment-3660516858
19239_3660734795.sh alamb default https://github.com/apache/datafusion/pull/19239#issuecomment-3660734795

@rluvaton
Copy link
Member Author

show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @rluvaton, you asked to view the benchmark queue (#19346 (comment)).

Job User Benchmarks Comment
19346_3657820674.sh rluvaton aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3657820674
19346_3659197246.sh Dandandan aggregate_query_sql https://github.com/apache/datafusion/pull/19346#issuecomment-3659197246
19344_3660516858.sh Dandandan tpch https://github.com/apache/datafusion/pull/19344#issuecomment-3660516858
19239_3660734795.sh alamb default https://github.com/apache/datafusion/pull/19239#issuecomment-3660734795
19344_3660885262.sh alamb default https://github.com/apache/datafusion/pull/19344#issuecomment-3660885262

@alamb
Copy link
Contributor

alamb commented Dec 16, 2025

I just checked the runner. Whatever this benchmark is it is taking a long time to complete

The most recent one (note it says it is going to take 3379.1s, almost an hour!!! to run a single benchmark )

  8 (8.00%) high mild

Benchmarking aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 3379.1s, or reduce sample count to 10.
Benchmarking aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions: Collecting 100 samples in estimated 3379.1 s (100 iterations)

@rluvaton
Copy link
Member Author

Probably this pr cause that I think.

@rluvaton
Copy link
Member Author

Is it possible to cancel benchmark runs?

@rluvaton
Copy link
Member Author

And is there a way to see the current job status to know for next time?

@rluvaton
Copy link
Member Author

And is the run logs from main or this branch?

@alamb
Copy link
Contributor

alamb commented Dec 16, 2025

Is it possible to cancel benchmark runs?

I just did it manually. I don't have any automated way to do it yet

And is there a way to see the current job status to know for next time?

not that I know of yet. That would also be a great feature 🤔

@Dandandan
Copy link
Contributor

I think the change probably makes it super slow by creating a lot of hash "collisions" (by only looking at the first column)

@rluvaton
Copy link
Member Author

rluvaton commented Dec 17, 2025

I wanted to see for group by wide u64 and string how much saving the string hashing would save us as it's irrelevant - in wide u64 case all values are unique, so you don't need to hash by string.

So yes, I was aware that I would create hash collisions for the rest of the benchmarks (I didn't know it will be that slow though). What surprise me is that the group by wide was the one to take a long time which I explained shouldn't be as it's the same as hashing by a single column

@rluvaton
Copy link
Member Author

we can check if useful after:

@rluvaton rluvaton closed this Dec 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants