@alamb
Contributor

@alamb alamb commented Dec 17, 2025

Which issue does this PR close?

Rationale for this change

I want to optimize hashing for StringViewArray. To do so, I would like a benchmark that shows the optimization works.

What changes are included in this PR?

Add a benchmark for with_hashes.

Run it with:

cargo bench --bench with_hashes

Note: I did not add every possible array type, as I don't plan to optimize the others.

Are these changes tested?

I ran it manually

Are there any user-facing changes?

@alamb alamb added the performance Make DataFusion faster label Dec 17, 2025
@github-actions github-actions bot added the common Related to common crate label Dec 17, 2025
@alamb alamb marked this pull request as ready for review December 17, 2025 17:49
}

fn criterion_benchmark(c: &mut Criterion) {
let pool = StringPool::new(100, 64);
Contributor

Since these pools are randomized each run, is the sample size large enough to avoid noise? If not we could always generate a single pool of strings and commit it in a txt file or something...

Contributor

Could you maybe run this 5 times locally and report back the variability?

Contributor Author

While it is randomized, the random number generator uses a fixed seed:

pub fn make_rng() -> StdRng {
    StdRng::seed_from_u64(42)
}

So the same strings are used each time.

I double-checked that the strings are the same on each run by printing them out:

diff --git a/datafusion/common/benches/with_hashes.rs b/datafusion/common/benches/with_hashes.rs
index 8154c20df..56b970fdc 100644
--- a/datafusion/common/benches/with_hashes.rs
+++ b/datafusion/common/benches/with_hashes.rs
@@ -41,6 +41,7 @@ struct BenchData {

 fn criterion_benchmark(c: &mut Criterion) {
     let pool = StringPool::new(100, 64);
+    println!("StringPool strings:\n{:#?}", pool.strings);
     // poll with small strings for string view tests (<=12 bytes are inlined)
     let small_pool = StringPool::new(100, 5);
     let cases = [
@@ -139,6 +140,7 @@ pub fn make_rng() -> StdRng {
 }

 /// String pool for generating low cardinality data (for dictionaries and string views)
+#[derive(Debug)]
 struct StringPool {
     strings: Vec<String>,
 }
Run 1
StringPool strings:
[
    "hPi3oZCna",
    "Pi3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdD",
    "i3oZCnaWvL2oIeA0",
    "3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ",
    "oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM4",
    "ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFW",
    "ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
    "CnaWvL2oIeA07mg3ZtJzh0NoAK",
    "CnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
    "naW",
    "aWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTc",
    "WvL2oIeA07mg3ZtJzh0NoAKhdDq",
    "vL2oIeA07mg3ZtJzh0NoAKh",
    "L2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
    "2oIeA07mg3Zt",
    "oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzO",
    "IeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZK",
    "eA07mg3Zt",
    "A07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWT",
    "0",
    "7mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
    "mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "g3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
    "3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZ",
    "ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
    "tJzh0NoAKhdDqpQ2dfgaDFWTcI",
    "Jzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWk",
    "zh0NoAKhdD",
    "h0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
    "0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM47",
    "NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4c",
    "oAKhdDqpQ2dfga",
    "AKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
    "K",
    "hdDqpQ2dfga",
    "dDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppz",
    "DqpQ2dfgaDFWTcIylNhZKp3bM477b3",
    "qpQ2",
    "pQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "Q2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "2dfgaDFWTcIylNhZK",
    "dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28",
    "fgaDFWTcIylNhZKp3bM477b3ppzOWk",
    "gaDFWTcIylNhZKp3bM477b3ppzOWkYYm",
    "gaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ",
    "aDFWTcIylNhZKp3bM477b3ppzOWkYYmEG",
    "DFWTcIylNhZKp3bM477b3ppzOWk",
    "FWTc",
    "WTcIyl",
    "TcIylNhZKp3bM477b3ppzOW",
    "cIylNhZKp3bM477b3ppz",
    "IylNhZKp3bM477b3ppzOWkYYmEGbC",
    "ylNhZKp3b",
    "lNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2",
    "NhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAf",
    "hZKp3bM477b3pp",
    "ZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQY",
    "Kp3bM477b3ppzOWkYYmEGbCym4",
    "p3bM477b3pp",
    "3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xy",
    "bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe7",
    "M477b3ppzOWkYYmEGbCym4cPB4JQ",
    "477b3ppzOWkYY",
    "77b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kv",
    "7b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
    "b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioH",
    "3ppzOWkYYmEGbCym4cPB4JQYAfz9",
    "ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJi",
    "pzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
    "zOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P",
    "OWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJ",
    "WkYYmEGbCym4cPB",
    "kYYmEGbCym4cPB4JQYAfz9f",
    "YYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
    "YmEGbCym4cPB4JQYAfz9f28i8",
    "mEGbCym4cPB4JQYAfz9f28i8x",
    "EGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe",
    "GbCym",
    "bCym4cP",
    "Cym4cPB4JQYAfz9f28i8xyzk2PYZ",
    "ym4",
    "ym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
    "m4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2",
    "4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kvd",
    "cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny",
    "PB4JQYAfz9f28i8xyzk2PYZ4O9P4o",
    "B4JQYAfz9f28i8xy",
    "4J",
    "JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
    "QYAfz9f28i",
    "YAfz9f28i8xyzk2PY",
    "Afz9f28i8xyzk2PYZ4O9P4oTe",
    "f",
    "z9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
    "9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNf",
    "f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wj",
    "28i8xyzk2PYZ4O9P4oTe798KvdJioHjV",
    "8i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zC",
    "i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wjUe",
    "8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2r",
]
Run 2
StringPool strings:
[
    "hPi3oZCna",
    "Pi3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdD",
    "i3oZCnaWvL2oIeA0",
    "3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ",
    "oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM4",
    "ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFW",
    "ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
    "CnaWvL2oIeA07mg3ZtJzh0NoAK",
    "CnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
    "naW",
    "aWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTc",
    "WvL2oIeA07mg3ZtJzh0NoAKhdDq",
    "vL2oIeA07mg3ZtJzh0NoAKh",
    "L2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
    "2oIeA07mg3Zt",
    "oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzO",
    "IeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZK",
    "eA07mg3Zt",
    "A07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWT",
    "0",
    "7mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
    "mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "g3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
    "3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZ",
    "ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
    "tJzh0NoAKhdDqpQ2dfgaDFWTcI",
    "Jzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWk",
    "zh0NoAKhdD",
    "h0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
    "0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM47",
    "NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4c",
    "oAKhdDqpQ2dfga",
    "AKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
    "K",
    "hdDqpQ2dfga",
    "dDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppz",
    "DqpQ2dfgaDFWTcIylNhZKp3bM477b3",
    "qpQ2",
    "pQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "Q2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
    "2dfgaDFWTcIylNhZK",
    "dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28",
    "fgaDFWTcIylNhZKp3bM477b3ppzOWk",
    "gaDFWTcIylNhZKp3bM477b3ppzOWkYYm",
    "gaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ",
    "aDFWTcIylNhZKp3bM477b3ppzOWkYYmEG",
    "DFWTcIylNhZKp3bM477b3ppzOWk",
    "FWTc",
    "WTcIyl",
    "TcIylNhZKp3bM477b3ppzOW",
    "cIylNhZKp3bM477b3ppz",
    "IylNhZKp3bM477b3ppzOWkYYmEGbC",
    "ylNhZKp3b",
    "lNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2",
    "NhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAf",
    "hZKp3bM477b3pp",
    "ZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQY",
    "Kp3bM477b3ppzOWkYYmEGbCym4",
    "p3bM477b3pp",
    "3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xy",
    "bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe7",
    "M477b3ppzOWkYYmEGbCym4cPB4JQ",
    "477b3ppzOWkYY",
    "77b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kv",
    "7b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
    "b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioH",
    "3ppzOWkYYmEGbCym4cPB4JQYAfz9",
    "ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJi",
    "pzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
    "zOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P",
    "OWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJ",
    "WkYYmEGbCym4cPB",
    "kYYmEGbCym4cPB4JQYAfz9f",
    "YYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
    "YmEGbCym4cPB4JQYAfz9f28i8",
    "mEGbCym4cPB4JQYAfz9f28i8x",
    "EGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe",
    "GbCym",
    "bCym4cP",
    "Cym4cPB4JQYAfz9f28i8xyzk2PYZ",
    "ym4",
    "ym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
    "m4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2",
    "4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kvd",
    "cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny",
    "PB4JQYAfz9f28i8xyzk2PYZ4O9P4o",
    "B4JQYAfz9f28i8xy",
    "4J",
    "JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
    "QYAfz9f28i",
    "YAfz9f28i8xyzk2PY",
    "Afz9f28i8xyzk2PYZ4O9P4oTe",
    "f",
    "z9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
    "9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNf",
    "f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wj",
    "28i8xyzk2PYZ4O9P4oTe798KvdJioHjV",
    "8i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zC",
    "i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wjUe",
    "8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2r",
]
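
For readers who want to see why a fixed seed gives an identical pool, here is a minimal, standard-library-only sketch. It is not the code in `with_hashes.rs`: `SplitMix64`, `ALPHABET`, and `make_pool` are hypothetical stand-ins for `StdRng` and `StringPool::new`, and the strings it produces will not match the ones printed above. It only illustrates that the same seed always replays the same sequence.

```rust
/// Tiny deterministic PRNG (SplitMix64). A hypothetical stand-in for `StdRng`,
/// used here only so the example needs no external crates.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn seed_from_u64(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Build `count` strings, each between 1 and `max_len` bytes long.
/// `max_len` must be greater than zero.
fn make_pool(seed: u64, count: usize, max_len: usize) -> Vec<String> {
    let mut rng = SplitMix64::seed_from_u64(seed);
    (0..count)
        .map(|_| {
            // Pick a length first, then draw that many characters.
            let len = 1 + (rng.next_u64() as usize) % max_len;
            (0..len)
                .map(|_| ALPHABET[(rng.next_u64() as usize) % ALPHABET.len()] as char)
                .collect()
        })
        .collect()
}
```

Calling `make_pool(42, 100, 64)` twice returns equal vectors, because all of the generator's state comes from the seed.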

Contributor

Nice! I had missed that

@Dandandan
Contributor

Let's merge and tweak if needed

@Dandandan Dandandan added this pull request to the merge queue Dec 18, 2025
Merged via the queue into apache:main with commit d68b629 Dec 18, 2025
30 checks passed
@alamb
Contributor Author

alamb commented Dec 18, 2025

Thanks @Dandandan

Contributor Author

@alamb alamb left a comment

Thanks @adriangb and @Dandandan

@alamb alamb deleted the alamb/hash_bench branch December 18, 2025 12:34
github-merge-queue bot pushed a commit that referenced this pull request Dec 20, 2025
## Which issue does this PR close?

- builds on #19373
- part of #18411
- Broken out of #19344
- Closes #19344

## Rationale for this change

While looking at performance as part of
#18411, I noticed we could
speed up string view hashing by optimizing for small strings.

## What changes are included in this PR?

Optimize StringView hashing, specifically by using the inlined view for
short strings
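
As a rough illustration of the idea (not the code in this PR), the sketch below assumes the Arrow string view layout as I understand it: a 16-byte view whose first 4 bytes hold the little-endian length, followed by up to 12 bytes of zero-padded inline data when the string is short enough. The names `inline_view`, `hash_one`, and `hash_string` are hypothetical, and `DefaultHasher` stands in for whatever hasher DataFusion actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maximum number of bytes stored directly inside a 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Pack a short string into a 16-byte view: 4 bytes of little-endian length,
/// then up to 12 bytes of data, zero padded. Returns `None` for longer strings.
fn inline_view(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None;
    }
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    raw[4..4 + bytes.len()].copy_from_slice(bytes);
    Some(u128::from_le_bytes(raw))
}

/// Hash any value with the standard library's default hasher.
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hash a string, using the fixed-width view when the string fits inline
/// and falling back to hashing the bytes otherwise.
fn hash_string(s: &str) -> u64 {
    match inline_view(s) {
        Some(view) => hash_one(&view),
        None => hash_one(&s),
    }
}
```

Because the unused inline bytes are zero, equal short strings produce equal views, so hashing the fixed-width integer can replace a variable-length read of the string data.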

## Are these changes tested?

Functionally by existing coverage

Performance by benchmarks (added in
#19373) which show
* 15%-20% faster for mixed short/long strings
* 50%-70% faster for "short" arrays where we know there are no strings
longer than 12 bytes

```
utf8_view (small): multiple, no nulls        1.00     47.9±1.71µs        ? ?/sec    4.00    191.6±1.15µs        ? ?/sec
utf8_view (small): multiple, nulls           1.00     78.4±0.48µs        ? ?/sec    3.08    241.6±1.11µs        ? ?/sec
utf8_view (small): single, no nulls          1.00     13.9±0.19µs        ? ?/sec    4.29     59.7±0.30µs        ? ?/sec
utf8_view (small): single, nulls             1.00     23.8±0.20µs        ? ?/sec    3.10     73.7±1.03µs        ? ?/sec
utf8_view: multiple, no nulls                1.00    235.4±2.14µs        ? ?/sec    1.11    262.2±1.34µs        ? ?/sec
utf8_view: multiple, nulls                   1.00    227.2±2.11µs        ? ?/sec    1.34    303.9±2.23µs        ? ?/sec
utf8_view: single, no nulls                  1.00     71.6±0.74µs        ? ?/sec    1.05     75.2±1.27µs        ? ?/sec
utf8_view: single, nulls                     1.00     71.5±1.92µs        ? ?/sec    1.28     91.6±4.65µs  
```
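
The table above reports relative ratios (1.00 for the faster side), while the detailed output below reports percentage changes. The two can be converted into each other; the helpers here are a hypothetical sketch of that arithmetic, not part of the benchmark. For example, a ratio of 4.00 corresponds to 75% less time, and a change of about −69.5% corresponds to a ratio of roughly 3.3. The absolute timings in the two listings differ, so they appear to come from separate runs and the figures should not be expected to line up exactly.

```rust
/// Convert a "times slower" ratio (slower time / faster time) into the
/// percentage by which the run time dropped.
fn percent_reduction(ratio: f64) -> f64 {
    (1.0 - 1.0 / ratio) * 100.0
}

/// Inverse: convert a percentage reduction in run time into the equivalent ratio.
/// `percent` must be less than 100.
fn ratio_from_reduction(percent: f64) -> f64 {
    1.0 / (1.0 - percent / 100.0)
}
```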


<details><summary>Details</summary>
<p>

```
Gnuplot not found, using plotters backend
utf8_view: single, no nulls
                        time:   [20.872 µs 20.906 µs 20.944 µs]
                        change: [−15.863% −15.614% −15.331%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

utf8_view: single, nulls
                        time:   [22.968 µs 23.050 µs 23.130 µs]
                        change: [−17.796% −17.384% −16.918%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

utf8_view: multiple, no nulls
                        time:   [66.005 µs 66.155 µs 66.325 µs]
                        change: [−19.077% −18.785% −18.512%] (p = 0.00 < 0.05)
                        Performance has improved.

utf8_view: multiple, nulls
                        time:   [72.155 µs 72.375 µs 72.649 µs]
                        change: [−17.944% −17.612% −17.266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

utf8_view (small): single, no nulls
                        time:   [6.1401 µs 6.1563 µs 6.1747 µs]
                        change: [−69.623% −69.484% −69.333%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

utf8_view (small): single, nulls
                        time:   [10.234 µs 10.250 µs 10.270 µs]
                        change: [−53.969% −53.815% −53.666%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe

utf8_view (small): multiple, no nulls
                        time:   [20.853 µs 20.905 µs 20.961 µs]
                        change: [−66.006% −65.883% −65.759%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

utf8_view (small): multiple, nulls
                        time:   [32.519 µs 32.600 µs 32.675 µs]
                        change: [−53.937% −53.581% −53.232%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

</p>
</details> 

## Are there any user-facing changes?
