Add hashing microbenchmark with_hashes
#19373
1bccd06 to
273b46b
Compare
}

fn criterion_benchmark(c: &mut Criterion) {
    let pool = StringPool::new(100, 64);
Since these pools are randomized each run, is the sample size large enough to avoid noise? If not we could always generate a single pool of strings and commit it in a txt file or something...
Maybe could you run this 5 times locally and report back variability?
While it is randomized, the random number generator uses a fixed seed:
pub fn make_rng() -> StdRng {
    StdRng::seed_from_u64(42)
}
So the same strings are used each time.
I double-checked that the strings are the same on each run by printing them out.
diff --git a/datafusion/common/benches/with_hashes.rs b/datafusion/common/benches/with_hashes.rs
index 8154c20df..56b970fdc 100644
--- a/datafusion/common/benches/with_hashes.rs
+++ b/datafusion/common/benches/with_hashes.rs
@@ -41,6 +41,7 @@ struct BenchData {
fn criterion_benchmark(c: &mut Criterion) {
let pool = StringPool::new(100, 64);
+ println!("StringPool strings:\n{:#?}", pool.strings);
// poll with small strings for string view tests (<=12 bytes are inlined)
let small_pool = StringPool::new(100, 5);
let cases = [
@@ -139,6 +140,7 @@ pub fn make_rng() -> StdRng {
}
/// String pool for generating low cardinality data (for dictionaries and string views)
+#[derive(Debug)]
struct StringPool {
strings: Vec<String>,
}
Run 1
StringPool strings:
[
"hPi3oZCna",
"Pi3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdD",
"i3oZCnaWvL2oIeA0",
"3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ",
"oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM4",
"ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFW",
"ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
"CnaWvL2oIeA07mg3ZtJzh0NoAK",
"CnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
"naW",
"aWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTc",
"WvL2oIeA07mg3ZtJzh0NoAKhdDq",
"vL2oIeA07mg3ZtJzh0NoAKh",
"L2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
"2oIeA07mg3Zt",
"oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzO",
"IeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZK",
"eA07mg3Zt",
"A07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWT",
"0",
"7mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
"mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"g3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
"3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZ",
"ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
"tJzh0NoAKhdDqpQ2dfgaDFWTcI",
"Jzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWk",
"zh0NoAKhdD",
"h0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
"0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM47",
"NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4c",
"oAKhdDqpQ2dfga",
"AKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
"K",
"hdDqpQ2dfga",
"dDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppz",
"DqpQ2dfgaDFWTcIylNhZKp3bM477b3",
"qpQ2",
"pQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"Q2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"2dfgaDFWTcIylNhZK",
"dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28",
"fgaDFWTcIylNhZKp3bM477b3ppzOWk",
"gaDFWTcIylNhZKp3bM477b3ppzOWkYYm",
"gaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ",
"aDFWTcIylNhZKp3bM477b3ppzOWkYYmEG",
"DFWTcIylNhZKp3bM477b3ppzOWk",
"FWTc",
"WTcIyl",
"TcIylNhZKp3bM477b3ppzOW",
"cIylNhZKp3bM477b3ppz",
"IylNhZKp3bM477b3ppzOWkYYmEGbC",
"ylNhZKp3b",
"lNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2",
"NhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAf",
"hZKp3bM477b3pp",
"ZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQY",
"Kp3bM477b3ppzOWkYYmEGbCym4",
"p3bM477b3pp",
"3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xy",
"bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe7",
"M477b3ppzOWkYYmEGbCym4cPB4JQ",
"477b3ppzOWkYY",
"77b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kv",
"7b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
"b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioH",
"3ppzOWkYYmEGbCym4cPB4JQYAfz9",
"ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJi",
"pzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
"zOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P",
"OWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJ",
"WkYYmEGbCym4cPB",
"kYYmEGbCym4cPB4JQYAfz9f",
"YYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
"YmEGbCym4cPB4JQYAfz9f28i8",
"mEGbCym4cPB4JQYAfz9f28i8x",
"EGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe",
"GbCym",
"bCym4cP",
"Cym4cPB4JQYAfz9f28i8xyzk2PYZ",
"ym4",
"ym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
"m4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2",
"4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kvd",
"cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny",
"PB4JQYAfz9f28i8xyzk2PYZ4O9P4o",
"B4JQYAfz9f28i8xy",
"4J",
"JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
"QYAfz9f28i",
"YAfz9f28i8xyzk2PY",
"Afz9f28i8xyzk2PYZ4O9P4oTe",
"f",
"z9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
"9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNf",
"f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wj",
"28i8xyzk2PYZ4O9P4oTe798KvdJioHjV",
"8i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zC",
"i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wjUe",
"8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2r",
]
Run 2
StringPool strings:
[
"hPi3oZCna",
"Pi3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdD",
"i3oZCnaWvL2oIeA0",
"3oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ",
"oZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM4",
"ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFW",
"ZCnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
"CnaWvL2oIeA07mg3ZtJzh0NoAK",
"CnaWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOW",
"naW",
"aWvL2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTc",
"WvL2oIeA07mg3ZtJzh0NoAKhdDq",
"vL2oIeA07mg3ZtJzh0NoAKh",
"L2oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
"2oIeA07mg3Zt",
"oIeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzO",
"IeA07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZK",
"eA07mg3Zt",
"A07mg3ZtJzh0NoAKhdDqpQ2dfgaDFWT",
"0",
"7mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
"mg3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"g3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM",
"3ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZ",
"ZtJzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
"tJzh0NoAKhdDqpQ2dfgaDFWTcI",
"Jzh0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWk",
"zh0NoAKhdD",
"h0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCy",
"0NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM47",
"NoAKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4c",
"oAKhdDqpQ2dfga",
"AKhdDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkY",
"K",
"hdDqpQ2dfga",
"dDqpQ2dfgaDFWTcIylNhZKp3bM477b3ppz",
"DqpQ2dfgaDFWTcIylNhZKp3bM477b3",
"qpQ2",
"pQ2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"Q2dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym",
"2dfgaDFWTcIylNhZK",
"dfgaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28",
"fgaDFWTcIylNhZKp3bM477b3ppzOWk",
"gaDFWTcIylNhZKp3bM477b3ppzOWkYYm",
"gaDFWTcIylNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ",
"aDFWTcIylNhZKp3bM477b3ppzOWkYYmEG",
"DFWTcIylNhZKp3bM477b3ppzOWk",
"FWTc",
"WTcIyl",
"TcIylNhZKp3bM477b3ppzOW",
"cIylNhZKp3bM477b3ppz",
"IylNhZKp3bM477b3ppzOWkYYmEGbC",
"ylNhZKp3b",
"lNhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2",
"NhZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQYAf",
"hZKp3bM477b3pp",
"ZKp3bM477b3ppzOWkYYmEGbCym4cPB4JQY",
"Kp3bM477b3ppzOWkYYmEGbCym4",
"p3bM477b3pp",
"3bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xy",
"bM477b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe7",
"M477b3ppzOWkYYmEGbCym4cPB4JQ",
"477b3ppzOWkYY",
"77b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kv",
"7b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
"b3ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioH",
"3ppzOWkYYmEGbCym4cPB4JQYAfz9",
"ppzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJi",
"pzOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
"zOWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P",
"OWkYYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJ",
"WkYYmEGbCym4cPB",
"kYYmEGbCym4cPB4JQYAfz9f",
"YYmEGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9",
"YmEGbCym4cPB4JQYAfz9f28i8",
"mEGbCym4cPB4JQYAfz9f28i8x",
"EGbCym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe",
"GbCym",
"bCym4cP",
"Cym4cPB4JQYAfz9f28i8xyzk2PYZ",
"ym4",
"ym4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
"m4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2",
"4cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798Kvd",
"cPB4JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny",
"PB4JQYAfz9f28i8xyzk2PYZ4O9P4o",
"B4JQYAfz9f28i8xy",
"4J",
"JQYAfz9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkN",
"QYAfz9f28i",
"YAfz9f28i8xyzk2PY",
"Afz9f28i8xyzk2PYZ4O9P4oTe",
"f",
"z9f28i8xyzk2PYZ4O9P4oTe798KvdJio",
"9f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNf",
"f28i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wj",
"28i8xyzk2PYZ4O9P4oTe798KvdJioHjV",
"8i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zC",
"i8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2rSDte9ny3QkNfucH7zCQf1wjUe",
"8xyzk2PYZ4O9P4oTe798KvdJioHjVwPSk2r",
]
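The point about the fixed seed can be illustrated with a small std-only sketch. Everything in it is an assumption made for illustration: `SplitMix64` is a hypothetical stand-in for the `StdRng` used by the benchmark, and `make_pool` is not the real `StringPool::new`; it only shows that a generator seeded with a constant produces the same pool on every call.

```rust
/// Hypothetical stand-in for a seeded RNG (SplitMix64).
/// This is NOT the `StdRng` that the benchmark uses.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

const ALPHABET: &[u8] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Build `count` alphanumeric strings of length 1..=max_len from `seed`.
/// Illustrative only; `max_len` must be greater than zero.
fn make_pool(seed: u64, count: usize, max_len: usize) -> Vec<String> {
    let mut rng = SplitMix64(seed);
    let mut pool = Vec::with_capacity(count);
    for _ in 0..count {
        let len = 1 + (rng.next_u64() % max_len as u64) as usize;
        let mut s = String::with_capacity(len);
        for _ in 0..len {
            let idx = (rng.next_u64() % ALPHABET.len() as u64) as usize;
            s.push(ALPHABET[idx] as char);
        }
        pool.push(s);
    }
    pool
}
```

Calling `make_pool(42, 100, 64)` twice returns equal vectors, which is the same property the two printed runs above demonstrate for the real pool.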
Nice! I had missed that
Let's merge and tweak if needed

Thanks @Dandandan
alamb
left a comment
Thanks @adriangb and @Dandandan
## Which issue does this PR close?

- builds on #19373
- part of #18411
- Broken out of #19344
- Closes #19344

## Rationale for this change

While looking at performance as part of #18411, I noticed we could speed up string view hashing by optimizing for small strings

## What changes are included in this PR?

Optimize StringView hashing, specifically by using the inlined view for short strings

## Are these changes tested?

Functionally by existing coverage

Performance by benchmarks (added in #19373) which show
* 15%-20% faster for mixed short/long strings
* 50%-70% faster for "short" arrays where we know there are no strings longer than 12 bytes

```
utf8_view (small): multiple, no nulls    1.00     47.9±1.71µs        ? ?/sec    4.00    191.6±1.15µs        ? ?/sec
utf8_view (small): multiple, nulls       1.00     78.4±0.48µs        ? ?/sec    3.08    241.6±1.11µs        ? ?/sec
utf8_view (small): single, no nulls      1.00     13.9±0.19µs        ? ?/sec    4.29     59.7±0.30µs        ? ?/sec
utf8_view (small): single, nulls         1.00     23.8±0.20µs        ? ?/sec    3.10     73.7±1.03µs        ? ?/sec
utf8_view: multiple, no nulls            1.00    235.4±2.14µs        ? ?/sec    1.11    262.2±1.34µs        ? ?/sec
utf8_view: multiple, nulls               1.00    227.2±2.11µs        ? ?/sec    1.34    303.9±2.23µs        ? ?/sec
utf8_view: single, no nulls              1.00     71.6±0.74µs        ? ?/sec    1.05     75.2±1.27µs        ? ?/sec
utf8_view: single, nulls                 1.00     71.5±1.92µs        ? ?/sec    1.28     91.6±4.65µs
```

<details><summary>Details</summary>
<p>

```
Gnuplot not found, using plotters backend
utf8_view: single, no nulls
                        time:   [20.872 µs 20.906 µs 20.944 µs]
                        change: [−15.863% −15.614% −15.331%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

utf8_view: single, nulls
                        time:   [22.968 µs 23.050 µs 23.130 µs]
                        change: [−17.796% −17.384% −16.918%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

utf8_view: multiple, no nulls
                        time:   [66.005 µs 66.155 µs 66.325 µs]
                        change: [−19.077% −18.785% −18.512%] (p = 0.00 < 0.05)
                        Performance has improved.

utf8_view: multiple, nulls
                        time:   [72.155 µs 72.375 µs 72.649 µs]
                        change: [−17.944% −17.612% −17.266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

utf8_view (small): single, no nulls
                        time:   [6.1401 µs 6.1563 µs 6.1747 µs]
                        change: [−69.623% −69.484% −69.333%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

utf8_view (small): single, nulls
                        time:   [10.234 µs 10.250 µs 10.270 µs]
                        change: [−53.969% −53.815% −53.666%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe

utf8_view (small): multiple, no nulls
                        time:   [20.853 µs 20.905 µs 20.961 µs]
                        change: [−66.006% −65.883% −65.759%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

utf8_view (small): multiple, nulls
                        time:   [32.519 µs 32.600 µs 32.675 µs]
                        change: [−53.937% −53.581% −53.232%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

</p>
</details>

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api change` label.
-->
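The follow-up change referenced above speeds up hashing "by using the inlined view for short strings". A rough sketch of that idea follows, under stated assumptions: it relies on the Arrow string view layout, where each value is a 16-byte view whose low 4 bytes hold the length and, for strings of at most 12 bytes, the remaining 12 bytes hold the data itself. The helper names (`make_inline_view`, `hash_one`, `hash_string_view`) and the use of the standard library's `DefaultHasher` are hypothetical choices for illustration, not DataFusion's actual code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Strings of at most this many bytes are stored inline in the view.
const MAX_INLINE_LEN: usize = 12;

/// Build the 16-byte view for a short string: little-endian length in
/// bytes 0..4, then the string bytes, zero padded. Returns `None` for
/// strings that are too long to be inlined.
fn make_inline_view(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None;
    }
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    raw[4..4 + bytes.len()].copy_from_slice(bytes);
    Some(u128::from_le_bytes(raw))
}

/// Hash any value with the std hasher (an assumption; the real code
/// uses its own hashing setup).
fn hash_one<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Short strings: hash the fixed-width view as a single integer, with no
/// need to follow a buffer index and offset. Long strings: hash the bytes.
fn hash_string_view(s: &str) -> u64 {
    match make_inline_view(s) {
        Some(view) => hash_one(&view),
        None => hash_one(s.as_bytes()),
    }
}
```

Because the length and the padded bytes are both part of the view, two inlined strings have equal views exactly when the strings are equal, so hashing the integer is a valid substitute for hashing the bytes in this sketch.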
Which issue does this PR close?
Rationale for this change
I want to optimize hashing for StringViewArray. In order to do so, I would like a benchmark to show that it works.
What changes are included in this PR?
Add benchmark for with_hashes
Run like
Note I did not add all the possible types of arrays as I don't plan to optimize others
Are these changes tested?
I ran it manually
Are there any user-facing changes?