[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types#56612
Conversation
Resolve conflicts against SPARK-57527 (unix_nanos over nanos), which edited the
same nanosecond-timestamp test files:
- TimestampNanosFunctionsSuiteBase.scala: keep both the SPARK-57103 max_by/min_by
tests and the SPARK-57527 unix_nanos tests.
- timestamp-{ntz,ltz}-nanos.sql: keep both query sections.
- results/ and analyzer-results/ golden files regenerated.
Co-authored-by: Isaac
uros-b
left a comment
There was a problem hiding this comment.
Thank you for this PR @stevomitric! Please address the linter failures (https://github.com/stevomitric/spark/actions/runs/27833368757/job/82375857257). Also, please note that we should generally avoid making "[FOLLOWUP]" tickets in Spark. Please create a new Jira ticket for these improvements, and track the new patch there.
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 2 non-blocking, 0 nits.
The test logic is correct, but the E2E tests should hit the Scala functions.* Column API, not only the SQL path (which is already covered by the golden .sql files this PR adds).
Design / architecture (1)
- TimestampNanosFunctionsSuiteBase.scala:478:
max_by/min_byare invoked only viaselectExpr; use thefunctions.*Column API per the suite's convention (also affects :508, :531) — see inline
Suggestions (1)
- ExpressionTypeCheckingSuite.scala:415: build each aggregate once via
Seq(MaxBy, MinBy).foreach— see inline
Verification
Tests-only, so I verified the test logic rather than production code:
- Precision/type preservation is asserted explicitly (
res.schema.map(_.dataType)and the contract suite'sdataType == dt), not left tocheckAnswer, which compares rows not metadata — so a precision regression can't slip past. - Fixtures genuinely exercise nanos despite precision flooring: test 1 uses 100ns multiples (exact at every
p∈[7,9]); the sub-µs ordering test pinsp=9; the GROUP BY test asserts only ordering, which survives flooring (1ns→0 < 999ns→900atp=7). - Goldens consistent:
timestamp_{ntz,ltz}(9)schema, correct extremes, LTZ zone shift.
The line-length lint and the [FOLLOWUP]-ticket convention are already covered by @uros-b's review, so nothing to add there.
MaxGekk
left a comment
There was a problem hiding this comment.
2 addressed, 0 remaining, 0 new.
Both prior findings are applied in c72477f (only the two Scala test files changed since the last round): the three E2E tests now use the functions.max_by/min_by(Column, Column) API instead of selectExpr/expr, and the contract test loops over Seq(MaxBy, MinBy).foreach. No new findings.
Verification
Re-verified the tests-only coverage: the precision test asserts the returned dataType explicitly (precision is metadata, invisible to checkAnswer); the sub-microsecond ordering test genuinely needs the full TimestampNanosVal comparison (both values share epochMicros, differ only in nanosWithinMicro) and runs codegen + interpreted; fixtures survive precision flooring as their comments claim; goldens carry the correct extremes and timestamp_{ntz,ltz}(9) schema. The comment's contract claim matches production: MaxBy/MinBy dataType = valueExpr.dataType, gating only on ordering orderability (MaxByAndMinBy.scala:47,49-50).
|
+1, LGTM. Merging to master/4.x. |
…y over nanosecond-precision timestamp types ### What changes were proposed in this pull request? Tests-only follow-up to SPARK-57103, extending nanos coverage to `max_by`/`min_by` over `TIMESTAMP_NTZ(p)`/`TIMESTAMP_LTZ(p)` (`p` in `[7,9]`). No production change: `MaxMinby` gates only on the ordering's orderability (which nanos pass) and returns `valueExpr.dataType`, so a nanosecond value keeps its precision. Adds `ExpressionTypeCheckingSuite` contract asserts, `TimestampNanosFunctionsSuiteBase` E2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and golden `max_by`/`min_by` queries in `timestamp-{ntz,ltz}-nanos.sql`. Two-arg form only; `k`-variant and mixed precision out of scope. ### Why are the changes needed? `max_by`/`min_by` over nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions. ### Does this PR introduce any user-facing change? No. Tests only; types stay behind `spark.sql.timestampNanosTypes.enabled`. ### How was this patch tested? New tests on JDK 17: `ExpressionTypeCheckingSuite`, `TimestampNanosFunctionsAnsiOn/OffSuite`, and `SQLQueryTestSuite -z nanos` (goldens regenerated, then verified clean). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) Closes #56612 from stevomitric/stevomitric/min-by-max-by. Authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 7818539) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Tests-only follow-up to SPARK-57103, extending nanos coverage to
max_by/min_byoverTIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)(pin[7,9]). No production change:MaxMinbygates only on the ordering's orderability (which nanos pass) and returnsvalueExpr.dataType, so a nanosecond value keeps its precision. AddsExpressionTypeCheckingSuitecontract asserts,TimestampNanosFunctionsSuiteBaseE2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and goldenmax_by/min_byqueries intimestamp-{ntz,ltz}-nanos.sql. Two-arg form only;k-variant and mixed precision out of scope.Why are the changes needed?
max_by/min_byover nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions.Does this PR introduce any user-facing change?
No. Tests only; types stay behind
spark.sql.timestampNanosTypes.enabled.How was this patch tested?
New tests on JDK 17:
ExpressionTypeCheckingSuite,TimestampNanosFunctionsAnsiOn/OffSuite, andSQLQueryTestSuite -z nanos(goldens regenerated, then verified clean).Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)