[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types by stevomitric · Pull Request #56612 · apache/spark

stevomitric · 2026-06-19T14:41:28Z

What changes were proposed in this pull request?

Tests-only follow-up to SPARK-57103, extending nanos coverage to max_by/min_by over TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) (p in [7,9]). No production change: MaxMinby gates only on the ordering's orderability (which nanos pass) and returns valueExpr.dataType, so a nanosecond value keeps its precision. Adds ExpressionTypeCheckingSuite contract asserts, TimestampNanosFunctionsSuiteBase E2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and golden max_by/min_by queries in timestamp-{ntz,ltz}-nanos.sql. Two-arg form only; k-variant and mixed precision out of scope.

Why are the changes needed?

max_by/min_by over nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions.

Does this PR introduce any user-facing change?

No. Tests only; types stay behind spark.sql.timestampNanosTypes.enabled.

How was this patch tested?

New tests on JDK 17: ExpressionTypeCheckingSuite, TimestampNanosFunctionsAnsiOn/OffSuite, and SQLQueryTestSuite -z nanos (goldens regenerated, then verified clean).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Resolve conflicts against SPARK-57527 (unix_nanos over nanos), which edited the same nanosecond-timestamp test files: - TimestampNanosFunctionsSuiteBase.scala: keep both the SPARK-57103 max_by/min_by tests and the SPARK-57527 unix_nanos tests. - timestamp-{ntz,ltz}-nanos.sql: keep both query sections. - results/ and analyzer-results/ golden files regenerated. Co-authored-by: Isaac

uros-b

Thank you for this PR @stevomitric! Please address the linter failures (https://github.com/stevomitric/spark/actions/runs/27833368757/job/82375857257). Also, please note that we should generally avoid making "[FOLLOWUP]" tickets in Spark. Please create a new Jira ticket for these improvements, and track the new patch there.

MaxGekk

0 blocking, 2 non-blocking, 0 nits.
The test logic is correct, but the E2E tests should hit the Scala functions.* Column API, not only the SQL path (which is already covered by the golden .sql files this PR adds).

Design / architecture (1)

TimestampNanosFunctionsSuiteBase.scala:478: max_by/min_by are invoked only via selectExpr; use the functions.* Column API per the suite's convention (also affects :508, :531) — see inline

Suggestions (1)

ExpressionTypeCheckingSuite.scala:415: build each aggregate once via Seq(MaxBy, MinBy).foreach — see inline

Verification

Tests-only, so I verified the test logic rather than production code:

Precision/type preservation is asserted explicitly (res.schema.map(_.dataType) and the contract suite's dataType == dt), not left to checkAnswer, which compares rows not metadata — so a precision regression can't slip past.
Fixtures genuinely exercise nanos despite precision flooring: test 1 uses 100ns multiples (exact at every p∈[7,9]); the sub-µs ordering test pins p=9; the GROUP BY test asserts only ordering, which survives flooring (1ns→0 < 999ns→900 at p=7).
Goldens consistent: timestamp_{ntz,ltz}(9) schema, correct extremes, LTZ zone shift.

The line-length lint and the [FOLLOWUP]-ticket convention are already covered by @uros-b's review, so nothing to add there.

MaxGekk

2 addressed, 0 remaining, 0 new.
Both prior findings are applied in c72477f (only the two Scala test files changed since the last round): the three E2E tests now use the functions.max_by/min_by(Column, Column) API instead of selectExpr/expr, and the contract test loops over Seq(MaxBy, MinBy).foreach. No new findings.

Verification

Re-verified the tests-only coverage: the precision test asserts the returned dataType explicitly (precision is metadata, invisible to checkAnswer); the sub-microsecond ordering test genuinely needs the full TimestampNanosVal comparison (both values share epochMicros, differ only in nanosWithinMicro) and runs codegen + interpreted; fixtures survive precision flooring as their comments claim; goldens carry the correct extremes and timestamp_{ntz,ltz}(9) schema. The comment's contract claim matches production: MaxBy/MinBy dataType = valueExpr.dataType, gating only on ordering orderability (MaxByAndMinBy.scala:47,49-50).

MaxGekk · 2026-06-21T20:27:36Z

+1, LGTM. Merging to master/4.x.
Thank you, @stevomitric and @uros-b for review.

…y over nanosecond-precision timestamp types ### What changes were proposed in this pull request? Tests-only follow-up to SPARK-57103, extending nanos coverage to `max_by`/`min_by` over `TIMESTAMP_NTZ(p)`/`TIMESTAMP_LTZ(p)` (`p` in `[7,9]`). No production change: `MaxMinby` gates only on the ordering's orderability (which nanos pass) and returns `valueExpr.dataType`, so a nanosecond value keeps its precision. Adds `ExpressionTypeCheckingSuite` contract asserts, `TimestampNanosFunctionsSuiteBase` E2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and golden `max_by`/`min_by` queries in `timestamp-{ntz,ltz}-nanos.sql`. Two-arg form only; `k`-variant and mixed precision out of scope. ### Why are the changes needed? `max_by`/`min_by` over nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions. ### Does this PR introduce any user-facing change? No. Tests only; types stay behind `spark.sql.timestampNanosTypes.enabled`. ### How was this patch tested? New tests on JDK 17: `ExpressionTypeCheckingSuite`, `TimestampNanosFunctionsAnsiOn/OffSuite`, and `SQLQueryTestSuite -z nanos` (goldens regenerated, then verified clean). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) Closes #56612 from stevomitric/stevomitric/min-by-max-by. Authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 7818539) Signed-off-by: Max Gekk <max.gekk@gmail.com>

stevomitric added 2 commits June 19, 2026 14:36

Initial commit

61c99f8

uros-b approved these changes Jun 19, 2026

View reviewed changes

MaxGekk reviewed Jun 20, 2026

View reviewed changes

Comment thread sql/core/src/test/scala/org/apache/spark/sql/TimestampNanosFunctionsSuiteBase.scala Outdated

Comment thread ...lyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala

stevomitric added 2 commits June 21, 2026 12:36

resolve ci failiures

f3955fb

resolve comments

c72477f

MaxGekk approved these changes Jun 21, 2026

View reviewed changes

MaxGekk closed this in 7818539 Jun 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types#56612

[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types#56612
stevomitric wants to merge 4 commits into
apache:masterfrom
stevomitric:stevomitric/min-by-max-by

stevomitric commented Jun 19, 2026

Uh oh!

uros-b left a comment

Uh oh!

MaxGekk left a comment

Uh oh!

Uh oh!

Uh oh!

MaxGekk left a comment

Uh oh!

MaxGekk commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stevomitric commented Jun 19, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

uros-b left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Design / architecture (1)

Suggestions (1)

Verification

Uh oh!

Uh oh!

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Verification

Uh oh!

MaxGekk commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants