Skip to content

[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types#56612

Closed
stevomitric wants to merge 4 commits into
apache:masterfrom
stevomitric:stevomitric/min-by-max-by
Closed

[SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types#56612
stevomitric wants to merge 4 commits into
apache:masterfrom
stevomitric:stevomitric/min-by-max-by

Conversation

@stevomitric

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Tests-only follow-up to SPARK-57103, extending nanos coverage to max_by/min_by over TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) (p in [7,9]). No production change: MaxMinby gates only on the ordering's orderability (which nanos pass) and returns valueExpr.dataType, so a nanosecond value keeps its precision. Adds ExpressionTypeCheckingSuite contract asserts, TimestampNanosFunctionsSuiteBase E2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and golden max_by/min_by queries in timestamp-{ntz,ltz}-nanos.sql. Two-arg form only; k-variant and mixed precision out of scope.

Why are the changes needed?

max_by/min_by over nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions.

Does this PR introduce any user-facing change?

No. Tests only; types stay behind spark.sql.timestampNanosTypes.enabled.

How was this patch tested?

New tests on JDK 17: ExpressionTypeCheckingSuite, TimestampNanosFunctionsAnsiOn/OffSuite, and SQLQueryTestSuite -z nanos (goldens regenerated, then verified clean).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Resolve conflicts against SPARK-57527 (unix_nanos over nanos), which edited the
same nanosecond-timestamp test files:
- TimestampNanosFunctionsSuiteBase.scala: keep both the SPARK-57103 max_by/min_by
  tests and the SPARK-57527 unix_nanos tests.
- timestamp-{ntz,ltz}-nanos.sql: keep both query sections.
- results/ and analyzer-results/ golden files regenerated.

Co-authored-by: Isaac

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @stevomitric! Please address the linter failures (https://github.com/stevomitric/spark/actions/runs/27833368757/job/82375857257). Also, please note that we should generally avoid making "[FOLLOWUP]" tickets in Spark. Please create a new Jira ticket for these improvements, and track the new patch there.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 2 non-blocking, 0 nits.
The test logic is correct, but the E2E tests should hit the Scala functions.* Column API, not only the SQL path (which is already covered by the golden .sql files this PR adds).

Design / architecture (1)

  • TimestampNanosFunctionsSuiteBase.scala:478: max_by/min_by are invoked only via selectExpr; use the functions.* Column API per the suite's convention (also affects :508, :531) — see inline

Suggestions (1)

  • ExpressionTypeCheckingSuite.scala:415: build each aggregate once via Seq(MaxBy, MinBy).foreach — see inline

Verification

Tests-only, so I verified the test logic rather than production code:

  • Precision/type preservation is asserted explicitly (res.schema.map(_.dataType) and the contract suite's dataType == dt), not left to checkAnswer, which compares rows not metadata — so a precision regression can't slip past.
  • Fixtures genuinely exercise nanos despite precision flooring: test 1 uses 100ns multiples (exact at every p∈[7,9]); the sub-µs ordering test pins p=9; the GROUP BY test asserts only ordering, which survives flooring (1ns→0 < 999ns→900 at p=7).
  • Goldens consistent: timestamp_{ntz,ltz}(9) schema, correct extremes, LTZ zone shift.

The line-length lint and the [FOLLOWUP]-ticket convention are already covered by @uros-b's review, so nothing to add there.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 addressed, 0 remaining, 0 new.
Both prior findings are applied in c72477f (only the two Scala test files changed since the last round): the three E2E tests now use the functions.max_by/min_by(Column, Column) API instead of selectExpr/expr, and the contract test loops over Seq(MaxBy, MinBy).foreach. No new findings.

Verification

Re-verified the tests-only coverage: the precision test asserts the returned dataType explicitly (precision is metadata, invisible to checkAnswer); the sub-microsecond ordering test genuinely needs the full TimestampNanosVal comparison (both values share epochMicros, differ only in nanosWithinMicro) and runs codegen + interpreted; fixtures survive precision flooring as their comments claim; goldens carry the correct extremes and timestamp_{ntz,ltz}(9) schema. The comment's contract claim matches production: MaxBy/MinBy dataType = valueExpr.dataType, gating only on ordering orderability (MaxByAndMinBy.scala:47,49-50).

@MaxGekk

MaxGekk commented Jun 21, 2026

Copy link
Copy Markdown
Member

+1, LGTM. Merging to master/4.x.
Thank you, @stevomitric and @uros-b for review.

@MaxGekk MaxGekk closed this in 7818539 Jun 21, 2026
MaxGekk pushed a commit that referenced this pull request Jun 21, 2026
…y over nanosecond-precision timestamp types

### What changes were proposed in this pull request?
Tests-only follow-up to SPARK-57103, extending nanos coverage to `max_by`/`min_by` over `TIMESTAMP_NTZ(p)`/`TIMESTAMP_LTZ(p)` (`p` in `[7,9]`). No production change: `MaxMinby` gates only on the ordering's orderability (which nanos pass) and returns `valueExpr.dataType`, so a nanosecond value keeps its precision. Adds `ExpressionTypeCheckingSuite` contract asserts, `TimestampNanosFunctionsSuiteBase` E2E tests (value-is-nanos precision; sub-microsecond ordering on codegen + interpreted; NULL ordering; GROUP BY), and golden `max_by`/`min_by` queries in `timestamp-{ntz,ltz}-nanos.sql`. Two-arg form only; `k`-variant and mixed precision out of scope.

### Why are the changes needed?
`max_by`/`min_by` over nanos had no coverage, especially the returned-value precision case. Locks in correct behavior and guards against regressions.

### Does this PR introduce any user-facing change?
No. Tests only; types stay behind `spark.sql.timestampNanosTypes.enabled`.

### How was this patch tested?
New tests on JDK 17: `ExpressionTypeCheckingSuite`, `TimestampNanosFunctionsAnsiOn/OffSuite`, and `SQLQueryTestSuite -z nanos` (goldens regenerated, then verified clean).

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)

Closes #56612 from stevomitric/stevomitric/min-by-max-by.

Authored-by: Stevo Mitric <stevomitric2000@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 7818539)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants