[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types by MaxGekk · Pull Request #56638 · apache/spark

MaxGekk · 2026-06-21T07:38:49Z

What changes were proposed in this pull request?

Extend findWiderDateTimeType in TypeCoercionHelper to cover the nanosecond-capable timestamp types TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) (p in [7, 9]). The method is generalized into a total function over datetime pairs (except TimeType, which stays None) that widens along two independent axes:

Time-zone family: the result is LTZ if either input is LTZ-family (TimestampType / TimestampLTZNanosType), otherwise NTZ. This mirrors the existing microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ widens to TIMESTAMP. DATE is family-neutral and adopts the family of the other side.
Precision: the maximum of the two precisions, where the micro types and DATE count as 6 and the nanos types contribute their own p.

The resulting (family, precision) pair maps back to a concrete type: precision 6 yields the micro type (TimestampType / TimestampNTZType), precision in [7, 9] yields the nanos type (TimestampLTZNanosType(p) / TimestampNTZNanosType(p)). This reproduces every existing microsecond result exactly and adds:

nanos(p) + micro (same or mixed family) -> nanos at precision p
nanos(p1) + nanos(p2) -> nanos at max(p1, p2)
nanos(p) + DATE -> nanos(p)

Because both TypeCoercion.findTightestCommonType and AnsiTypeCoercion.findTightestCommonType route (d1: DatetimeType, d2: DatetimeType) through this single shared method (and findWiderTypeForTwo / findWiderCommonType / the binary-operator coercion path build on it), the change applies to UNION / CASE / coalesce / IN / binary-comparison coercion in both standard and ANSI modes with no further wiring. String promotion (nanos + string -> string) already works because the nanos types are AtomicType.

Why are the changes needed?

Umbrella: SPARK-56822 (Timestamps with nanosecond precision). Previously findWiderDateTimeType was microsecond-only and had no default arm, so any pair involving a nanosecond type (micro+nanos or nanos(p1)+nanos(p2)) was unhandled and could fail analysis. This brings implicit type coercion and widening to parity with the microsecond TimestampType / TimestampNTZType.

Does this PR introduce any user-facing change?

No. The nanosecond-capable timestamp types are gated behind a preview flag (spark.sql.timestampNanosTypes.enabled) and are not yet released.

How was this patch tested?

Added nanosecond widening cases to TypeCoercionSuite and AnsiTypeCoercionSuite (findTightestCommonType and findWiderTypeForTwo): nanos/nanos, micro/nanos, mixed time-zone families, nanos/date, nanos/time (-> None), and nanos/string.
Added a new end-to-end TimestampNanosWideningSuiteBase with ANSI-on and ANSI-off subclasses, covering UNION ALL, coalesce, IN, CASE WHEN, and binary comparisons (=, <) across micro/nanos and nanos(p1)/nanos(p2) for both the LTZ and NTZ families.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…d-precision timestamp types ### What changes were proposed in this pull request? Extend `findWiderDateTimeType` in `TypeCoercionHelper` to cover the nanosecond-capable timestamp types `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` (p in [7, 9]). The method is generalized into a total function over datetime pairs (except `TimeType`, which stays `None`) that widens along two axes: time-zone family (LTZ wins over NTZ, mirroring the existing `TIMESTAMP` + `TIMESTAMP_NTZ` -> `TIMESTAMP` precedent; `DATE` is family-neutral) and precision (max, where the micro types and `DATE` count as 6 and the nanos types contribute their own p). The resulting (family, precision) pair maps back to the micro type at precision 6 and the nanos type at precision 7-9. Because both `TypeCoercion.findTightestCommonType` and `AnsiTypeCoercion.findTightestCommonType` route datetime pairs through this single shared method, the change applies to UNION/CASE/coalesce/IN/binary-comparison coercion in both standard and ANSI modes with no further wiring. ### Why are the changes needed? Part of the nanosecond timestamp preview (SPARK-56822). Previously `findWiderDateTimeType` was microsecond-only with no default arm, so any pair involving a nanosecond type could fail analysis. This brings type coercion and widening to parity with the microsecond `TimestampType` / `TimestampNTZType`. ### Does this PR introduce any user-facing change? No. The nanosecond timestamp types are behind a preview flag and not yet released. ### How was this patch tested? Added unit cases to `TypeCoercionSuite` and `AnsiTypeCoercionSuite`, and a new end-to-end `TimestampNanosWideningSuiteBase` (ANSI on/off) covering UNION ALL, coalesce, IN, CASE WHEN, and binary comparisons across micro/nanos and nanos/nanos for both time-zone families.

…econd timestamps Add SPARK-57454 sections to timestamp-ntz-nanos.sql and timestamp-ltz-nanos.sql covering implicit type coercion / widening over the nanosecond timestamp types: UNION ALL, coalesce, CASE WHEN, greatest/least, and array/map element unification across micro/nanos and nanos(p1)/nanos(p2), plus the mixed time-zone family rule (LTZ wins) in the LTZ file. The resolved common type is locked with typeof() and the end-to-end rendered values are checked. Standalone boolean comparison / IN cases and the type-mapping matrix are intentionally omitted to avoid duplicating TimestampNanosWideningSuite and TypeCoercionSuite / AnsiTypeCoercionSuite.

…den tests Spread the SPARK-57454 widening cases in timestamp-ntz-nanos.sql and timestamp-ltz-nanos.sql across the value range instead of reusing one literal: the min/max supported timestamps, the 1582 Julian/Gregorian boundary (proleptic Gregorian), pre/post epoch, near-current values, varied fractional parts and precisions, and, for TIMESTAMP_LTZ, non-standard source zones (the sub-hour offsets Asia/Kolkata +05:30 and Asia/Kathmandu +05:45). Bare LTZ literals are interpreted in the session zone so they round-trip on rendering.

…ted datetime types Address self-review of nanos widening in findWiderDateTimeType: - Document why common-type resolution is intentionally more permissive than Cast.canUpCast / canANSIStoreAssign for nanos (cross-family and DATE <-> nanos), which stay explicit-CAST-only while the nanos types are unreleased. - Fail fast (SparkException.internalError) on an unexpected DatetimeType subtype instead of silently mis-widening it via the family-neutral catch-all. Co-authored-by: Isaac

…omments Reflow two comment lines that exceeded the 100-character limit. Co-authored-by: Isaac

MaxGekk · 2026-06-21T09:30:15Z

@stevomitric Could you review this PR, please.

uros-b · 2026-06-21T14:16:51Z

+            s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2")
+        } else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) {
+          // Both sides are DATE; callers short-circuit equal types, so this is just defensive.
+          Some(DateType)


Just to confirm - this is unreachable via current callers?

Confirmed - unreachable via the current callers. findWiderDateTimeType is only called from TypeCoercion.findTightestCommonType and AnsiTypeCoercion.findTightestCommonType, both of which short-circuit case (t1, t2) if t1 == t2 => Some(t1) before reaching here, and DateType is a singleton case object, so a DATE/DATE pair never gets this far. The Some(DateType) arm is just a defensive, semantically-correct default (hence the comment). If you'd prefer no dead code, I can fold it into the internalError guard instead - happy to go either way.

uros-b · 2026-06-21T14:23:53Z

      case (_, _: TimeType) => None
      case (_: TimeType, _) => None


Nit: Should we future-proof the TimeType guards here? The arms match _: TimeType specifically rather than _: AnyTimeType. If another AnyTimeType subtype is ever added, it would fall through to case _ and hit the internalError (acceptable fail-fast), but matching _: AnyTimeType there would be marginally more intentional.

Good eye - TimeType is currently the only AnyTimeType subtype, so the two match the same set today. I'd lean toward keeping _: TimeType though, for two reasons: (1) these two arms are pre-existing (this PR only adds the case _ => arm), so broadening them is orthogonal to nanos widening; and (2) it would partly defeat the fail-fast guard just below - the intent is that a genuinely new DatetimeType (a future AnyTimeType subtype included) trips internalError and gets triaged explicitly, rather than silently resolving to None, which may not be the right common-type semantics for it (it could warrant its own precision-widening rule, like the timestamp families do). Happy to switch if you feel strongly, but my preference is the minimal pre-existing form.

stevomitric

LGTM, mostly nits.

stevomitric · 2026-06-21T21:17:55Z

+
+-- !query
+SELECT typeof(c) FROM (
+    SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c


The mixed-family UNION/coalesce/CASE goldens assert typeof(...) only — reasonable, since the value is session-zone dependent, but it means the one place this PR introduces an implicit cross-family conversion never has its zone-shifted value locked. If the inserted Cast's sessionLocalTimeZone wiring ever regressed, these would still pass. Could you add one mixed-family case pinned to a deterministic source zone (like the same-family Asia/Kolkata coalesce already does), or is type-only sufficient for the preview?

stevomitric · 2026-06-21T21:19:13Z

    widenTest(IntegerType, TimestampType, None)
    widenTest(StringType, TimestampType, None)

+    // Nanosecond-precision timestamp types (SPARK-57454).


This ANSI block is a strict subset of the non-ANSI one in TypeCoercionSuite (~lines 656–674): it's missing TimestampNTZNanosType(9) + TimeType(6) → None, the TimestampLTZNanosType(7) + TimestampNTZType → TimestampLTZNanosType(7) mixed-family-with-micro cell, and the nanos(8) + nanos(8) self-pair. findWiderDateTimeType is shared so risk is low, but the asymmetry looks accidental and weakens "both ANSI modes" for those cells — mirror the three, or add a comment if intentional?

stevomitric · 2026-06-21T21:20:33Z

+        def precisionOf(d: DatetimeType): Int = d match {
+          case t: TimestampLTZNanosType => t.precision
+          case t: TimestampNTZNanosType => t.precision
+          case _ => 6 // DateType / TimestampType / TimestampNTZType


Minor: 6 (micro precision) appears here and again at lines 293/295 (p <= 6). A local val MicrosPrecision = 6 would centralize it and self-document the [7,9] boundary.

MaxGekk added 5 commits June 21, 2026 09:36

[SPARK-57454][SQL] Fix scalastyle line-length in TypeCoercionHelper c…

fbfe68d

…omments Reflow two comment lines that exceeded the 100-character limit. Co-authored-by: Isaac

uros-b reviewed Jun 21, 2026

View reviewed changes

stevomitric approved these changes Jun 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638

[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638
MaxGekk wants to merge 5 commits into
apache:masterfrom
MaxGekk:nanos-widening

MaxGekk commented Jun 21, 2026

Uh oh!

MaxGekk commented Jun 21, 2026

Uh oh!

uros-b Jun 21, 2026

Uh oh!

MaxGekk Jun 21, 2026

Uh oh!

uros-b Jun 21, 2026

Uh oh!

MaxGekk Jun 21, 2026

Uh oh!

stevomitric left a comment

Uh oh!

stevomitric Jun 21, 2026

Uh oh!

stevomitric Jun 21, 2026

Uh oh!

stevomitric Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MaxGekk commented Jun 21, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented Jun 21, 2026

Uh oh!

uros-b Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

stevomitric left a comment

Choose a reason for hiding this comment

Uh oh!

stevomitric Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

stevomitric Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

stevomitric Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants