Skip to content

[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638

Open
MaxGekk wants to merge 5 commits into
apache:masterfrom
MaxGekk:nanos-widening
Open

[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638
MaxGekk wants to merge 5 commits into
apache:masterfrom
MaxGekk:nanos-widening

Conversation

@MaxGekk

@MaxGekk MaxGekk commented Jun 21, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Extend findWiderDateTimeType in TypeCoercionHelper to cover the nanosecond-capable timestamp types TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) (p in [7, 9]). The method is generalized into a total function over datetime pairs (except TimeType, which stays None) that widens along two independent axes:

  • Time-zone family: the result is LTZ if either input is LTZ-family (TimestampType / TimestampLTZNanosType), otherwise NTZ. This mirrors the existing microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ widens to TIMESTAMP. DATE is family-neutral and adopts the family of the other side.
  • Precision: the maximum of the two precisions, where the micro types and DATE count as 6 and the nanos types contribute their own p.

The resulting (family, precision) pair maps back to a concrete type: precision 6 yields the micro type (TimestampType / TimestampNTZType), precision in [7, 9] yields the nanos type (TimestampLTZNanosType(p) / TimestampNTZNanosType(p)). This reproduces every existing microsecond result exactly and adds:

  • nanos(p) + micro (same or mixed family) -> nanos at precision p
  • nanos(p1) + nanos(p2) -> nanos at max(p1, p2)
  • nanos(p) + DATE -> nanos(p)

Because both TypeCoercion.findTightestCommonType and AnsiTypeCoercion.findTightestCommonType route (d1: DatetimeType, d2: DatetimeType) through this single shared method (and findWiderTypeForTwo / findWiderCommonType / the binary-operator coercion path build on it), the change applies to UNION / CASE / coalesce / IN / binary-comparison coercion in both standard and ANSI modes with no further wiring. String promotion (nanos + string -> string) already works because the nanos types are AtomicType.

Why are the changes needed?

Umbrella: SPARK-56822 (Timestamps with nanosecond precision). Previously findWiderDateTimeType was microsecond-only and had no default arm, so any pair involving a nanosecond type (micro+nanos or nanos(p1)+nanos(p2)) was unhandled and could fail analysis. This brings implicit type coercion and widening to parity with the microsecond TimestampType / TimestampNTZType.

Does this PR introduce any user-facing change?

No. The nanosecond-capable timestamp types are gated behind a preview flag (spark.sql.timestampNanosTypes.enabled) and are not yet released.

How was this patch tested?

  • Added nanosecond widening cases to TypeCoercionSuite and AnsiTypeCoercionSuite (findTightestCommonType and findWiderTypeForTwo): nanos/nanos, micro/nanos, mixed time-zone families, nanos/date, nanos/time (-> None), and nanos/string.
  • Added a new end-to-end TimestampNanosWideningSuiteBase with ANSI-on and ANSI-off subclasses, covering UNION ALL, coalesce, IN, CASE WHEN, and binary comparisons (=, <) across micro/nanos and nanos(p1)/nanos(p2) for both the LTZ and NTZ families.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 5 commits June 21, 2026 09:36
…d-precision timestamp types

### What changes were proposed in this pull request?

Extend `findWiderDateTimeType` in `TypeCoercionHelper` to cover the nanosecond-capable
timestamp types `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` (p in [7, 9]). The method is
generalized into a total function over datetime pairs (except `TimeType`, which stays
`None`) that widens along two axes: time-zone family (LTZ wins over NTZ, mirroring the
existing `TIMESTAMP` + `TIMESTAMP_NTZ` -> `TIMESTAMP` precedent; `DATE` is family-neutral)
and precision (max, where the micro types and `DATE` count as 6 and the nanos types
contribute their own p). The resulting (family, precision) pair maps back to the micro
type at precision 6 and the nanos type at precision 7-9.

Because both `TypeCoercion.findTightestCommonType` and
`AnsiTypeCoercion.findTightestCommonType` route datetime pairs through this single shared
method, the change applies to UNION/CASE/coalesce/IN/binary-comparison coercion in both
standard and ANSI modes with no further wiring.

### Why are the changes needed?

Part of the nanosecond timestamp preview (SPARK-56822). Previously
`findWiderDateTimeType` was microsecond-only with no default arm, so any pair involving a
nanosecond type could fail analysis. This brings type coercion and widening to parity with
the microsecond `TimestampType` / `TimestampNTZType`.

### Does this PR introduce any user-facing change?

No. The nanosecond timestamp types are behind a preview flag and not yet released.

### How was this patch tested?

Added unit cases to `TypeCoercionSuite` and `AnsiTypeCoercionSuite`, and a new end-to-end
`TimestampNanosWideningSuiteBase` (ANSI on/off) covering UNION ALL, coalesce, IN, CASE
WHEN, and binary comparisons across micro/nanos and nanos/nanos for both time-zone families.
…econd timestamps

Add SPARK-57454 sections to timestamp-ntz-nanos.sql and timestamp-ltz-nanos.sql
covering implicit type coercion / widening over the nanosecond timestamp types:
UNION ALL, coalesce, CASE WHEN, greatest/least, and array/map element unification
across micro/nanos and nanos(p1)/nanos(p2), plus the mixed time-zone family rule
(LTZ wins) in the LTZ file. The resolved common type is locked with typeof() and the
end-to-end rendered values are checked. Standalone boolean comparison / IN cases and
the type-mapping matrix are intentionally omitted to avoid duplicating
TimestampNanosWideningSuite and TypeCoercionSuite / AnsiTypeCoercionSuite.
…den tests

Spread the SPARK-57454 widening cases in timestamp-ntz-nanos.sql and
timestamp-ltz-nanos.sql across the value range instead of reusing one literal:
the min/max supported timestamps, the 1582 Julian/Gregorian boundary (proleptic
Gregorian), pre/post epoch, near-current values, varied fractional parts and
precisions, and, for TIMESTAMP_LTZ, non-standard source zones (the sub-hour
offsets Asia/Kolkata +05:30 and Asia/Kathmandu +05:45). Bare LTZ literals are
interpreted in the session zone so they round-trip on rendering.
…ted datetime types

Address self-review of nanos widening in findWiderDateTimeType:
- Document why common-type resolution is intentionally more permissive than
  Cast.canUpCast / canANSIStoreAssign for nanos (cross-family and DATE <-> nanos),
  which stay explicit-CAST-only while the nanos types are unreleased.
- Fail fast (SparkException.internalError) on an unexpected DatetimeType subtype
  instead of silently mis-widening it via the family-neutral catch-all.

Co-authored-by: Isaac
…omments

Reflow two comment lines that exceeded the 100-character limit.

Co-authored-by: Isaac
@MaxGekk

MaxGekk commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

@stevomitric Could you review this PR, please.

s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2")
} else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) {
// Both sides are DATE; callers short-circuit equal types, so this is just defensive.
Some(DateType)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm - this is unreachable via current callers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed - unreachable via the current callers. findWiderDateTimeType is only called from TypeCoercion.findTightestCommonType and AnsiTypeCoercion.findTightestCommonType, both of which short-circuit case (t1, t2) if t1 == t2 => Some(t1) before reaching here, and DateType is a singleton case object, so a DATE/DATE pair never gets this far. The Some(DateType) arm is just a defensive, semantically-correct default (hence the comment). If you'd prefer no dead code, I can fold it into the internalError guard instead - happy to go either way.

Comment on lines 248 to 249
case (_, _: TimeType) => None
case (_: TimeType, _) => None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we future-proof the TimeType guards here? The arms match _: TimeType specifically rather than _: AnyTimeType. If another AnyTimeType subtype is ever added, it would fall through to case _ and hit the internalError (acceptable fail-fast), but matching _: AnyTimeType there would be marginally more intentional.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye - TimeType is currently the only AnyTimeType subtype, so the two match the same set today. I'd lean toward keeping _: TimeType though, for two reasons: (1) these two arms are pre-existing (this PR only adds the case _ => arm), so broadening them is orthogonal to nanos widening; and (2) it would partly defeat the fail-fast guard just below - the intent is that a genuinely new DatetimeType (a future AnyTimeType subtype included) trips internalError and gets triaged explicitly, rather than silently resolving to None, which may not be the right common-type semantics for it (it could warrant its own precision-widening rule, like the timestamp families do). Happy to switch if you feel strongly, but my preference is the minimal pre-existing form.

@stevomitric stevomitric left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly nits.


-- !query
SELECT typeof(c) FROM (
SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixed-family UNION/coalesce/CASE goldens assert typeof(...) only — reasonable, since the value is session-zone dependent, but it means the one place this PR introduces an implicit cross-family conversion never has its zone-shifted value locked. If the inserted Cast's sessionLocalTimeZone wiring ever regressed, these would still pass. Could you add one mixed-family case pinned to a deterministic source zone (like the same-family Asia/Kolkata coalesce already does), or is type-only sufficient for the preview?

widenTest(IntegerType, TimestampType, None)
widenTest(StringType, TimestampType, None)

// Nanosecond-precision timestamp types (SPARK-57454).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ANSI block is a strict subset of the non-ANSI one in TypeCoercionSuite (~lines 656–674): it's missing TimestampNTZNanosType(9) + TimeType(6) → None, the TimestampLTZNanosType(7) + TimestampNTZType → TimestampLTZNanosType(7) mixed-family-with-micro cell, and the nanos(8) + nanos(8) self-pair. findWiderDateTimeType is shared so risk is low, but the asymmetry looks accidental and weakens "both ANSI modes" for those cells — mirror the three, or add a comment if intentional?

def precisionOf(d: DatetimeType): Int = d match {
case t: TimestampLTZNanosType => t.precision
case t: TimestampNTZNanosType => t.precision
case _ => 6 // DateType / TimestampType / TimestampNTZType

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: 6 (micro precision) appears here and again at lines 293/295 (p <= 6). A local val MicrosPrecision = 6 would centralize it and self-document the [7,9] boundary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants