[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638
[SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types#56638MaxGekk wants to merge 5 commits into
Conversation
…d-precision timestamp types ### What changes were proposed in this pull request? Extend `findWiderDateTimeType` in `TypeCoercionHelper` to cover the nanosecond-capable timestamp types `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` (p in [7, 9]). The method is generalized into a total function over datetime pairs (except `TimeType`, which stays `None`) that widens along two axes: time-zone family (LTZ wins over NTZ, mirroring the existing `TIMESTAMP` + `TIMESTAMP_NTZ` -> `TIMESTAMP` precedent; `DATE` is family-neutral) and precision (max, where the micro types and `DATE` count as 6 and the nanos types contribute their own p). The resulting (family, precision) pair maps back to the micro type at precision 6 and the nanos type at precision 7-9. Because both `TypeCoercion.findTightestCommonType` and `AnsiTypeCoercion.findTightestCommonType` route datetime pairs through this single shared method, the change applies to UNION/CASE/coalesce/IN/binary-comparison coercion in both standard and ANSI modes with no further wiring. ### Why are the changes needed? Part of the nanosecond timestamp preview (SPARK-56822). Previously `findWiderDateTimeType` was microsecond-only with no default arm, so any pair involving a nanosecond type could fail analysis. This brings type coercion and widening to parity with the microsecond `TimestampType` / `TimestampNTZType`. ### Does this PR introduce any user-facing change? No. The nanosecond timestamp types are behind a preview flag and not yet released. ### How was this patch tested? Added unit cases to `TypeCoercionSuite` and `AnsiTypeCoercionSuite`, and a new end-to-end `TimestampNanosWideningSuiteBase` (ANSI on/off) covering UNION ALL, coalesce, IN, CASE WHEN, and binary comparisons across micro/nanos and nanos/nanos for both time-zone families.
…econd timestamps Add SPARK-57454 sections to timestamp-ntz-nanos.sql and timestamp-ltz-nanos.sql covering implicit type coercion / widening over the nanosecond timestamp types: UNION ALL, coalesce, CASE WHEN, greatest/least, and array/map element unification across micro/nanos and nanos(p1)/nanos(p2), plus the mixed time-zone family rule (LTZ wins) in the LTZ file. The resolved common type is locked with typeof() and the end-to-end rendered values are checked. Standalone boolean comparison / IN cases and the type-mapping matrix are intentionally omitted to avoid duplicating TimestampNanosWideningSuite and TypeCoercionSuite / AnsiTypeCoercionSuite.
…den tests Spread the SPARK-57454 widening cases in timestamp-ntz-nanos.sql and timestamp-ltz-nanos.sql across the value range instead of reusing one literal: the min/max supported timestamps, the 1582 Julian/Gregorian boundary (proleptic Gregorian), pre/post epoch, near-current values, varied fractional parts and precisions, and, for TIMESTAMP_LTZ, non-standard source zones (the sub-hour offsets Asia/Kolkata +05:30 and Asia/Kathmandu +05:45). Bare LTZ literals are interpreted in the session zone so they round-trip on rendering.
…ted datetime types Address self-review of nanos widening in findWiderDateTimeType: - Document why common-type resolution is intentionally more permissive than Cast.canUpCast / canANSIStoreAssign for nanos (cross-family and DATE <-> nanos), which stay explicit-CAST-only while the nanos types are unreleased. - Fail fast (SparkException.internalError) on an unexpected DatetimeType subtype instead of silently mis-widening it via the family-neutral catch-all. Co-authored-by: Isaac
…omments Reflow two comment lines that exceeded the 100-character limit. Co-authored-by: Isaac
|
@stevomitric Could you review this PR, please. |
| s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2") | ||
| } else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) { | ||
| // Both sides are DATE; callers short-circuit equal types, so this is just defensive. | ||
| Some(DateType) |
There was a problem hiding this comment.
Just to confirm - this is unreachable via current callers?
There was a problem hiding this comment.
Confirmed - unreachable via the current callers. findWiderDateTimeType is only called from TypeCoercion.findTightestCommonType and AnsiTypeCoercion.findTightestCommonType, both of which short-circuit case (t1, t2) if t1 == t2 => Some(t1) before reaching here, and DateType is a singleton case object, so a DATE/DATE pair never gets this far. The Some(DateType) arm is just a defensive, semantically-correct default (hence the comment). If you'd prefer no dead code, I can fold it into the internalError guard instead - happy to go either way.
| case (_, _: TimeType) => None | ||
| case (_: TimeType, _) => None |
There was a problem hiding this comment.
Nit: Should we future-proof the TimeType guards here? The arms match _: TimeType specifically rather than _: AnyTimeType. If another AnyTimeType subtype is ever added, it would fall through to case _ and hit the internalError (acceptable fail-fast), but matching _: AnyTimeType there would be marginally more intentional.
There was a problem hiding this comment.
Good eye - TimeType is currently the only AnyTimeType subtype, so the two match the same set today. I'd lean toward keeping _: TimeType though, for two reasons: (1) these two arms are pre-existing (this PR only adds the case _ => arm), so broadening them is orthogonal to nanos widening; and (2) it would partly defeat the fail-fast guard just below - the intent is that a genuinely new DatetimeType (a future AnyTimeType subtype included) trips internalError and gets triaged explicitly, rather than silently resolving to None, which may not be the right common-type semantics for it (it could warrant its own precision-widening rule, like the timestamp families do). Happy to switch if you feel strongly, but my preference is the minimal pre-existing form.
|
|
||
| -- !query | ||
| SELECT typeof(c) FROM ( | ||
| SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c |
There was a problem hiding this comment.
The mixed-family UNION/coalesce/CASE goldens assert typeof(...) only — reasonable, since the value is session-zone dependent, but it means the one place this PR introduces an implicit cross-family conversion never has its zone-shifted value locked. If the inserted Cast's sessionLocalTimeZone wiring ever regressed, these would still pass. Could you add one mixed-family case pinned to a deterministic source zone (like the same-family Asia/Kolkata coalesce already does), or is type-only sufficient for the preview?
| widenTest(IntegerType, TimestampType, None) | ||
| widenTest(StringType, TimestampType, None) | ||
|
|
||
| // Nanosecond-precision timestamp types (SPARK-57454). |
There was a problem hiding this comment.
This ANSI block is a strict subset of the non-ANSI one in TypeCoercionSuite (~lines 656–674): it's missing TimestampNTZNanosType(9) + TimeType(6) → None, the TimestampLTZNanosType(7) + TimestampNTZType → TimestampLTZNanosType(7) mixed-family-with-micro cell, and the nanos(8) + nanos(8) self-pair. findWiderDateTimeType is shared so risk is low, but the asymmetry looks accidental and weakens "both ANSI modes" for those cells — mirror the three, or add a comment if intentional?
| def precisionOf(d: DatetimeType): Int = d match { | ||
| case t: TimestampLTZNanosType => t.precision | ||
| case t: TimestampNTZNanosType => t.precision | ||
| case _ => 6 // DateType / TimestampType / TimestampNTZType |
There was a problem hiding this comment.
Minor: 6 (micro precision) appears here and again at lines 293/295 (p <= 6). A local val MicrosPrecision = 6 would centralize it and self-document the [7,9] boundary.
What changes were proposed in this pull request?
Extend
findWiderDateTimeTypeinTypeCoercionHelperto cover the nanosecond-capable timestamp typesTIMESTAMP_LTZ(p)/TIMESTAMP_NTZ(p)(pin[7, 9]). The method is generalized into a total function over datetime pairs (exceptTimeType, which staysNone) that widens along two independent axes:TimestampType/TimestampLTZNanosType), otherwise NTZ. This mirrors the existing microsecond precedent whereTIMESTAMP+TIMESTAMP_NTZwidens toTIMESTAMP.DATEis family-neutral and adopts the family of the other side.DATEcount as 6 and the nanos types contribute their ownp.The resulting
(family, precision)pair maps back to a concrete type: precision 6 yields the micro type (TimestampType/TimestampNTZType), precision in[7, 9]yields the nanos type (TimestampLTZNanosType(p)/TimestampNTZNanosType(p)). This reproduces every existing microsecond result exactly and adds:Because both
TypeCoercion.findTightestCommonTypeandAnsiTypeCoercion.findTightestCommonTyperoute(d1: DatetimeType, d2: DatetimeType)through this single shared method (andfindWiderTypeForTwo/findWiderCommonType/ the binary-operator coercion path build on it), the change applies toUNION/CASE/coalesce/IN/ binary-comparison coercion in both standard and ANSI modes with no further wiring. String promotion (nanos + string -> string) already works because the nanos types areAtomicType.Why are the changes needed?
Umbrella: SPARK-56822 (Timestamps with nanosecond precision). Previously
findWiderDateTimeTypewas microsecond-only and had no default arm, so any pair involving a nanosecond type (micro+nanos or nanos(p1)+nanos(p2)) was unhandled and could fail analysis. This brings implicit type coercion and widening to parity with the microsecondTimestampType/TimestampNTZType.Does this PR introduce any user-facing change?
No. The nanosecond-capable timestamp types are gated behind a preview flag (
spark.sql.timestampNanosTypes.enabled) and are not yet released.How was this patch tested?
TypeCoercionSuiteandAnsiTypeCoercionSuite(findTightestCommonTypeandfindWiderTypeForTwo): nanos/nanos, micro/nanos, mixed time-zone families, nanos/date, nanos/time (->None), and nanos/string.TimestampNanosWideningSuiteBasewith ANSI-on and ANSI-off subclasses, coveringUNION ALL,coalesce,IN,CASE WHEN, and binary comparisons (=,<) across micro/nanos and nanos(p1)/nanos(p2) for both the LTZ and NTZ families.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)