@@ -17,6 +17,7 @@

package org.apache.spark.sql.catalyst.analysis

import org.apache.spark.SparkException
import org.apache.spark.sql.catalyst.analysis.TypeCoercion.PromoteStrings.conf
import org.apache.spark.sql.catalyst.expressions.{
  Alias,
@@ -82,6 +83,8 @@ import org.apache.spark.sql.types.{
  StringType,
  StringTypeExpression,
  StructType,
  TimestampLTZNanosType,
  TimestampNTZNanosType,
  TimestampNTZType,
  TimestampType,
  TimestampTypeExpression,
@@ -244,14 +247,58 @@ abstract class TypeCoercionHelper {
    (d1, d2) match {
      case (_, _: TimeType) => None
      case (_: TimeType, _) => None
Review comment on lines 248 to 249 (Member):

Nit: Should we future-proof the `TimeType` guards here? The arms match `_: TimeType` specifically rather than `_: AnyTimeType`. If another `AnyTimeType` subtype is ever added, it would fall through to `case _` and hit the `internalError` (acceptable fail-fast), but matching `_: AnyTimeType` there would be marginally more intentional.

Reply (Member Author):

Good eye. `TimeType` is currently the only `AnyTimeType` subtype, so the two patterns match the same set today. I'd lean toward keeping `_: TimeType` though, for two reasons: (1) these two arms are pre-existing (this PR only adds the `case _ =>` arm), so broadening them is orthogonal to nanos widening; and (2) it would partly defeat the fail-fast guard just below. The intent is that a genuinely new `DatetimeType` (a future `AnyTimeType` subtype included) trips `internalError` and gets triaged explicitly, rather than silently resolving to `None`, which may not be the right common-type semantics for it (it could warrant its own precision-widening rule, like the timestamp families do). Happy to switch if you feel strongly, but my preference is the minimal pre-existing form.

      case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
        Some(TimestampType)

      case (_: TimestampType, _: TimestampNTZType) | (_: TimestampNTZType, _: TimestampType) =>
        Some(TimestampType)

      case (_: TimestampNTZType, _: DateType) | (_: DateType, _: TimestampNTZType) =>
        Some(TimestampNTZType)
      // The remaining datetime types (DATE and the micro/nanos TIMESTAMP_LTZ / TIMESTAMP_NTZ
      // families) widen along two independent axes:
      //  - time-zone family: the result is LTZ if either input is LTZ-family, otherwise NTZ. This
      //    mirrors the microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ widens to TIMESTAMP.
      //    DATE is family-neutral and adopts the family of the other side.
      //  - precision: the maximum of the two precisions, where the micro types and DATE count as 6
      //    and the nanos types contribute their own precision p in [7, 9].
      // The (family, precision) pair then maps back to a concrete type: precision 6 yields the
      // micro type, precision in [7, 9] yields the nanos type.
      //
      // Note: this common-type resolution is intentionally more permissive than the nanosecond
      // conversion rules in Cast.canUpCast / Cast.canANSIStoreAssign, which keep cross-family and
      // DATE <-> nanos casts explicit-CAST-only while the nanos types are unreleased (SPARK-57323
      // etc.). Coercion here mirrors the microsecond precedent so that UNION / CASE / coalesce /
      // IN / comparison resolve a common type the same way they do for the micro families; the
      // stricter explicit-only stance is deliberately scoped to up-cast and store assignment, not
      // to common-type resolution.
      case _ =>
        // Fractional-seconds precision of the microsecond timestamp types; the nanos types carry
        // 7-9. DATE has no time component and is treated as the micro precision so that
        // DATE <-> micro widens to the micro type and DATE <-> nanos to the nanos type.
        val MicrosPrecision = 6
        def isLtz(d: DatetimeType): Boolean =
          d.isInstanceOf[TimestampType] || d.isInstanceOf[TimestampLTZNanosType]
        def isNtz(d: DatetimeType): Boolean =
          d.isInstanceOf[TimestampNTZType] || d.isInstanceOf[TimestampNTZNanosType]
        def precisionOf(d: DatetimeType): Int = d match {
          case t: TimestampLTZNanosType => t.precision
          case t: TimestampNTZNanosType => t.precision
          case _ => MicrosPrecision // DateType / TimestampType / TimestampNTZType
        }
        // Beyond TimeType (handled above), the only datetime types are DATE and the micro/nanos
        // timestamp families. Guard so that a future DatetimeType subtype fails fast here instead
        // of being silently mis-widened (treated as a family-neutral precision-6 type and folded
        // into DATE) when it should be wired in explicitly.
        def isWidenable(d: DatetimeType): Boolean =
          isLtz(d) || isNtz(d) || d.isInstanceOf[DateType]
        if (!isWidenable(d1) || !isWidenable(d2)) {
          throw SparkException.internalError(
            s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2")
        } else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) {
          // Both sides are DATE; callers short-circuit equal types, so this is just defensive.
          Some(DateType)

Review comment (Member):

Just to confirm - this is unreachable via current callers?

Reply (Member Author):

Confirmed: unreachable via the current callers. `findWiderDateTimeType` is only called from `TypeCoercion.findTightestCommonType` and `AnsiTypeCoercion.findTightestCommonType`, both of which short-circuit with `case (t1, t2) if t1 == t2 => Some(t1)` before reaching here, and `DateType` is a singleton case object, so a DATE/DATE pair never gets this far. The `Some(DateType)` arm is just a defensive, semantically correct default (hence the comment). If you'd prefer no dead code, I can fold it into the `internalError` guard instead. Happy to go either way.

        } else {
          val p = math.max(precisionOf(d1), precisionOf(d2))
          if (isLtz(d1) || isLtz(d2)) {
            Some(if (p <= MicrosPrecision) TimestampType else TimestampLTZNanosType(p))
          } else {
            Some(if (p <= MicrosPrecision) TimestampNTZType else TimestampNTZNanosType(p))
          }
        }
    }
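
To make the two-axis rule in the hunk above easier to check by hand, here is a minimal standalone sketch of it in plain Scala. It is not Spark code: `Dt`, `Date`, `TsLtz`, `TsNtz`, `TsLtzNanos`, `TsNtzNanos`, `Time` and `WidenSketch` are hypothetical stand-ins for the real `DatetimeType` classes. The equal-type short-circuit that the callers perform is folded into the same function here. Because the toy hierarchy is sealed, the `internalError` guard has no counterpart in the sketch.

```scala
// Toy stand-ins for the datetime types; these are NOT Spark's classes.
sealed trait Dt
case object Date extends Dt
case object TsLtz extends Dt                    // microsecond TIMESTAMP (LTZ family)
case object TsNtz extends Dt                    // microsecond TIMESTAMP_NTZ
final case class TsLtzNanos(p: Int) extends Dt  // assumed p in [7, 9]
final case class TsNtzNanos(p: Int) extends Dt  // assumed p in [7, 9]
final case class Time(p: Int) extends Dt

object WidenSketch {
  // DATE and the microsecond types all count as precision 6.
  val MicrosPrecision = 6

  def isLtz(d: Dt): Boolean = d match {
    case TsLtz | TsLtzNanos(_) => true
    case _ => false
  }

  def precisionOf(d: Dt): Int = d match {
    case TsLtzNanos(p) => p
    case TsNtzNanos(p) => p
    case _ => MicrosPrecision
  }

  def wider(d1: Dt, d2: Dt): Option[Dt] = (d1, d2) match {
    // Equal types resolve to themselves (done by the callers in the real code).
    case (a, b) if a == b => Some(a)
    // TIME has no common type with any other datetime type.
    case (_: Time, _) | (_, _: Time) => None
    case _ =>
      // Precision axis: take the maximum. Family axis: LTZ wins if either side is LTZ.
      // After the short-circuit at least one side is a timestamp, so a DATE
      // operand simply adopts the family of the other side.
      val p = math.max(precisionOf(d1), precisionOf(d2))
      if (isLtz(d1) || isLtz(d2)) Some(if (p <= MicrosPrecision) TsLtz else TsLtzNanos(p))
      else Some(if (p <= MicrosPrecision) TsNtz else TsNtzNanos(p))
  }
}
```

Under these assumptions, `WidenSketch.wider(TsLtz, TsNtzNanos(9))` corresponds to the suite case `widenTest(TimestampType, TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))`: the family comes from the LTZ side and the precision from the nanos side.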

/**
@@ -144,6 +144,15 @@ class AnsiTypeCoercionSuite extends TypeCoercionSuiteBase {
Seq(DateType, TimestampType, BinaryType, BooleanType).foreach { dt =>
widenTest(dt, StringType, Some(dt))
}

// Nanosecond-precision timestamp types (SPARK-57454).
Seq(7, 8, 9).foreach { p =>
widenTest(TimestampLTZNanosType(p), StringType, Some(TimestampLTZNanosType(p)))
widenTest(TimestampNTZNanosType(p), StringType, Some(TimestampNTZNanosType(p)))
}
widenTest(TimestampType, TimestampLTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(TimestampLTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(DateType, TimestampNTZNanosType(7), Some(TimestampNTZNanosType(7)))
}

test("tightest common bound for types") {
@@ -219,6 +228,29 @@ class AnsiTypeCoercionSuite extends TypeCoercionSuiteBase {
widenTest(IntegerType, TimestampType, None)
widenTest(StringType, TimestampType, None)

// Nanosecond-precision timestamp types (SPARK-57454). Kept in sync with the same block in
// TypeCoercionSuite, since both findTightestCommonType impls share findWiderDateTimeType.
// nanos(p1) <-> nanos(p2) within the same family widen to the max precision (incl. self-pair).
widenTest(TimestampLTZNanosType(7), TimestampLTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(TimestampLTZNanosType(8), TimestampLTZNanosType(8), Some(TimestampLTZNanosType(8)))
widenTest(TimestampNTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampNTZNanosType(9)))
// micro <-> nanos within the same family widen to the nanos type.
widenTest(TimestampType, TimestampLTZNanosType(7), Some(TimestampLTZNanosType(7)))
widenTest(TimestampNTZType, TimestampNTZNanosType(8), Some(TimestampNTZNanosType(8)))
// Mixed time-zone families widen to the LTZ family (mirrors TIMESTAMP + TIMESTAMP_NTZ).
widenTest(TimestampLTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(TimestampLTZNanosType(7), TimestampNTZType, Some(TimestampLTZNanosType(7)))
widenTest(TimestampType, TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
// nanos <-> date widen to the nanos type of the same family.
widenTest(DateType, TimestampLTZNanosType(8), Some(TimestampLTZNanosType(8)))
widenTest(DateType, TimestampNTZNanosType(7), Some(TimestampNTZNanosType(7)))
// nanos <-> TIME has no common datetime type.
widenTest(TimestampLTZNanosType(9), TimeType(6), None)
widenTest(TimestampNTZNanosType(9), TimeType(6), None)
// No common type with non-datetime types.
widenTest(IntegerType, TimestampLTZNanosType(9), None)
widenTest(StringType, TimestampNTZNanosType(9), None)

// ComplexType
widenTest(NullType,
MapType(IntegerType, StringType, false),
@@ -652,6 +652,28 @@ class TypeCoercionSuite extends TypeCoercionSuiteBase {
widenTest(IntegerType, TimestampType, None)
widenTest(StringType, TimestampType, None)

// Nanosecond-precision timestamp types (SPARK-57454).
// nanos(p1) <-> nanos(p2) within the same family widen to the max precision.
widenTest(TimestampLTZNanosType(7), TimestampLTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(TimestampLTZNanosType(8), TimestampLTZNanosType(8), Some(TimestampLTZNanosType(8)))
widenTest(TimestampNTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampNTZNanosType(9)))
// micro <-> nanos within the same family widen to the nanos type.
widenTest(TimestampType, TimestampLTZNanosType(7), Some(TimestampLTZNanosType(7)))
widenTest(TimestampNTZType, TimestampNTZNanosType(8), Some(TimestampNTZNanosType(8)))
// Mixed time-zone families widen to the LTZ family (mirrors TIMESTAMP + TIMESTAMP_NTZ).
widenTest(TimestampLTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTest(TimestampLTZNanosType(7), TimestampNTZType, Some(TimestampLTZNanosType(7)))
widenTest(TimestampType, TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
// nanos <-> date widen to the nanos type of the same family.
widenTest(DateType, TimestampLTZNanosType(8), Some(TimestampLTZNanosType(8)))
widenTest(DateType, TimestampNTZNanosType(7), Some(TimestampNTZNanosType(7)))
// nanos <-> TIME has no common datetime type.
widenTest(TimestampLTZNanosType(9), TimeType(6), None)
widenTest(TimestampNTZNanosType(9), TimeType(6), None)
// No common type with non-datetime types.
widenTest(IntegerType, TimestampLTZNanosType(9), None)
widenTest(StringType, TimestampNTZNanosType(9), None)

// ComplexType
widenTest(NullType,
MapType(IntegerType, StringType, false),
@@ -962,6 +984,22 @@ class TypeCoercionSuite extends TypeCoercionSuiteBase {
new StructType().add("a", StringType),
new StructType().add("a", IntegerType),
Some(new StructType().add("a", StringType)))

// Nanosecond-precision timestamp types (SPARK-57454).
widenTestWithStringPromotion(
TimestampType, TimestampLTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTestWithStringPromotion(
TimestampLTZNanosType(7), TimestampNTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTestWithStringPromotion(
DateType, TimestampNTZNanosType(7), Some(TimestampNTZNanosType(7)))
widenTestWithoutStringPromotion(
TimestampType, TimestampLTZNanosType(9), Some(TimestampLTZNanosType(9)))
widenTestWithoutStringPromotion(
ArrayType(TimestampType), ArrayType(TimestampNTZNanosType(8)),
Some(ArrayType(TimestampLTZNanosType(8))))
// nanos <-> string promotes to string with promotion, no common type without it.
widenTestWithStringPromotion(StringType, TimestampLTZNanosType(9), Some(StringType))
widenTestWithoutStringPromotion(StringType, TimestampNTZNanosType(9), None)
}

test("cast NullType for expressions that implement ExpectsInputTypes") {
@@ -762,3 +762,141 @@ SELECT unix_nanos(NULL :: timestamp_ltz(9))
-- !query analysis
Project [unix_nanos(cast(null as timestamp_ltz(9))) AS unix_nanos(CAST(NULL AS TIMESTAMP_LTZ(9)))#x]
+- OneRowRelation


-- !query
SELECT typeof(c), c FROM (
SELECT TIMESTAMP_LTZ '0001-01-01 00:00:00' AS c
UNION ALL SELECT TIMESTAMP_LTZ '9999-12-31 23:59:59.999999999') ORDER BY c
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT typeof(c), c FROM (
SELECT '1582-10-04 12:30:45.1234567' :: timestamp_ltz(7) AS c
UNION ALL SELECT '1582-10-15 23:59:59.123456789' :: timestamp_ltz(9)) ORDER BY c
-- !query analysis
Sort [c#x ASC NULLS FIRST], true
+- Project [typeof(c#x) AS typeof(c)#x, c#x]
   +- SubqueryAlias __auto_generated_subquery_name
      +- Union false, false
         :- Project [cast(c#x as timestamp_ltz(9)) AS c#x]
         :  +- Project [cast(1582-10-04 12:30:45.1234567 as timestamp_ltz(7)) AS c#x]
         :     +- OneRowRelation
         +- Project [cast(1582-10-15 23:59:59.123456789 as timestamp_ltz(9)) AS CAST(1582-10-15 23:59:59.123456789 AS TIMESTAMP_LTZ(9))#x]
            +- OneRowRelation


-- !query
SELECT typeof(v), v FROM (SELECT coalesce(
'1969-12-31 23:59:59.0000001 Asia/Kolkata' :: timestamp_ltz(7),
'1969-12-31 23:59:59.999999999 UTC' :: timestamp_ltz(9)) AS v)
-- !query analysis
Project [typeof(v#x) AS typeof(v)#x, v#x]
+- SubqueryAlias __auto_generated_subquery_name
   +- Project [coalesce(cast(cast(1969-12-31 23:59:59.0000001 Asia/Kolkata as timestamp_ltz(7)) as timestamp_ltz(9)), cast(1969-12-31 23:59:59.999999999 UTC as timestamp_ltz(9))) AS v#x]
      +- OneRowRelation


-- !query
SELECT typeof(v), v FROM (SELECT CASE WHEN true
THEN TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kathmandu'
ELSE '2026-06-21 10:16:30.987654321 UTC' :: timestamp_ltz(9) END AS v)
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT typeof(v), v FROM (SELECT coalesce(
DATE '0001-01-01', '2020-01-01 00:00:00.12345678' :: timestamp_ltz(8)) AS v)
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT typeof(greatest(TIMESTAMP_LTZ '0001-01-01 00:00:00',
'9999-12-31 23:59:59.999999999' :: timestamp_ltz(9)))
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT greatest(TIMESTAMP_LTZ '1500-03-01 12:00:00',
'1582-10-15 00:00:00.123456789' :: timestamp_ltz(9),
TIMESTAMP_LTZ '2026-06-21 10:16:30.5')
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT least('1970-01-01 00:00:00.0000001' :: timestamp_ltz(7),
'1969-12-31 23:59:59.999999999' :: timestamp_ltz(9))
-- !query analysis
Project [least(cast(cast(1970-01-01 00:00:00.0000001 as timestamp_ltz(7)) as timestamp_ltz(9)), cast(1969-12-31 23:59:59.999999999 as timestamp_ltz(9))) AS least(CAST(1970-01-01 00:00:00.0000001 AS TIMESTAMP_LTZ(7)), CAST(1969-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9)))#x]
+- OneRowRelation


-- !query
SELECT array('0001-01-01 00:00:00.0000001' :: timestamp_ltz(7),
TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kolkata',
'9999-12-31 23:59:59.999999999' :: timestamp_ltz(9))
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT typeof(array(TIMESTAMP_LTZ '9999-12-31 23:59:59',
'0001-01-01 00:00:00.000000001' :: timestamp_ltz(9)))
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT map('min', '0001-01-01 00:00:00.000000001' :: timestamp_ltz(9),
'max', TIMESTAMP_LTZ '9999-12-31 23:59:59.999999')
-- !query analysis
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT typeof(v), v FROM (SELECT coalesce(
TIMESTAMP_NTZ '2026-06-21 10:16:30.123456789',
'1970-01-01 00:00:00.000000001 UTC' :: timestamp_ltz(9)) AS v)
-- !query analysis
Project [typeof(v#x) AS typeof(v)#x, v#x]
+- SubqueryAlias __auto_generated_subquery_name
   +- Project [coalesce(cast(2026-06-21 10:16:30.123456789 as timestamp_ltz(9)), cast(1970-01-01 00:00:00.000000001 UTC as timestamp_ltz(9))) AS v#x]
      +- OneRowRelation


-- !query
SELECT typeof(c) FROM (
SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c
UNION ALL SELECT '9999-12-31 23:59:59.999999999' :: timestamp_ltz(9))
-- !query analysis
Project [typeof(c#x) AS typeof(c)#x]
+- SubqueryAlias __auto_generated_subquery_name
   +- Union false, false
      :- Project [cast(c#x as timestamp_ltz(9)) AS c#x]
      :  +- Project [1582-10-15 00:00:00 AS c#x]
      :     +- OneRowRelation
      +- Project [cast(9999-12-31 23:59:59.999999999 as timestamp_ltz(9)) AS CAST(9999-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9))#x]
         +- OneRowRelation


-- !query
SELECT typeof(coalesce('0001-01-01 00:00:00.0000001' :: timestamp_ntz(7),
'2026-06-21 10:16:30.123456789 UTC' :: timestamp_ltz(9)))
-- !query analysis
Project [typeof(coalesce(cast(cast(0001-01-01 00:00:00.0000001 as timestamp_ntz(7)) as timestamp_ltz(9)), cast(2026-06-21 10:16:30.123456789 UTC as timestamp_ltz(9)))) AS typeof(coalesce(CAST(0001-01-01 00:00:00.0000001 AS TIMESTAMP_NTZ(7)), CAST(2026-06-21 10:16:30.123456789 UTC AS TIMESTAMP_LTZ(9))))#x]
+- OneRowRelation


-- !query
SELECT typeof(CASE WHEN true
THEN '1969-12-31 23:59:59.1234567' :: timestamp_ntz(7)
ELSE '1970-01-01 00:00:00.123456789 UTC' :: timestamp_ltz(9) END)
-- !query analysis
Project [typeof(CASE WHEN true THEN cast(cast(1969-12-31 23:59:59.1234567 as timestamp_ntz(7)) as timestamp_ltz(9)) ELSE cast(1970-01-01 00:00:00.123456789 UTC as timestamp_ltz(9)) END) AS typeof(CASE WHEN true THEN CAST(1969-12-31 23:59:59.1234567 AS TIMESTAMP_NTZ(7)) ELSE CAST(1970-01-01 00:00:00.123456789 UTC AS TIMESTAMP_LTZ(9)) END)#x]
+- OneRowRelation