Skip to content

[SPARK-57575][SQL] Support TIME type in to_char/to_varchar formatting#56637

Open
shrirangmhalgi wants to merge 3 commits into
apache:masterfrom
shrirangmhalgi:SPARK-57575-time-to-char
Open

[SPARK-57575][SQL] Support TIME type in to_char/to_varchar formatting#56637
shrirangmhalgi wants to merge 3 commits into
apache:masterfrom
shrirangmhalgi:SPARK-57575-time-to-char

Conversation

@shrirangmhalgi

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Extend DateFormatClass (used by to_char / to_varchar / date_format) to accept TimeType input. When the input is TimeType, the expression uses TimeFormatter to format the time-of-day value instead of TimestampFormatter. Date-only pattern fields (e.g., yyyy-MM-dd) are rejected at runtime with a clear error.

Why are the changes needed?

to_char and to_varchar accept DateType and TimestampType but not TimeType. With the TIME data type now supported in Spark, users need a way to format time values to strings using standard datetime patterns (e.g., to_char(time_col, 'HH:mm:ss')).

Does this PR introduce any user-facing change?

Yes. to_char(time_value, format) and to_varchar(time_value, format) now work for TimeType inputs, formatting time-of-day fields (HH, mm, ss, fractional seconds). Passing date-only patterns to a TIME input raises an error.

How was this patch tested?

  • Unit tests in DateExpressionsSuite (formatting, null handling, date-pattern rejection)
  • SQL golden file tests in datetime-formatting.sql verifying to_char and to_varchar with TIME literals
  • All existing DateFormat tests pass (no regressions)

Was this patch authored or co-authored using generative AI tooling?

Yes. Co-Authored using Claude Opus 4.6

@shrirangmhalgi

Copy link
Copy Markdown
Contributor Author

@MaxGekk could you please review the PR. It is adding TimeType support to to_char / to_varchar using TimeFormatter, with date-pattern rejection and golden file tests.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 blocking, 1 non-blocking, 1 nit.
Solid feature with correct value handling and golden coverage. One blocking item: declaring TIME input as the concrete TimeType(6) silently restricts the feature to a single precision.

Design / architecture (1)

  • datetimeExpressions.scala:1142: TIME input declared as concrete TimeType(6) instead of AnyTimeType; non-6 precisions fail at analysis — see inline

Correctness (1)

  • DateExpressionsSuite.scala:348: date-pattern rejection raises a raw java.time exception (not a "clear error") and the test under-pins it — see inline

Nits: 1 minor item (see inline comments).

Verification

Traced the input-type coercion for a non-6 TIME. ToCharacterBuilder routes any DatetimeType to DateFormatClass, so TIME(3) reaches it. TypeCollection(TimestampType, TimeType(6)) does not accept TIME(3) by exact match, so ImplicitTypeCasts tries the members in order and coerces it to the first one, TimestampType (canANSIStoreAssign/TypeCoercion treat any DatetimeType->DatetimeType as castable). But TIME->TIMESTAMP has no case in canAnsiCast/canCast, so that inserted cast is invalid and the query fails at analysis. TIME(6) works only because it matches the member's exact precision; AnyTimeType accepts all precisions directly with no cast.


override def inputTypes: Seq[AbstractDataType] =
Seq(TimestampType, StringTypeWithCollation(supportsTrimCollation = true))
Seq(TypeCollection(TimestampType, TimeType(TimeType.DEFAULT_PRECISION)), StringTypeWithCollation(supportsTrimCollation = true))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This accepts only TIME(6). A TIME value of any other precision (TIME(3), current_time(3), a CAST(... AS TIME(0)) column) doesn't match by exact precision, so it gets implicitly coerced to TimestampType (the first collection member) — and TIME -> TIMESTAMP isn't a valid cast, so the query fails at analysis instead of formatting. Every other TIME-accepting expression (MinutesOfTime, HoursOfTime, TimeFromMicros) uses AnyTimeType, which accepts all precisions 0-6 directly.

Suggested change
Seq(TypeCollection(TimestampType, TimeType(TimeType.DEFAULT_PRECISION)), StringTypeWithCollation(supportsTrimCollation = true))
Seq(TypeCollection(TimestampType, AnyTimeType), StringTypeWithCollation(supportsTrimCollation = true))

Worth adding a regression test on a non-6 precision input — every current test and golden query uses a TIME'...' literal, which defaults to TIME(6), so this gap is untested today.


// Date-only pattern fields should error for TIME input
val datePatternExpr = DateFormatClass(timeLit, Literal("yyyy-MM-dd"), UTC_OPT)
intercept[Exception] {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intercept[Exception] passes on any throwable and only exercises the interpreted eval, not codegen. The exception it catches is a raw java.time DateTimeException from LocalTime.format (the pattern's syntax is valid, so validatePatternString passes and the failure only surfaces at format time) — there's no Spark error class, so the PR description's "clear error" is overstated. Consider rejecting date-only patterns with a proper Spark error, then pin this test to that specific exception/message and cover the codegen path too.


test("SPARK-57575: DateFormat with TimeType (to_char/to_varchar)") {
// 12:13:14 = (12*3600 + 13*60 + 14) seconds, stored as nanoseconds
val timeMicros = (12L * 3600 + 13 * 60 + 14) * 1000000000L

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeMicros holds a nanosecond value (the comment says "stored as nanoseconds", and TimeType is encoded in nanos-since-midnight). Rename to timeNanos to avoid misleading the next reader.

@shrirangmhalgi

Copy link
Copy Markdown
Contributor Author

Thanks @MaxGekk for the review. All addressed in the latest commit:

  • Switched to AnyTimeType from TimeType. Good catch, TIME(3) / TIME(0) would have failed at analysis otherwise.
  • Pinned the test to DateTimeException and added a TIME(0) precision regression test
  • Renamed timeMicrostimeNanos

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 addressed, 1 remaining, 0 new.
Prior blocking finding (declare TIME input as AnyTimeType) is fixed; variable renamed to timeNanos; TIME(0) regression test added. Value handling and golden coverage are correct.

Correctness (1)

  • datetimeExpressions.scala:1152 (follow-up on the prior thread, remaining/partial): date-only-pattern rejection still throws a raw java.time.DateTimeException (no Spark error class) and is asserted only on the interpreted path — fix proposal inline. Non-blocking.

Verification

Re-confirmed the prior blocking fix: AnyTimeType.acceptsType matches every TimeType precision, so TIME(0)/TIME(3) now bind directly with no implicit cast. Also checked the newly-reachable optimizer rule SimplifyDateTimeConversions: both rewrite cases require a GetTimestamp(...TimestampType...) child, which a TIME-typed DateFormatClass never is, so TIME is structurally gated out — no wrong rewrite.

case _: TimeType =>
val tf = timeFormatterOption.getOrElse(
TimeFormatter(format.toString, isParsing = false))
UTF8String.fromString(tf.format(timestamp.asInstanceOf[Long]))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up on the prior thread (now that the exception type is pinned). A date-only pattern like yyyy-MM-dd is syntactically valid, so validatePatternString() passes at construction and this format call throws a raw java.time.UnsupportedTemporalTypeException at runtime — there's no Spark error class, so the PR description's "rejected at runtime with a clear error" is still overstated, and the test pins it only on the interpreted .eval path.

Centralizing the TIME format in one helper that both nullSafeEval and doGenCode call maps it to a proper error and covers codegen for free:

def formatTime(tf: TimeFormatter, nanos: Long, pattern: String): UTF8String =
  try UTF8String.fromString(tf.format(nanos))
  catch { case e: DateTimeException =>
    throw QueryExecutionErrors.invalidPatternError("to_char", pattern, e) }

QueryExecutionErrors.invalidPatternError already exists (INVALID_PARAMETER_VALUE.PATTERN) — no new error class needed. Then swap the test's interpreted-only intercept[...] { expr.eval(InternalRow(...)) } for checkErrorInExpression[SparkRuntimeException](expr, "INVALID_PARAMETER_VALUE.PATTERN", ...), which runs interpreted and codegen. If a Spark error is out of scope here, the minimal alternative is to drop "clear error" from the description and use checkExceptionInExpression[DateTimeException] (which still covers codegen). Non-blocking — fine to defer.

@shrirangmhalgi shrirangmhalgi force-pushed the SPARK-57575-time-to-char branch from f12a132 to 948e69c Compare June 21, 2026 20:14
@shrirangmhalgi

Copy link
Copy Markdown
Contributor Author

Thanks @MaxGekk. Wrapped the date-pattern rejection in QueryExecutionErrors.invalidPatternError via a shared DateFormatClass.formatTimeWithError helper that both eval and codegen call. The test now uses checkErrorInExpression[SparkRuntimeException] for full path coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants