[SPARK-57526][SQL] Add the `timestamp_nanos` function to create nanosecond-precision timestamps from numeric nanoseconds by MaxGekk · Pull Request #56616 · apache/spark

MaxGekk · 2026-06-19T15:56:38Z

What changes were proposed in this pull request?

Adds a built-in timestamp_nanos(expr) function. It reads expr as a count of nanoseconds since 1970-01-01 00:00:00 UTC and returns a nanosecond-precision TIMESTAMP_LTZ(9) — the natural inverse of unix_nanos.

The argument is an integral or DECIMAL count. DECIMAL is what lets it reach the whole [0001, 9999] calendar range, since year-9999 nanoseconds (~2.5e20) overflow a 64-bit BIGINT — the same reason unix_nanos returns DECIMAL(21, 0). FLOAT/DOUBLE/STRING are rejected at analysis (a fractional or string nanosecond count isn't meaningful), and a count outside the representable range fails with the DATETIME_OVERFLOW error condition.
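
To make the range argument concrete, here is a small standalone Java sketch. It is not part of the patch, and the class and method names are made up for illustration. It computes the nanosecond count of a UTC date-time with arbitrary precision and checks whether that count fits in a signed 64-bit value:

```java
import java.math.BigInteger;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative helper, not Spark code.
final class NanosRangeCheck {
    private static final BigInteger NANOS_PER_SECOND = BigInteger.valueOf(1_000_000_000L);

    // Nanoseconds since 1970-01-01 00:00:00 UTC, computed without 64-bit overflow.
    static BigInteger epochNanos(LocalDateTime utc) {
        BigInteger seconds = BigInteger.valueOf(utc.toEpochSecond(ZoneOffset.UTC));
        return seconds.multiply(NANOS_PER_SECOND).add(BigInteger.valueOf(utc.getNano()));
    }

    // A signed 64-bit long holds values whose two's-complement bit length is at most 63.
    static boolean fitsInLong(BigInteger n) {
        return n.bitLength() <= 63;
    }
}
```

For `9999-12-31T23:59:59.999999999` this yields 253402300799999999999, a 21-digit number that does not fit in a long, which is consistent with `unix_nanos` returning `DECIMAL(21, 0)`. The count from the SQL example, 1230219000123456789, still fits.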

Implementation: a new NanosToTimestamp expression in datetimeExpressions.scala (interpreted + codegen), registered in FunctionRegistry, and exposed as functions.timestamp_nanos in the shared sql/api module so the Scala Spark Connect client picks it up automatically. PySpark and R are out of scope and tracked as follow-ups; timestamp_nanos is on the PySpark function-parity allowlist meanwhile.

Follow-up: the peer timestamp_seconds/timestamp_millis/timestamp_micros still throw a raw ArithmeticException on overflow; migrating them to DATETIME_OVERFLOW is tracked in SPARK-57577.

Why are the changes needed?

Part of the SPARK-56822 umbrella (nanosecond-precision timestamps). Spark has timestamp_seconds / timestamp_millis / timestamp_micros but no nanosecond counterpart.

Does this PR introduce any user-facing change?

Yes — a new timestamp_nanos(expr) function in SQL and the Scala API (including the Scala Spark Connect client), returning TIMESTAMP_LTZ(9). This is a change only within the unreleased nanosecond-timestamp preview.

SELECT timestamp_nanos(1230219000123456789);
-- 2008-12-25 07:30:00.123456789

How was this patch tested?

build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'
build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'
build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'
./dev/scalastyle

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

…econd-precision timestamps from numeric nanoseconds

### What changes were proposed in this pull request?

This PR adds a new built-in function `timestamp_nanos(expr)` that interprets `expr` as the number of nanoseconds since `1970-01-01 00:00:00 UTC` and returns a nanosecond-precision `TIMESTAMP_LTZ(9)`.

Concretely:
- Adds a `NanosToTimestamp` expression in `datetimeExpressions.scala`. It declares a single `DECIMAL` input type with `ImplicitCastInputTypes`, so integral arguments are coerced to their natural decimal automatically while `DECIMAL` arguments are accepted as-is.
- Maps the nanosecond count `N` to the internal `(epochMicros, nanosWithinMicro)` pair with floor semantics (`epochMicros = floorDiv(N, 1000)`, `nanosWithinMicro = floorMod(N, 1000)`, always in `[0, 999]`), computed via `BigInteger` in both the interpreted (`eval`) and codegen (`doGenCode`) paths. `longValueExact` throws `ArithmeticException` when the value is outside the representable timestamp range.
- A `DECIMAL` input (rather than `BIGINT`) is required to reach the full `[0001, 9999]` calendar range: nanoseconds for year 9999 (~2.5e20) overflow a 64-bit `BIGINT`, the same reason the inverse `unix_nanos` returns `DECIMAL(21, 0)`.
- Registers `timestamp_nanos` in `FunctionRegistry` and adds the Scala `functions.timestamp_nanos`.
- Adds catalyst unit tests (interpreted + codegen, full-range and round-trip with `unix_nanos`, overflow), Scala/SQL end-to-end tests, and SQL golden-file coverage.

Scope notes: the PySpark API (classic and Spark Connect Python) and R are out of scope here and tracked as follow-ups; `timestamp_nanos` is recorded in the PySpark function-parity allowlist in the meantime. The Scala Spark Connect client picks up `timestamp_nanos` automatically because `functions.scala` lives in the shared `sql/api` module.

### Why are the changes needed?

Part of the [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) umbrella (timestamps with nanosecond precision). Spark has `timestamp_seconds` / `timestamp_millis` / `timestamp_micros` but no nanosecond counterpart, which is the natural inverse of `unix_nanos`.

### Does this PR introduce _any_ user-facing change?

Yes. A new `timestamp_nanos(expr)` function is available in SQL and the Scala API (including the Scala Spark Connect client). It returns `TIMESTAMP_LTZ(9)`. This is a change only within the unreleased nanosecond-timestamp preview.

Example:
```sql
SELECT timestamp_nanos(1230219000123456789);
-- 2008-12-25 07:30:00.123456789
```

### How was this patch tested?

- `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'`
- `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
- `./dev/scalastyle`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor
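
The floor mapping described in the commit message can be sketched in plain Java as follows. This is a minimal illustration under assumed names (`NanosSplit`, `split`), not the actual `NanosToTimestamp` code. It relies on `BigInteger.mod`, which always returns a non-negative remainder for a positive modulus:

```java
import java.math.BigInteger;

// Illustrative helper, not Spark code.
final class NanosSplit {
    private static final BigInteger THOUSAND = BigInteger.valueOf(1000);

    // Returns {epochMicros, nanosWithinMicro} with floor semantics.
    static long[] split(BigInteger nanos) {
        // mod() yields a remainder in [0, 999], the same result as floorMod(N, 1000).
        BigInteger rem = nanos.mod(THOUSAND);
        // After subtracting the remainder the division is exact and equals floorDiv(N, 1000).
        BigInteger micros = nanos.subtract(rem).divide(THOUSAND);
        // longValueExact() throws ArithmeticException if micros does not fit in 64 bits.
        return new long[] { micros.longValueExact(), rem.longValueExact() };
    }
}
```

With this definition, `-1` maps to `(-1, 999)` and `1000` maps to `(1, 0)`, so a negative count rounds toward the earlier microsecond rather than toward zero.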

… analysis

`NanosToTimestamp` declared `inputTypes = Seq(DecimalType)` with `ImplicitCastInputTypes`, which silently coerced FLOAT/DOUBLE/STRING to DECIMAL(14,7)/(30,15)/(38,18). Those targets hold far fewer integer digits than a realistic nanosecond count, so a finite FLOAT/DOUBLE argument overflowed the coerced decimal and yielded NULL (ANSI off) or an overflow error (ANSI on) instead of a timestamp -- contrary to the documented "accepted and floored" behavior.

Switch to `ExpectsInputTypes` with `Seq(TypeCollection(IntegralType, DecimalType))` so only integral and DECIMAL nanosecond counts are accepted; FLOAT/DOUBLE/STRING now fail at analysis with a clear DATATYPE_MISMATCH, matching the "count of time units" semantics of timestamp_micros/millis. The interpreted and codegen paths widen an integral argument to BigInteger directly and keep the DECIMAL floor path unchanged.

Add catalyst coverage for the integral path and the FLOAT/DOUBLE/STRING rejection, a SQL rejection case, and regenerate the golden files.

Co-authored-by: Isaac
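
The digit-count problem behind this commit can be checked with `java.math.BigDecimal`. The sketch below is only an approximation of a DECIMAL(precision, scale) bounds check, written for illustration. `DecimalFit` and `fits` are hypothetical names, and Spark's own `Decimal` class is not used:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative helper, not Spark code.
final class DecimalFit {
    // True if value, rescaled to `scale` fractional digits, needs at most `precision` total digits.
    static boolean fits(BigDecimal value, int precision, int scale) {
        BigDecimal scaled = value.setScale(scale, RoundingMode.HALF_UP);
        return scaled.precision() <= precision;
    }
}
```

A present-day count such as 1230219000123456789 has 19 integer digits, so it fits DECIMAL(21, 0) but not DECIMAL(30, 15), which leaves room for only 15 integer digits, nor DECIMAL(14, 7), which leaves room for 7.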

…ow and add negative tests

`NanosToTimestamp` let `BigInteger.longValueExact()` throw a raw `java.lang.ArithmeticException` when `epochMicros` overflows a 64-bit long. Surface it instead as a proper Spark error condition: add `QueryExecutionErrors.timestampNanosOverflowError`, which raises a `SparkArithmeticException` with the `DATETIME_OVERFLOW` condition (SQLSTATE 22008), and catch/rethrow in both the interpreted and codegen paths.

Strengthen the negative coverage: the catalyst FLOAT/DOUBLE/STRING rejection now asserts the `UNEXPECTED_INPUT_TYPE` `DataTypeMismatch` (not just `isFailure`), the overflow test asserts the `DATETIME_OVERFLOW` condition via `checkErrorInExpression`, and a SQL golden case exercises the runtime overflow end-to-end. Regenerate the golden files.

Co-authored-by: Isaac
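
The catch/rethrow pattern this commit describes looks roughly like the Java sketch below. `DatetimeOverflowException` is a hypothetical stand-in for Spark's `SparkArithmeticException` carrying the `DATETIME_OVERFLOW` condition, and `OverflowGuard` is likewise an invented name. The real error is built by `QueryExecutionErrors.timestampNanosOverflowError`, whose message format is not reproduced here:

```java
import java.math.BigInteger;

// Illustrative helper, not Spark code.
final class OverflowGuard {
    private static final BigInteger THOUSAND = BigInteger.valueOf(1000);

    // Hypothetical domain error standing in for the DATETIME_OVERFLOW condition.
    static final class DatetimeOverflowException extends ArithmeticException {
        DatetimeOverflowException(BigInteger nanos) {
            super("DATETIME_OVERFLOW: nanosecond count " + nanos + " is out of range");
        }
    }

    // Floors the nanosecond count to microseconds and converts a raw overflow into the domain error.
    static long epochMicros(BigInteger nanos) {
        BigInteger rem = nanos.mod(THOUSAND);
        try {
            return nanos.subtract(rem).divide(THOUSAND).longValueExact();
        } catch (ArithmeticException e) {
            throw new DatetimeOverflowException(nanos);
        }
    }
}
```

The largest accepted count under this guard is `Long.MAX_VALUE * 1000 + 999`; one nanosecond more pushes `epochMicros` past the 64-bit boundary and raises the domain error instead of the bare `ArithmeticException`.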

uros-b · 2026-06-19T23:53:06Z

+    checkAnswer(sqlRes, Row(instant))
+    assert(sqlRes.schema.head.dataType === TimestampLTZNanosType(9))
+
+    // A BIGINT argument is implicitly cast to DECIMAL, so the integral literal works directly.


Nit: This comment seems inaccurate; the expression uses ExpectsInputTypes (not ImplicitCastInputTypes), so a BIGINT is not cast to DECIMAL — it goes through the dedicated IntegralType path (BigInteger.valueOf(... longValue())).

Good catch, fixed in e81da36. The comment was left over from the original ImplicitCastInputTypes + Seq(DecimalType) design; updated it to describe the dedicated IntegralType path (widened to BigInteger, no DECIMAL cast).

uros-b · 2026-06-19T23:55:04Z

+    val micros = try {
+      n.subtract(rem).divide(thousand).longValueExact()
+    } catch {
+      case _: ArithmeticException => throw QueryExecutionErrors.timestampNanosOverflowError(n)


One question, out of curiosity: the overflow guard only catches epochMicros not fitting in a 64-bit long, not the documented calendar range. This is consistent with timestamp_micros (which also does no calendar-range validation), so I'm wondering: is it intentional?

Inputs whose epochMicros fits in a long but represents a year > 9999 (or < 0001), up to ~year 294247, silently produce an out-of-range TimestampNanosVal, since fromParts validates only nanosWithinMicro.

Intentional. It matches the sibling timestamp_micros/timestamp_millis/timestamp_seconds, which likewise guard only the 64-bit boundary (Math.multiplyExact) and do not validate the [0001, 9999] calendar range, so an epochMicros that fits in a long but lands past year 9999 (up to the long-micros maximum, ~year 294247) yields an out-of-range value rather than an error. I added an inline comment in e81da36 documenting this so the behavior is explicit. I kept it consistent with the micro constructors rather than introducing calendar-range validation here; happy to add that in a follow-up if we'd prefer the stricter behavior across all of them.
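
The "~year 294247" figure can be reproduced with `java.time`. This is a standalone sketch with illustrative names (`MaxMicrosYear`, `fromEpochMicros`, `utcYear`), independent of Spark's own conversion utilities:

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Illustrative helper, not Spark code.
final class MaxMicrosYear {
    // Converts microseconds since the epoch to an Instant using floor semantics.
    static Instant fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanoAdjustment = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanoAdjustment);
    }

    // Proleptic Gregorian year of the instant in UTC.
    static int utcYear(long micros) {
        return fromEpochMicros(micros).atZone(ZoneOffset.UTC).getYear();
    }
}
```

`MaxMicrosYear.utcYear(Long.MAX_VALUE)` evaluates to 294247, far beyond the documented upper bound of year 9999, which is the gap between the 64-bit guard and the calendar range discussed above.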

uros-b · 2026-06-19T23:56:51Z

+    checkEvaluation(NanosToTimestamp(Literal(-1L)), nanosVal(-1L, 999))
+    checkEvaluation(NanosToTimestamp(Literal(1000)), nanosVal(1L, 0))


Nit about integral-width coverage: the catalyst test exercises Int (Literal(1000)) and Long, which is enough to cover the (long) $c codegen cast, but a TINYINT/SMALLINT case would fully nail the IntegralType branch.

Added TINYINT (Literal(2.toByte)) and SMALLINT (Literal(1000.toShort)) cases in e81da36 so every integral width exercises the (long) codegen cast.
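
For readers following the two input paths, here is a rough Java sketch of the dispatch: every integral width is widened through `long` into a `BigInteger`, while a DECIMAL is floored. `NanosInput` and `toNanos` are invented names, and the runtime `IllegalArgumentException` is only a stand-in, since in the patch FLOAT/DOUBLE/STRING are rejected at analysis time rather than at evaluation:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Illustrative helper, not Spark code.
final class NanosInput {
    static BigInteger toNanos(Object value) {
        // Integral path: Byte, Short, Integer and Long all widen to long first.
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long) {
            return BigInteger.valueOf(((Number) value).longValue());
        }
        // DECIMAL path: drop any fractional digits by rounding toward negative infinity.
        if (value instanceof BigDecimal) {
            return ((BigDecimal) value).setScale(0, RoundingMode.FLOOR).toBigInteger();
        }
        // Stand-in for the analysis-time type mismatch.
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("unsupported nanosecond count type: " + type);
    }
}
```

Here `toNanos((byte) 2)` and `toNanos((short) 1000)` take the same branch as an `int` or `long` argument, mirroring the TINYINT and SMALLINT cases added to the catalyst test.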

- Fix a stale test comment that still claimed a BIGINT argument is implicitly cast to DECIMAL; after the switch to ExpectsInputTypes it goes through the dedicated IntegralType path (widened to BigInteger), so the comment is updated to match.
- Document that, like timestamp_micros/millis/seconds, NanosToTimestamp does not validate the [0001, 9999] calendar range: only the 64-bit epochMicros boundary is guarded (counts up to ~year 294247 are accepted), which is intentional for consistency with the microsecond constructors.
- Extend the catalyst IntegralType coverage with TINYINT (Byte) and SMALLINT (Short) literals so every integral width exercises the (long) codegen cast.

MaxGekk · 2026-06-20T21:10:53Z

@stevomitric @uros-b Could you take a look at the PR, please?

MaxGekk added 3 commits June 19, 2026 17:54

uros-b reviewed Jun 19, 2026

MaxGekk requested a review from uros-b June 20, 2026 21:10

MaxGekk wants to merge 4 commits into apache:master from MaxGekk:timestamp_nanos
