
[SPARK-57579][PYTHON] Add PySpark support for unix_nanos function#56626

Open
jubins wants to merge 3 commits into apache:master from jubins:j-SPARK-57579-unix-nanos-pyspark-support


@jubins jubins commented Jun 20, 2026

Contributor

What is the purpose of the change

Fixes SPARK-57579 (follow-up to SPARK-57527) — adds unix_nanos to the PySpark API (pyspark.sql.functions and PySpark Connect), completing the epoch-unit function family in Python.

The SQL function and Scala API were added in SPARK-57527, but Python support was explicitly deferred. The full family is:

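| Function     | PySpark before this PR | PySpark after this PR |
|--------------|------------------------|-----------------------|
| unix_seconds | present                | present               |
| unix_millis  | present                | present               |
| unix_micros  | present                | present               |
| unix_nanos   | missing                | added                 |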

The gap was acknowledged in the parity test (expected_missing_in_py) with a comment pointing to this follow-up.
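
To make the relationship between the four functions concrete, here is a minimal pure-Python model of the family. It is a sketch, not Spark's implementation: `UNITS_PER_SECOND` and `epoch_value` are hypothetical names, and rounding down to the target unit is an assumption of the sketch.

```python
# Hypothetical model of the epoch-unit family; none of these names exist in PySpark.
UNITS_PER_SECOND = {
    "unix_seconds": 1,
    "unix_millis": 10**3,
    "unix_micros": 10**6,
    "unix_nanos": 10**9,
}


def epoch_value(unit: str, whole_seconds: int, nanos_of_second: int = 0) -> int:
    """Units elapsed since 1970-01-01T00:00:00Z, rounded down to `unit`."""
    total_nanos = whole_seconds * 10**9 + nanos_of_second
    # Number of nanoseconds in one `unit` (1 for unix_nanos, 10**9 for unix_seconds).
    nanos_per_unit = 10**9 // UNITS_PER_SECOND[unit]
    return total_nanos // nanos_per_unit


# The same instant expressed in each unit of the family.
instant = (1577885075, 123456789)
values = {unit: epoch_value(unit, *instant) for unit in UNITS_PER_SECOND}
```

Only `unix_nanos` keeps all nine fractional digits; the other three drop the digits below their own resolution.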

Brief change log

  • python/pyspark/sql/functions/builtin.py: added unix_nanos(col) after unix_micros, decorated with @_try_remote_functions, with full docstring (versionadded:: 4.3.0, parameters, return type, See Also links, and two doctests covering a nanosecond-precision TIMESTAMP_NTZ input and a NULL input)
  • python/pyspark/sql/functions/__init__.py: exported unix_nanos in alphabetical order between unix_millis and unix_seconds
  • python/pyspark/sql/connect/functions/builtin.py: added Connect-side wrapper for unix_nanos inheriting its docstring from the main function, following the same pattern as unix_micros
  • python/pyspark/sql/tests/test_functions.py: removed "unix_nanos" from expected_missing_in_py (set is now empty)

Verifying this change

This change is covered by the existing parity test in FunctionsTestsMixin:

  • test_function_parity: previously allowlisted unix_nanos as an expected gap; removing it from expected_missing_in_py means the test will now fail if unix_nanos is ever missing from the Python API again
  • The two doctests in the unix_nanos docstring verify:
    • A nanosecond-precision TIMESTAMP_NTZ input returns the correct DECIMAL(21, 0) nanosecond count
    • A NULL input returns NULL
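
The allowlist mechanics described above can be sketched without Spark. This is a simplified model of the idea, not the actual body of test_function_parity; `missing_in_python` and the sample name sets are hypothetical.

```python
def missing_in_python(jvm_functions, python_functions, expected_missing=frozenset()):
    """Names exposed on the JVM side with no Python counterpart, minus the allowlist."""
    return set(jvm_functions) - set(python_functions) - set(expected_missing)


# Illustrative name sets only; the real test inspects the actual modules.
jvm = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = {"unix_seconds", "unix_millis", "unix_micros"}
py_after = py_before | {"unix_nanos"}

# Before the PR: the gap exists but is allowlisted, so nothing is reported.
gap_before = missing_in_python(jvm, py_before, expected_missing={"unix_nanos"})

# After the PR: the allowlist is empty and the function is present.
gap_after = missing_in_python(jvm, py_after)

# Regression case: with the allowlist empty, a missing function is reported.
gap_regression = missing_in_python(jvm, py_before)
```

With the allowlist emptied, the third case is what a future accidental removal of `unix_nanos` would look like to the test.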

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public/@Evolving: yes; unix_nanos is a new public PySpark function
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no — this is a Python wrapper only; the JVM expression UnixNanos is unchanged
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes; pyspark.sql.functions.unix_nanos is a new public API

If yes, how is the feature documented? inline docstring with parameter description, return type, See Also links, and doctests in builtin.py

Was generative AI tooling used to co-author this PR?

  • Yes — Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.
    Generated-by: Claude Sonnet 4.8

@MaxGekk MaxGekk left a comment

Member

1 blocking, 2 non-blocking, 0 nits.
The wrapper logic is correct and faithfully mirrors unix_micros. The blocker is a missing API-reference doc entry; the rest are minor.

Design / architecture (2)

  • python/docs/source/reference/pyspark.sql/functions.rst (~line 330): unix_nanos is not added to the API reference autosummary, where its siblings unix_micros/unix_millis/unix_seconds are listed. Without it the new public function won't appear in the rendered API docs. [blocking]
  • builtin.py:11753: placement breaks the family's alphabetical order (also in the Connect wrapper) — see inline. [non-blocking]

Correctness (1)

  • builtin.py:11779: doctests need spark.sql.timestampNanosTypes.enabled (off by default in prod) but don't set it — see inline. [non-blocking]

Comment thread python/pyspark/sql/functions/builtin.py Outdated


@_try_remote_functions
def unix_nanos(col: "ColumnOrName") -> Column:

Member

Minor: the unix_* family in this file is alphabetical (unix_date, unix_micros, unix_millis, unix_seconds), but unix_nanos is inserted between unix_micros and unix_millis. Placing it after unix_millis keeps the ordering — and matches __init__.py, where the export was correctly placed after unix_millis. Same applies to the Connect wrapper in connect/functions/builtin.py.

Contributor Author

fixed, moved unix_nanos after unix_millis in both builtin.py and connect/functions/builtin.py.
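
The ordering point is easy to check mechanically. A small sketch, using only the function names mentioned in this thread (`is_sorted` is a hypothetical helper):

```python
def is_sorted(names):
    """True when `names` is already in ascending lexicographic order."""
    return list(names) == sorted(names)


# Placement in the first revision: unix_nanos directly after unix_micros.
original_placement = ["unix_date", "unix_micros", "unix_nanos", "unix_millis", "unix_seconds"]

# Placement after the fix: unix_nanos after unix_millis.
fixed_placement = ["unix_date", "unix_micros", "unix_millis", "unix_nanos", "unix_seconds"]

original_ok = is_sorted(original_placement)
fixed_ok = is_sorted(fixed_placement)
```

"unix_millis" sorts before "unix_nanos" because the two names first differ at "m" versus "n".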


Examples
--------
>>> import pyspark.sql.functions as sf

Member

Both doctests below use nanosecond-precision timestamp types (TIMESTAMP_NTZ '…123456789' and cast('timestamp_ntz(9)')), which only exist when spark.sql.timestampNanosTypes.enabled=true. That flag defaults to false in production (it's Utils.isTesting, so on only under tests), so these doctests pass in CI but a user running the rendered example in a default session hits an error — with the flag off the 9-digit literal isn't a nanos type, and UnixNanos (inputTypes = AnyTimestampNanoType) rejects a micros timestamp at analysis.

The Scala UnixNanos example handles this by prefixing SET spark.sql.timestampNanosTypes.enabled=true. Suggest doing the equivalent here so the example is reproducible:

Suggested change
- >>> import pyspark.sql.functions as sf
+ >>> import pyspark.sql.functions as sf
+ >>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)

(and >>> spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end of the Examples block).

Contributor Author

good catch, thanks! added spark.conf.set("spark.sql.timestampNanosTypes.enabled", True) at the top of the Examples block and spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end, matching the Scala example.
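
Outside of doctests, the same set-then-unset pairing can be wrapped so the unset still happens if the body raises. The sketch below is not part of this PR: `temporary_conf` is a hypothetical helper, and `FakeConf` is a dict-backed stand-in for `spark.conf` so the example runs without a Spark session. It assumes only that the config object offers `set(key, value)` and `unset(key)`, the two calls used in the doctest.

```python
from contextlib import contextmanager

KEY = "spark.sql.timestampNanosTypes.enabled"


@contextmanager
def temporary_conf(conf, key, value):
    """Set `key` on `conf` for the duration of the block, then unset it."""
    conf.set(key, value)
    try:
        yield conf
    finally:
        # Runs on normal exit and on exceptions alike.
        conf.unset(key)


class FakeConf:
    """Hypothetical stand-in for spark.conf, backed by a plain dict."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def unset(self, key):
        self._values.pop(key, None)

    def get(self, key, default=None):
        return self._values.get(key, default)


conf = FakeConf()
with temporary_conf(conf, KEY, "true"):
    inside = conf.get(KEY)
outside = conf.get(KEY)
```

Note that unsetting returns the key to its default rather than to whatever value it held before the block, which matches what the doctest does.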

Fix incorrect ordering/assignments between unix_millis and unix_nanos: swap their implementations and docstrings in python/pyspark/sql/functions/builtin.py and the connect variant so each function now invokes the correct underlying name. Update doc examples and versionadded notes accordingly, add unix_nanos to the functions index in docs (functions.rst), and adjust example usage (time zone and timestamp-nanos feature flags) to reflect the correct semantics.

jubins commented Jun 21, 2026

Contributor Author

Thanks for the review @MaxGekk! Addressed all three comments:

  • Added unix_nanos to the API reference autosummary in functions.rst between unix_millis and unix_seconds.
  • Moved unix_nanos after unix_millis in both builtin.py and connect/functions/builtin.py to restore alphabetical order.
  • Added spark.conf.set("spark.sql.timestampNanosTypes.enabled", True) at the start of the Examples block and spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end, matching the Scala example.

Examples
--------
>>> import pyspark.sql.functions as sf
>>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)

Member

Nit: The new doctest uses a Python bool value (True), but every other doctest in builtin.py uses the string form, e.g. spark.conf.set("spark.sql.ansi.enabled", "true"). It works (the value is coerced to "True" and the boolean conf parser is case-insensitive), but "true" would be more consistent with the file's convention.

@uros-b uros-b left a comment

Member

Thank you @jubins and @MaxGekk! Left one more nit, otherwise LGTM.

@MaxGekk MaxGekk left a comment

Member

3 addressed, 0 remaining, 0 new.
All three prior findings are resolved in 3cfeefe: the API-reference entry was added to functions.rst (was blocking), unix_nanos is now in alphabetical order across builtin.py / the Connect wrapper / __init__.py, and the doctests now set+unset spark.sql.timestampNanosTypes.enabled so the example is reproducible in a default session. No new findings.

Verification

Confirmed the docstring contract matches the JVM expression: UnixNanos accepts only TIMESTAMP_LTZ/NTZ(p) (p in [7,9]) via AnyTimestampNanoType and returns DECIMAL(21,0). Checked the doctest expected value (1577885075123456789 = 1577885075s * 1e9 + 123456789) and that versionadded:: 4.3.0 matches the branch-4.x next release.
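
The arithmetic in the check above can be reproduced with the standard library alone. The sketch below also looks at how large such a value can get; treating year 9999 as the upper bound is an assumption made here for illustration, and reading that as the reason for DECIMAL(21, 0) is one plausible interpretation, not something stated in this thread. `to_unix_nanos` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 10**9
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(whole_seconds: int, nanos_of_second: int) -> int:
    """Nanoseconds since the epoch from a whole-second count plus a nanosecond fraction."""
    return whole_seconds * NANOS_PER_SECOND + nanos_of_second


# The doctest value checked in the review.
expected = to_unix_nanos(1577885075, 123456789)

# The whole-second part, read as a UTC wall-clock time.
as_utc = datetime.fromtimestamp(1577885075, tz=timezone.utc)

# Assumed upper bound: the last nanosecond of year 9999 (integer arithmetic, no floats).
last_second = (datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc) - EPOCH) // timedelta(seconds=1)
upper = to_unix_nanos(last_second, 999_999_999)
upper_digits = len(str(upper))
```

The doctest value itself fits in a signed 64-bit integer, but the assumed upper bound needs 21 decimal digits and exceeds `INT64_MAX`.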

@MaxGekk MaxGekk requested a review from HyukjinKwon June 21, 2026 20:18

3 participants