[SPARK-57579][PYTHON] Add PySpark support for unix_nanos function by jubins · Pull Request #56626 · apache/spark

jubins · 2026-06-20T00:40:27Z

What is the purpose of the change

Fixes SPARK-57579 (follow-up to SPARK-57527) — adds unix_nanos to the PySpark API (pyspark.sql.functions and PySpark Connect), completing the epoch-unit function family in Python.

The SQL function and Scala API were added in SPARK-57527, but Python support was explicitly deferred. The full family is:

| Function | PySpark before this PR | PySpark after this PR |
| --- | --- | --- |
| `unix_seconds` | present | present |
| `unix_millis` | present | present |
| `unix_micros` | present | present |
| `unix_nanos` | missing | added |
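
To make the relationship between the four units concrete, here is a minimal sketch in plain Python integer arithmetic. It is not Spark code: the helper name `epoch_units` is hypothetical, and the use of floor division for the coarser units is an assumption for illustration. The sample value is the nanosecond count checked later in this thread.

```python
# Plain integer arithmetic for illustration only; this is not Spark code.
NANOS_PER_SECOND = 10**9
NANOS_PER_MILLI = 10**6
NANOS_PER_MICRO = 10**3


def epoch_units(epoch_nanos: int) -> dict:
    """Express one instant in each unit of the family (hypothetical helper).

    Coarser units are obtained by floor division, which is an assumption
    made for this sketch.
    """
    return {
        "unix_seconds": epoch_nanos // NANOS_PER_SECOND,
        "unix_millis": epoch_nanos // NANOS_PER_MILLI,
        "unix_micros": epoch_nanos // NANOS_PER_MICRO,
        "unix_nanos": epoch_nanos,
    }


units = epoch_units(1_577_885_075_123_456_789)
for name, value in units.items():
    print(f"{name:>12}: {value}")
```

Only the nanosecond row keeps all nine fractional digits, which is the precision the other three functions cannot express.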

The gap was acknowledged in the parity test (expected_missing_in_py) with a comment pointing to this follow-up.

Brief change log

python/pyspark/sql/functions/builtin.py: added unix_nanos(col) after unix_micros, decorated with @_try_remote_functions, with full docstring (versionadded:: 4.3.0, parameters, return type, See Also links, and two doctests covering a nanosecond-precision TIMESTAMP_NTZ input and a NULL input)
python/pyspark/sql/functions/__init__.py: exported unix_nanos in alphabetical order between unix_millis and unix_seconds
python/pyspark/sql/connect/functions/builtin.py: added Connect-side wrapper for unix_nanos inheriting its docstring from the main function, following the same pattern as unix_micros
python/pyspark/sql/tests/test_functions.py: removed "unix_nanos" from expected_missing_in_py (set is now empty)

Verifying this change

This change is covered by the existing parity test in FunctionsTestsMixin:

test_function_parity: previously allowlisted unix_nanos as an expected gap; removing it from expected_missing_in_py means the test will now fail if unix_nanos is ever missing from the Python API again
The two doctests in the unix_nanos docstring verify:
- A nanosecond-precision TIMESTAMP_NTZ input returns the correct DECIMAL(21, 0) nanosecond count
- A NULL input returns NULL
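
The allowlist mechanism described above can be sketched with plain sets. This is a simplified illustration, not the actual test code: the helper `missing_in_python` and the hard-coded name sets are hypothetical, and whether the real `test_function_parity` also flags stale allowlist entries is an assumption here.

```python
def missing_in_python(jvm_functions, python_functions, expected_missing):
    """Return (unexpected_gaps, stale_allowlist_entries) as two sets.

    Hypothetical helper: it compares the two API surfaces and separates
    gaps that are not allowlisted from allowlist entries that no longer
    correspond to a real gap.
    """
    missing = set(jvm_functions) - set(python_functions)
    unexpected = missing - set(expected_missing)
    stale = set(expected_missing) - missing
    return unexpected, stale


jvm = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = jvm - {"unix_nanos"}  # state before this PR
py_after = set(jvm)               # state after this PR

# Before: the gap exists but is allowlisted, so nothing is reported.
before = missing_in_python(jvm, py_before, {"unix_nanos"})

# After: the allowlist is empty and there is no gap.
after = missing_in_python(jvm, py_after, set())

# Regression: the function disappears again with an empty allowlist.
regression = missing_in_python(jvm, py_before, set())

print(before, after, regression)
```

With the allowlist emptied, the third case is the one that protects against `unix_nanos` silently dropping out of the Python API.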

Does this pull request potentially affect one of the following parts

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public/@Evolving: yes — unix_nanos is a new public PySpark function
The serializers: no
The runtime per-record code paths (performance sensitive): no — this is a Python wrapper only; the JVM expression UnixNanos is unchanged
Anything that affects deployment or recovery: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes — pyspark.sql.functions.unix_nanos is a new public API

If yes, how is the feature documented? inline docstring with parameter description, return type, See Also links, and doctests in builtin.py

Was generative AI tooling used to co-author this PR?

Yes — Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.
Generated-by: Claude Sonnet 4.8

MaxGekk

1 blocking, 2 non-blocking, 0 nits.
The wrapper logic is correct and faithfully mirrors unix_micros. The blocker is a missing API-reference doc entry; the rest are minor.

Design / architecture (2)

python/docs/source/reference/pyspark.sql/functions.rst (~line 330): unix_nanos is not added to the API reference autosummary, where its siblings unix_micros/unix_millis/unix_seconds are listed. Without it the new public function won't appear in the rendered API docs. [blocking]
builtin.py:11753: placement breaks the family's alphabetical order (also in the Connect wrapper) — see inline. [non-blocking]

Correctness (1)

builtin.py:11779: doctests need spark.sql.timestampNanosTypes.enabled (off by default in prod) but don't set it — see inline. [non-blocking]

MaxGekk · 2026-06-21T05:00:14Z



+@_try_remote_functions
+def unix_nanos(col: "ColumnOrName") -> Column:


Minor: the unix_* family in this file is alphabetical (unix_date, unix_micros, unix_millis, unix_seconds), but unix_nanos is inserted between unix_micros and unix_millis. Placing it after unix_millis keeps the ordering — and matches __init__.py, where the export was correctly placed after unix_millis. Same applies to the Connect wrapper in connect/functions/builtin.py.

Fixed: moved unix_nanos after unix_millis in both builtin.py and connect/functions/builtin.py.
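
The ordering point can be checked mechanically. The sketch below is a hypothetical helper, not part of the PR: it takes the family's names in the order the review describes (unix_nanos placed before unix_millis) and reports the first name that sorts before its predecessor.

```python
# Definition order as described in the review comment, before the fix.
family = ["unix_date", "unix_micros", "unix_nanos", "unix_millis", "unix_seconds"]


def first_out_of_order(names):
    """Return the first name that sorts before its predecessor, or None."""
    for prev, cur in zip(names, names[1:]):
        if cur < prev:
            return cur
    return None


print(first_out_of_order(family))          # unix_millis
print(first_out_of_order(sorted(family)))  # None
```

Sorting the list gives unix_date, unix_micros, unix_millis, unix_nanos, unix_seconds, which is the order the fix restores.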

MaxGekk · 2026-06-21T05:00:14Z

+
+    Examples
+    --------
+    >>> import pyspark.sql.functions as sf


Both doctests below use nanosecond-precision timestamp types (TIMESTAMP_NTZ '…123456789' and cast('timestamp_ntz(9)')), which only exist when spark.sql.timestampNanosTypes.enabled=true. That flag defaults to false in production (it's Utils.isTesting, so on only under tests), so these doctests pass in CI but a user running the rendered example in a default session hits an error — with the flag off the 9-digit literal isn't a nanos type, and UnixNanos (inputTypes = AnyTimestampNanoType) rejects a micros timestamp at analysis.

The Scala UnixNanos example handles this by prefixing SET spark.sql.timestampNanosTypes.enabled=true. Suggest doing the equivalent here so the example is reproducible:

Suggested change

-    >>> import pyspark.sql.functions as sf
+    >>> import pyspark.sql.functions as sf
+    >>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)

(and >>> spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end of the Examples block).

Good catch, thanks! Added spark.conf.set("spark.sql.timestampNanosTypes.enabled", True) at the top of the Examples block and spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end, matching the Scala example.

Fix incorrect ordering/assignments between unix_millis and unix_nanos: swap their implementations and docstrings in python/pyspark/sql/functions/builtin.py and the connect variant so each function now invokes the correct underlying name. Update doc examples and versionadded notes accordingly, add unix_nanos to the functions index in docs (functions.rst), and adjust example usage (time zone and timestamp-nanos feature flags) to reflect the correct semantics.

jubins · 2026-06-21T05:56:00Z

Thanks for the review @MaxGekk! Addressed all three comments:

Added unix_nanos to the API reference autosummary in functions.rst between unix_millis and unix_seconds.
Moved unix_nanos after unix_millis in both builtin.py and connect/functions/builtin.py to restore alphabetical order.
Added spark.conf.set("spark.sql.timestampNanosTypes.enabled", True) at the start of the Examples block and spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end, matching the Scala example.
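
Outside a doctest, the same set-then-unset discipline can be wrapped in a context manager so the flag is cleared even if the body raises. This is a different technique from what the PR does (the doctest uses explicit set and unset lines), and everything below is a hypothetical sketch: `temporary_conf` is not a PySpark API, and `InMemoryConf` is a stand-in that only assumes the conf object offers `set`, `get`, and `unset`.

```python
from contextlib import contextmanager

FLAG = "spark.sql.timestampNanosTypes.enabled"


class InMemoryConf:
    """Hypothetical stand-in exposing set/get/unset, backed by a dict."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def unset(self, key):
        self._values.pop(key, None)


@contextmanager
def temporary_conf(conf, key, value):
    """Set `key` for the duration of the block, then unset it even on error."""
    conf.set(key, value)
    try:
        yield conf
    finally:
        conf.unset(key)


conf = InMemoryConf()
with temporary_conf(conf, FLAG, "true"):
    inside = conf.get(FLAG)  # the flag is visible inside the block
after = conf.get(FLAG)       # and gone again afterwards

print(inside, after)
```

Note that this sketch unsets the key rather than restoring a previous value, mirroring the set and unset pair used in the doctest.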

uros-b · 2026-06-21T14:25:44Z

+    Examples
+    --------
+    >>> import pyspark.sql.functions as sf
+    >>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)


Nit: The new doctest uses a Python bool value (True), but every other doctest in builtin.py uses the string form, e.g. spark.conf.set("spark.sql.ansi.enabled", "true"). It works (the value is coerced to "True" and the boolean conf parser is case-insensitive), but "true" would be more consistent with the file's convention.
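
The coercion described in this nit can be illustrated in plain Python. The parser below is a hypothetical stand-in written for this sketch, not Spark's actual conf parser; it only demonstrates why a value stringified as "True" and the conventional "true" end up equivalent under case-insensitive parsing.

```python
def parse_bool_conf(value: str) -> bool:
    """Hypothetical case-insensitive boolean parser (not Spark's implementation)."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean conf value: {value!r}")


coerced = str(True)  # the string form of a Python bool, as the comment describes
print(coerced)
print(parse_bool_conf(coerced) == parse_bool_conf("true"))
```

Both spellings parse to the same value, so the suggestion is about consistency with the rest of the file rather than behavior.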

uros-b

Thank you @jubins and @MaxGekk! Left one more nit, otherwise LGTM.

MaxGekk

3 addressed, 0 remaining, 0 new.
All three prior findings are resolved in 3cfeefe: the API-reference entry was added to functions.rst (was blocking), unix_nanos is now in alphabetical order across builtin.py, the Connect wrapper, and __init__.py, and the doctests now set and unset spark.sql.timestampNanosTypes.enabled so the example is reproducible in a default session. No new findings.

Verification

Confirmed the docstring contract matches the JVM expression: UnixNanos accepts only TIMESTAMP_LTZ/NTZ(p) (p in [7,9]) via AnyTimestampNanoType and returns DECIMAL(21,0). Checked the doctest expected value (1577885075123456789 = 1577885075s * 1e9 + 123456789) and that versionadded:: 4.3.0 matches the branch-4.x next release.
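
The arithmetic check above can be reproduced in plain Python. The sketch is illustrative and not part of the PR; it also shows why the computation should use exact integers (or decimals) rather than floating point, since a 64-bit float cannot hold all 19 digits.

```python
from decimal import Decimal

seconds = 1_577_885_075
fraction_nanos = 123_456_789

# Exact integer arithmetic: seconds * 10^9 + fractional nanoseconds.
exact = seconds * 10**9 + fraction_nanos

# The same value as a scale-0 decimal, analogous to DECIMAL(21, 0).
as_decimal = Decimal(exact)

# Floating point (1e9 is a float) loses the low-order digits at this magnitude.
via_float = int(seconds * 1e9 + fraction_nanos)

print(exact)               # 1577885075123456789
print(len(str(exact)))     # 19
print(via_float == exact)  # False
```

The result has 19 digits, which fits within the 21 digits of precision that DECIMAL(21, 0) allows.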

jubins added 2 commits June 19, 2026 17:39

[SPARK-57579][PYTHON] Add PySpark support for unix_nanos function

c5fb0bd

Attempt to fix failing tests

2c526ec

MaxGekk reviewed Jun 21, 2026

uros-b reviewed Jun 21, 2026

uros-b approved these changes Jun 21, 2026

MaxGekk approved these changes Jun 21, 2026

MaxGekk requested a review from HyukjinKwon June 21, 2026 20:18

jubins wants to merge 3 commits into apache:master from jubins:j-SPARK-57579-unix-nanos-pyspark-support
