[SPARK-57579][PYTHON] Add PySpark support for unix_nanos function #56626
jubins wants to merge 3 commits into
Conversation
MaxGekk
left a comment
1 blocking, 2 non-blocking, 0 nits.
The wrapper logic is correct and faithfully mirrors unix_micros. The blocker is a missing API-reference doc entry; the rest are minor.
Design / architecture (2)
- python/docs/source/reference/pyspark.sql/functions.rst (~line 330): unix_nanos is not added to the API reference autosummary, where its siblings unix_micros / unix_millis / unix_seconds are listed. Without it the new public function won't appear in the rendered API docs. [blocking]
- builtin.py:11753: placement breaks the family's alphabetical order (also in the Connect wrapper); see inline. [non-blocking]
Correctness (1)
- builtin.py:11779: doctests need spark.sql.timestampNanosTypes.enabled (off by default in prod) but don't set it; see inline. [non-blocking]
    @_try_remote_functions
    def unix_nanos(col: "ColumnOrName") -> Column:
Minor: the unix_* family in this file is alphabetical (unix_date, unix_micros, unix_millis, unix_seconds), but unix_nanos is inserted between unix_micros and unix_millis. Placing it after unix_millis keeps the ordering — and matches __init__.py, where the export was correctly placed after unix_millis. Same applies to the Connect wrapper in connect/functions/builtin.py.
fixed, moved unix_nanos after unix_millis in both builtin.py and connect/functions/builtin.py.
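The ordering point in this thread can be checked mechanically. The following is a minimal sketch that uses only the function names mentioned above; the helper `first_out_of_order` is hypothetical and is not part of PySpark.

```python
from typing import List, Optional, Tuple


def first_out_of_order(names: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair that violates ascending order, or None."""
    for prev, cur in zip(names, names[1:]):
        if prev > cur:
            return (prev, cur)
    return None


# Placement flagged in the review: unix_nanos between unix_micros and unix_millis.
before = ["unix_date", "unix_micros", "unix_nanos", "unix_millis", "unix_seconds"]
# Placement after the fix: unix_nanos after unix_millis.
after = ["unix_date", "unix_micros", "unix_millis", "unix_nanos", "unix_seconds"]

print(first_out_of_order(before))  # the offending pair
print(first_out_of_order(after))   # None when the list is sorted
```

Comparing adjacent pairs reports where the order breaks, which is more useful in a review than a bare `names == sorted(names)` check.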
    Examples
    --------
    >>> import pyspark.sql.functions as sf
Both doctests below use nanosecond-precision timestamp types (TIMESTAMP_NTZ '…123456789' and cast('timestamp_ntz(9)')), which only exist when spark.sql.timestampNanosTypes.enabled=true. That flag defaults to false in production (it's Utils.isTesting, so on only under tests), so these doctests pass in CI but a user running the rendered example in a default session hits an error — with the flag off the 9-digit literal isn't a nanos type, and UnixNanos (inputTypes = AnyTimestampNanoType) rejects a micros timestamp at analysis.
The Scala UnixNanos example handles this by prefixing SET spark.sql.timestampNanosTypes.enabled=true. Suggest doing the equivalent here so the example is reproducible:
    >>> import pyspark.sql.functions as sf
    >>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)
(and >>> spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end of the Examples block).
good catch, thanks! added spark.conf.set("spark.sql.timestampNanosTypes.enabled", True) at the top of the Examples block and spark.conf.unset("spark.sql.timestampNanosTypes.enabled") at the end, matching the Scala example.
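The set-then-unset pattern agreed on in this thread can also be expressed as a context manager, which guarantees the unset even if an example fails midway. This is only a sketch: `StubConf` is a hypothetical stand-in for `spark.conf` so the snippet runs without a Spark session, and `temporary_conf` is an illustrative helper, not a PySpark API. The doctests in the PR use plain `spark.conf.set` and `spark.conf.unset` calls instead.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class StubConf:
    """Hypothetical stand-in for spark.conf, exposing only set/unset/get."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._values[key] = str(value)

    def unset(self, key: str) -> None:
        self._values.pop(key, None)

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._values.get(key, default)


@contextmanager
def temporary_conf(conf, key: str, value: str) -> Iterator[None]:
    """Set a conf key for the duration of the block, then unset it."""
    conf.set(key, value)
    try:
        yield
    finally:
        # Mirrors the unset at the end of the Examples block.
        conf.unset(key)


FLAG = "spark.sql.timestampNanosTypes.enabled"
conf = StubConf()
with temporary_conf(conf, FLAG, "true"):
    inside = conf.get(FLAG)   # flag is visible inside the block
outside = conf.get(FLAG)      # flag is gone afterwards
print(inside, outside)
```

Note that unsetting returns the key to its default rather than restoring a value the session may have held before, which matches what the doctest fix does.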
Fix incorrect ordering/assignments between unix_millis and unix_nanos: swap their implementations and docstrings in python/pyspark/sql/functions/builtin.py and the connect variant so each function now invokes the correct underlying name. Update doc examples and versionadded notes accordingly, add unix_nanos to the functions index in docs (functions.rst), and adjust example usage (time zone and timestamp-nanos feature flags) to reflect the correct semantics.
Thanks for the review @MaxGekk! Addressed all three comments:
    Examples
    --------
    >>> import pyspark.sql.functions as sf
    >>> spark.conf.set("spark.sql.timestampNanosTypes.enabled", True)
Nit: The new doctest uses a Python bool value (True), but every other doctest in builtin.py uses the string form, e.g. spark.conf.set("spark.sql.ansi.enabled", "true"). It works (the value is coerced to "True" and the boolean conf parser is case-insensitive), but "true" would be more consistent with the file's convention.
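To illustrate why both spellings behave the same, here is a small sketch of the coercion and parsing the nit describes. It assumes the coercion is equivalent to Python's `str()`, and `parse_bool_conf` is a hypothetical case-insensitive parser standing in for the real conf parser, whose code is not shown in this thread.

```python
def parse_bool_conf(raw: str) -> bool:
    """Hypothetical case-insensitive boolean parser for a conf value."""
    normalized = raw.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean conf value: {raw!r}")


coerced = str(True)  # a Python bool becomes the string "True"

# Both the coerced form and the file's conventional form parse identically.
print(parse_bool_conf(coerced), parse_bool_conf("true"))
```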
MaxGekk
left a comment
3 addressed, 0 remaining, 0 new.
All three prior findings are resolved in 3cfeefe: the API-reference entry was added to functions.rst (was blocking), unix_nanos is now in alphabetical order across builtin.py / the Connect wrapper / __init__.py, and the doctests now set and unset spark.sql.timestampNanosTypes.enabled so the example is reproducible in a default session. No new findings.
Verification
Confirmed the docstring contract matches the JVM expression: UnixNanos accepts only TIMESTAMP_LTZ/NTZ(p) (p in [7,9]) via AnyTimestampNanoType and returns DECIMAL(21,0). Checked the doctest expected value (1577885075123456789 = 1577885075s * 1e9 + 123456789) and that versionadded:: 4.3.0 matches the branch-4.x next release.
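The expected-value check above can be reproduced with plain integer arithmetic. This is a sketch independent of Spark; `to_unix_nanos` is an illustrative helper name, not the JVM or PySpark implementation.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 10**9


def to_unix_nanos(epoch_seconds: int, nano_fraction: int) -> int:
    """Combine whole epoch seconds and a sub-second nanosecond fraction."""
    if not 0 <= nano_fraction < NANOS_PER_SECOND:
        raise ValueError("nano_fraction must be in [0, 10**9)")
    return epoch_seconds * NANOS_PER_SECOND + nano_fraction


expected = 1577885075123456789
computed = to_unix_nanos(1577885075, 123456789)

# The seconds component, interpreted as a UTC instant.
seconds_as_utc = datetime.fromtimestamp(1577885075, tz=timezone.utc)

# The same formula evaluated in floating point loses the low digits,
# because the result exceeds the 53-bit precision of a double.
float_result = int(1577885075 * 1e9 + 123456789)

print(computed == expected, seconds_as_utc.isoformat(), float_result == expected)
```

The value has 19 digits, so it fits in DECIMAL(21,0). The floating-point comparison is the reason to keep the check in exact integers (or decimals) rather than reading `1e9` literally as a float.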
What is the purpose of the change
Fixes SPARK-57579 (follow-up to SPARK-57527): adds unix_nanos to the PySpark API (pyspark.sql.functions and PySpark Connect), completing the epoch-unit function family in Python.
The SQL function and Scala API were added in SPARK-57527, but Python support was explicitly deferred. The full family is:
- unix_seconds
- unix_millis
- unix_micros
- unix_nanos
The gap was acknowledged in the parity test (expected_missing_in_py) with a comment pointing to this follow-up.
Brief change log
- python/pyspark/sql/functions/builtin.py: added unix_nanos(col) after unix_micros, decorated with @_try_remote_functions, with full docstring (versionadded:: 4.3.0, parameters, return type, See Also links, and two doctests covering a nanosecond-precision TIMESTAMP_NTZ input and a NULL input)
- python/pyspark/sql/functions/__init__.py: exported unix_nanos in alphabetical order between unix_millis and unix_seconds
- python/pyspark/sql/connect/functions/builtin.py: added Connect-side wrapper for unix_nanos inheriting its docstring from the main function, following the same pattern as unix_micros
- python/pyspark/sql/tests/test_functions.py: removed "unix_nanos" from expected_missing_in_py (set is now empty)
Verifying this change
This change is covered by the existing parity test in FunctionsTestsMixin:
- test_function_parity: previously allowlisted unix_nanos as an expected gap; removing it from expected_missing_in_py means the test will now fail if unix_nanos is ever missing from the Python API again
- The doctests in the unix_nanos docstring verify:
  - TIMESTAMP_NTZ input returns the correct DECIMAL(21, 0) nanosecond count
  - NULL input returns NULL
Does this pull request potentially affect one of the following parts
- @Public/@Evolving: yes, unix_nanos is a new public PySpark function
- UnixNanos is unchanged
Documentation
- Does this pull request introduce a new feature? yes, pyspark.sql.functions.unix_nanos is a new public API
- If yes, how is the feature documented? inline docstring with parameter description, return type, See Also links, and doctests in builtin.py
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Sonnet 4.8
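For readers unfamiliar with the parity check described under "Verifying this change", its logic can be sketched as a set comparison. This is a simplified illustration, not the actual test_function_parity code: the function-name sets are limited to the epoch-unit family, and `missing_in_python` and `parity_holds` are hypothetical helper names.

```python
from typing import Set


def missing_in_python(jvm_functions: Set[str], py_functions: Set[str]) -> Set[str]:
    """Names available on the JVM side but absent from the Python API."""
    return jvm_functions - py_functions


def parity_holds(jvm: Set[str], py: Set[str], expected_missing: Set[str]) -> bool:
    """The check passes only when the actual gap equals the allowlisted gap."""
    return missing_in_python(jvm, py) == expected_missing


jvm = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = {"unix_seconds", "unix_millis", "unix_micros"}
py_after = py_before | {"unix_nanos"}

# Before this PR: the gap was allowlisted, so the check passed.
print(parity_holds(jvm, py_before, {"unix_nanos"}))
# After this PR: the allowlist is empty and the function exists.
print(parity_holds(jvm, py_after, set()))
# Regression guard: with an empty allowlist, dropping unix_nanos fails the check.
print(parity_holds(jvm, py_before, set()))
```

Requiring equality rather than a subset means a stale allowlist entry also fails the check, which is why the PR removes "unix_nanos" from expected_missing_in_py instead of leaving it in place.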