[SPARK-57574][PANDAS] Support the TIME data type in pandas API on Spark by marcuslin123 · Pull Request #56635 · apache/spark

marcuslin123 · 2026-06-20T23:10:11Z

What changes were proposed in this pull request?

Add support for TimeType columns in the pandas API on Spark (pyspark.pandas):

Map datetime.time to TimeType in the dtype translation layer (typehints.py)
Map TimeType to np.dtype("object") for pandas representation
Create TimeOps class for column operations (comparisons supported, arithmetic rejected)
Register TimeOps in the dispatch system (base.py)
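The mapping change can be illustrated with a minimal stdlib-only sketch. The `_HYPOTHETICAL_TYPE_MAP` table and the `spark_type_name_for` helper are hypothetical stand-ins, not the code in typehints.py. What they show is that lookup order matters: `datetime.datetime` subclasses `datetime.date`, so an `isinstance`-based lookup must test it first. `datetime.time` is unrelated to both, so it can be added without disturbing the other entries.

```python
import datetime

# Hypothetical mapping from Python scalar types to Spark SQL type names.
# Order matters for isinstance checks: datetime.datetime is a subclass of
# datetime.date, so it must be tested before date.
_HYPOTHETICAL_TYPE_MAP = [
    (bool, "BooleanType"),
    (datetime.datetime, "TimestampType"),
    (datetime.date, "DateType"),
    (datetime.time, "TimeType"),
]


def spark_type_name_for(value: object) -> str:
    """Return the Spark type name for a Python value (illustrative only)."""
    for py_type, spark_name in _HYPOTHETICAL_TYPE_MAP:
        if isinstance(value, py_type):
            return spark_name
    raise TypeError(f"no Spark type mapping for {type(value).__name__}")


print(spark_type_name_for(datetime.time(8, 0)))              # TimeType
print(spark_type_name_for(datetime.datetime(2026, 6, 20)))   # TimestampType
print(spark_type_name_for(datetime.date(2026, 6, 20)))       # DateType
```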

Why are the changes needed?

pyspark.pandas does not handle TimeType: the Spark-to-pandas dtype machinery has no explicit mapping for datetime.time and treats it as a generic object. Without these changes, creating a pandas-on-Spark DataFrame with datetime.time values fails, and column operations on TIME columns crash with a TypeError.

The underlying Arrow conversion already supports TIME (SPARK-53263 / SPARK-53305), so this wires up the remaining pyspark.pandas layer.

Does this PR introduce any user-facing change?

Yes. Users can now work with TimeType columns in pyspark.pandas:

```python
import datetime

import pyspark.pandas as ps

df = ps.DataFrame({"shift_start": [datetime.time(8, 0), datetime.time(14, 0)]})
df["shift_start"].dtype  # returns object
afternoon = df[df["shift_start"] > datetime.time(12, 0)]  # comparisons work
```

Previously this would fail with a TypeError or produce incorrect results.
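Why comparisons are supported while arithmetic is rejected can be seen in plain Python: `datetime.time` is ordered but defines no arithmetic. The stdlib-only sketch below does not touch pyspark.pandas. The list comprehension only mimics the boolean-mask filter in the example above, and `raises_type_error` is a hypothetical helper.

```python
import datetime

shift_starts = [datetime.time(8, 0), datetime.time(14, 0)]
noon = datetime.time(12, 0)

# Ordering comparisons are well defined for naive time objects, so a
# boolean mask such as df["shift_start"] > noon has a clear meaning.
mask = [t > noon for t in shift_starts]
afternoon = [t for t, keep in zip(shift_starts, mask) if keep]
print(mask)       # [False, True]
print(afternoon)  # [datetime.time(14, 0)]


def raises_type_error(fn) -> bool:
    """Hypothetical helper: True if calling fn raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


# A time has no date attached, so Python itself rejects adding a timedelta
# to it or subtracting two times.
add_rejected = raises_type_error(lambda: shift_starts[0] + datetime.timedelta(hours=1))
sub_rejected = raises_type_error(lambda: shift_starts[1] - shift_starts[0])
print(add_rejected, sub_rejected)  # True True
```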

How was this patch tested?

Added datetime.time mapping to test_typedef.py
Added new test_time_ops.py covering arithmetic rejection and comparison operations
All tests pass locally: python/run-tests --testnames pyspark.pandas.tests.test_typedef and python/run-tests --testnames pyspark.pandas.tests.data_type_ops.test_time_ops

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (used as an assistive tool for implementation guidance)

MaxGekk

3 blocking, 2 non-blocking, 0 nits.
The implementation faithfully mirrors the verified DateOps analogue and looks correct. The blockers are all test wiring/coverage: the new tests don't run in CI, there's no Spark Connect parity test, and the custom astype is untested.

Design / architecture (3)

dev/sparktestsupport/modules.py (~line 905): test_time_ops is not registered, so CI never collects or runs it (discovery uses explicit goal lists, not globbing). Add it next to test_date_ops/test_datetime_ops. [blocking]
new file python/pyspark/pandas/tests/connect/data_type_ops/test_parity_time_ops.py: no Spark Connect parity test. Every peer data_type_ops test has a ~10-line test_parity_* subclass registered under the connect module (modules.py:1340-1341); without one, TimeOps is untested under Spark Connect. [blocking]
python/docs/source/tutorial/pandas_on_spark/types.rst (~line 190): add the datetime.time → TimeType row next to the existing datetime.date → DateType. [non-blocking]

Correctness (2)

time_ops.py:62: custom astype has no test — see inline. [blocking]
test_time_ops.py:27: coverage gaps vs the DateOps suite (eq/ne, isnull, value round-trip, mixed-type TypeError) — see inline. [non-blocking]
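The review's test-wiring points (a shared mixin, plus a short parity subclass for Spark Connect) follow a common unittest pattern that can be sketched with the standard library alone. Every class name and the `make_values` hook below are hypothetical. The real suite builds on OpsTestBase and Spark's Connect test utilities, and each concrete class still has to be listed in dev/sparktestsupport/modules.py before CI will run it.

```python
import datetime
import unittest


class HypotheticalTimeOpsTestsMixin:
    """Shared test bodies; not a TestCase, so loaders never run it directly."""

    def make_values(self):
        raise NotImplementedError  # each concrete class supplies its own data

    def test_gt(self):
        noon = datetime.time(12, 0)
        self.assertEqual([v > noon for v in self.make_values()], [False, True])

    def test_eq(self):
        eight = datetime.time(8, 0)
        self.assertEqual([v == eight for v in self.make_values()], [True, False])


class HypotheticalClassicTimeOpsTests(HypotheticalTimeOpsTestsMixin, unittest.TestCase):
    # Stand-in for the classic-Spark suite.
    def make_values(self):
        return [datetime.time(8, 0), datetime.time(14, 0)]


class HypotheticalParityTimeOpsTests(HypotheticalTimeOpsTestsMixin, unittest.TestCase):
    # Stand-in for a short Connect parity subclass: it inherits every mixin
    # test unchanged and only swaps how the data is produced.
    def make_values(self):
        return [datetime.time.fromisoformat(s) for s in ("08:00", "14:00")]
```

Loading such a module with unittest collects four tests (two per concrete class) and none from the mixin itself, which is why a forgotten registration or a missing parity subclass silently leaves a backend untested.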

MaxGekk · 2026-06-21T04:49:21Z

```diff
+        _sanitize_list_like(right)
+        return column_op(PySparkColumn.__gt__)(left, right)
+
+    def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) -> IndexOpsLike:
```


astype is the only custom (non-inherited) logic in TimeOps — categorical / bool / string / other branches — but the suite has no test_astype. test_date_ops.py:190 tests astype(str), astype(bool), and a categorical cast; please mirror it.

The string branch is the one to watch: null_str=str(None) plus Spark CAST(TIME AS STRING) is exactly where pandas-vs-Spark formatting can diverge for sub-second precision (pandas str(time(.., 500000)) → "...:00.500000" vs Spark "...:00.5"). A test_astype would confirm or refute this.
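The Python side of this formatting concern can be reproduced with the standard library: `str()` on a `datetime.time` emits six fractional digits whenever microseconds are non-zero. The `trimmed_time_str` helper is a hypothetical approximation of the shorter rendering the reviewer attributes to Spark (trailing fractional zeros stripped). It is not Spark's CAST implementation, which only a test_astype run against Spark can confirm.

```python
import datetime

half_second = datetime.time(8, 0, 0, 500000)

# Python/pandas side: str() uses isoformat(), which pads microseconds to 6 digits.
python_str = str(half_second)
print(python_str)  # 08:00:00.500000


def trimmed_time_str(t: datetime.time) -> str:
    """Hypothetical rendering that drops trailing fractional zeros.

    Assumes a naive time (no tzinfo), so isoformat() has no UTC offset suffix.
    """
    text = t.isoformat()
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


print(trimmed_time_str(half_second))                 # 08:00:00.5
print(trimmed_time_str(datetime.time(8, 0)))         # 08:00:00
print(python_str == trimmed_time_str(half_second))   # False
```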

MaxGekk · 2026-06-21T04:49:21Z

```diff
+from pyspark.pandas.tests.data_type_ops.testing_utils import OpsTestBase
+
+
+class TimeOpsTestsMixin:
```


This suite covers arithmetic rejection and the four ordering comparisons, but is missing cases the peer DateOpsTestsMixin has:

test_eq / test_ne — eq/ne are inherited and reachable for TimeType but never exercised here.

test_isnull.

test_from_to_pandas — nothing asserts the spark→pandas round-trip of actual TIME values (the new TimeType → object mapping); the comparison tests only assert boolean results.

The peer comparison tests also assert that a pandas-Series RHS raises TypeError (e.g. self.assertRaises(TypeError, lambda: psdf["this"] == pdf["this"])); worth adding here too.
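One subtlety worth pinning down when adding test_eq/test_ne next to the TypeError assertions: in plain Python, equality and ordering treat a mismatched operand differently. The stdlib-only sketch below shows that a `datetime.time` compared with a string is simply unequal under `==`/`!=` but raises TypeError under `<`. Whether pandas-on-Spark follows the same split for TIME columns is an assumption the new tests would have to settle; `ordering_raises` is a hypothetical helper.

```python
import datetime

t = datetime.time(8, 0)
other = "08:00:00"  # a string that merely looks like the same time

# == and != fall back to an identity check for unrelated types: no error.
print(t == other, t != other)  # False True


def ordering_raises(lhs, rhs) -> bool:
    """Hypothetical helper: True if lhs < rhs raises TypeError."""
    try:
        _ = lhs < rhs
    except TypeError:
        return True
    return False


# Ordering comparisons against an unrelated type raise TypeError.
print(ordering_raises(t, other))                # True
print(ordering_raises(t, datetime.time(9, 0)))  # False
```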


HyukjinKwon · 2026-06-21T23:50:07Z

```diff
 bool              BooleanType
 datetime.datetime TimestampType
 datetime.date     DateType
+datetime.time     TimeType
```


We should probably make it properly supported in PySpark itself first.

marcus added 3 commits June 20, 2026 16:53

WIP: Add datetime.time <-> TimeType dtype mapping in pyspark.pandas

1d00b13

WIP: Add TimeOps class and register in dispatch for pyspark.pandas

60707bc

[SPARK-57574][PANDAS] Support the TIME data type in pandas API on Spark

d121c72

MaxGekk reviewed Jun 21, 2026


marcus added 3 commits June 21, 2026 11:24

Address review: register tests in CI, add Connect parity test, add test_astype

5e3c0db

Address review: add test_eq, test_ne, test_isnull, test_from_to_pandas, TypeError assertions, docs update

a464138

Fix test_rmod: remove string modulo assertion (% on str is formatting, not arithmetic)

c1de5fd

HyukjinKwon reviewed Jun 21, 2026

marcuslin123 wants to merge 6 commits into apache:master from marcuslin123:SPARK-57574-time-type-pandas


3 participants
